The success of social robots is directly linked to their ability to interact with people. Humans possess verbal and non-verbal communication skills, and both are therefore essential for social robots to achieve natural human–robot interaction (HRI). This work focuses on the first of them, since the majority of social robots implement an interaction system endowed with verbal capacities. To implement this, we must equip social robots with an artificial voice system. In robotics, a Text to Speech (TTS) system is the most common speech synthesis technique. The performance of a speech synthesizer is mainly evaluated by its similarity to the human voice in terms of intelligibility and expressiveness. In this paper, we present a comparative study of eight off-the-shelf TTS systems used in social robots. To carry out the study, 125 participants evaluated the performance of the following TTS systems: Google, Microsoft, Ivona, Loquendo, Espeak, Pico, AT&T, and Nuance. The evaluation was performed after observing videos in which a social robot communicates verbally using one TTS system. The participants completed a questionnaire to rate each TTS system in relation to four features: intelligibility, expressiveness, artificiality, and suitability. In this study, four research questions were posed to determine whether it is possible to present a ranking of TTS systems in relation to each evaluated feature or whether, on the contrary, there are no significant differences between them. Our study shows that participants found differences between the TTS systems evaluated in terms of intelligibility, expressiveness, and artificiality. The experiments also indicated that there was a relationship between the physical appearance of the robots (embodiment) and the suitability of TTS systems.

Another social robot with verbal communication skills is iCub. Its appearance is similar to that of a two-year-old child (about 1 m tall), and it is used as a research platform to test learning algorithms, cognitive skills, and artificial intelligence algorithms. In this case, the TTS system used is Acapela. The robot Jibo is another kind of social robot that has been recently developed. The company’s founder, Cynthia Breazeal, describes Jibo ‘as the result of R2D2 and Siri having a baby’; that is, it is a robot that is endowed with verbal and non-verbal communication skills. Jibo supports text-to-speech markup, which allows selecting which parts of the synthesized text should be given emphasis, or how unusual words or names should be pronounced. This feature is particularly important given the emphasis on the robot having its own specific personality. The Yamaha Vocaloid Humanoid Robot uses Vocaloid. This system is different because it is used for both talking and singing. Therefore, it allows creating very realistic artificial singing voices.

The user evaluates each feature using a score from 1 to 5 (five-point Likert scale), 5 being the most positive (except for sound quality acceptance, which required a yes/no answer). Other studies have been carried out using this MOS scale, or a modified version. This is the case of the research presented by Viswanathan. He uses an extended version of the MOS scale, and he concludes that the most important features to evaluate a TTS system are intelligibility and naturalness. Each of these concepts, according to that author, includes other features of a TTS system. That is, naturalness includes naturalness, ease of listening, pleasantness, and audio flow; intelligibility, on the other hand, includes listening effort, pronunciation, comprehension, articulation, and speaking rate. More recently, in 2014, King reviewed the improvements obtained in TTS technologies during the last decade. Again, he claims that naturalness and intelligibility are the main criteria for determining the quality of speech synthesis. For social robots, Alonso defines the naturalness of the generated speech as its degree of similarity with that emitted by a human, while intelligibility is defined as the ease with which the user understands the message generated by the robot. For that author, these two features are the most important ones during HRI.

The robots integrate a dialog mechanism to enable natural HRI. For this reason, selecting the most adequate TTS system is crucial to enhance the user experience. Apart from the dialog system, the robots include high-quality speakers, microphones, and sound cards. The first robot, Maggie, is able to move through the environment to interact with people. The robot was originally designed as a generic research platform to test interaction mechanisms to improve the HRI experience. Maggie can communicate through sounds, gestures, and a touch-screen mounted in its chest. The robot has a rigid plastic shell and is 1.40 m tall.
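
The questionnaire scoring mentioned in this text (a 1 to 5 rating per feature for each TTS system) can be summarized as a mean score per system and feature. The following is a minimal sketch, not the procedure used in the study: the function names `mean_scores` and `rank_systems`, the `(system, feature, score)` tuple layout, and the placeholder system labels are assumptions made for illustration, and the sample ratings are invented. It also assumes that a higher score is better, which may not hold for a feature such as artificiality.

```python
from collections import defaultdict
from statistics import mean

# The four features named in the text.
FEATURES = ("intelligibility", "expressiveness", "artificiality", "suitability")


def mean_scores(responses):
    """Average the 1-5 ratings for every (system, feature) pair.

    `responses` is an iterable of (system, feature, score) tuples;
    this layout is a hypothetical choice, not taken from the study.
    """
    buckets = defaultdict(list)
    for system, feature, score in responses:
        if feature not in FEATURES:
            raise ValueError(f"unknown feature: {feature!r}")
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range 1-5: {score!r}")
        buckets[(system, feature)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


def rank_systems(responses, feature):
    """Order systems by mean score on one feature, highest first.

    Assumes higher is better; invert the order for features where
    a low score is the desirable outcome.
    """
    means = mean_scores(responses)
    per_system = {s: m for (s, f), m in means.items() if f == feature}
    return sorted(per_system, key=per_system.get, reverse=True)


if __name__ == "__main__":
    # Invented ratings for two placeholder systems, for illustration only.
    sample = [
        ("system_a", "intelligibility", 5),
        ("system_a", "intelligibility", 4),
        ("system_b", "intelligibility", 3),
    ]
    print(rank_systems(sample, "intelligibility"))
```

A ranking built this way only orders the means. Deciding whether the differences between systems are significant, as the research questions ask, would require a separate statistical test.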