Speech synthesis for human-computer interaction
Voice-based human-computer interaction is not yet widespread, but the situation is changing, and many applications have already been developed to enhance visual interfaces. It is difficult to imagine life without computers: many people use them constantly throughout the day. The resulting strain on the eyes is considerable, and it is important to reduce it by involving other senses, such as hearing. A computer interface that communicates with the user by voice is also useful for visually impaired people, because it allows them to use a computer with little or no reliance on sight. Integrating text-to-speech technology into mobile devices makes it possible to use them as hands-free assistants that provide navigation and read e-mails, news and even books, for instance while you are driving to work.
The project builds on an existing Russian text-to-speech system developed by Speech Technology Center Ltd. The application will address the following main problems:
- Adaptation for visually impaired people.
People who cannot read text visually and rely on a text-to-speech system instead often prefer to listen at up to four times the normal speech rate. This requires a dedicated speech rate modification algorithm that keeps speech intelligible at such high rates.
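The proposal does not say which speech rate modification algorithm will be used. The sketch below illustrates only the basic idea behind time-scale modification, using plain overlap-add (OLA). Short windowed frames are read from the input at a larger hop than the hop used to write them to the output. This shortens the signal while each frame keeps its original pitch. The function name `ola_speed_up` and the frame sizes are hypothetical. The 400-sample frames and 200-sample output hop correspond to 25 ms and 12.5 ms at an assumed 16 kHz sampling rate. Plain OLA causes audible phase artifacts on voiced speech, so practical systems use refinements such as WSOLA or PSOLA. A dedicated algorithm for intelligible speech at four times the normal rate would need to go further still.

```python
import math


def hann(n):
    # Periodic Hann window; window[0] is exactly 0, all other values are positive
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_speed_up(samples, rate, frame_len=400, hop_out=200):
    """Time-compress `samples` by `rate` (> 1 means faster) with plain overlap-add.

    Hypothetical helper: frames are read from the input every hop_in = rate * hop_out
    samples and written to the output every hop_out samples, so the duration
    shrinks by roughly `rate` while the content of each frame is not resampled.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    window = hann(frame_len)
    hop_in = rate * hop_out
    n_frames = max(1, int((len(samples) - frame_len) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # accumulated window weight per output sample
    for k in range(n_frames):
        start_in = int(round(k * hop_in))
        start_out = k * hop_out
        for i in range(frame_len):
            j = start_in + i
            x = samples[j] if j < len(samples) else 0.0  # zero-pad past the end
            out[start_out + i] += window[i] * x
            norm[start_out + i] += window[i]
    # Divide by the summed window weights so overlapping frames keep the amplitude
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

For example, `ola_speed_up(samples, rate=4.0)` on one second of 16 kHz audio returns 4200 samples with these parameters, or about a quarter of a second.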
- Porting the system to the Windows Mobile© platform.
More and more mobile devices are used all over the world, so porting a text-to-speech system to a mobile platform is very important. On the one hand, it improves human-computer interaction. On the other hand, it enables such devices to read news, books, text messages, e-mails and even the name of the street you are walking along.
In this project, the system will be ported to the Windows Mobile© platform. Because mobile devices have limited computing power and storage, both the system's computational requirements and the size of its speech database must be reduced. To this end, the source code will be optimized, and a special tool will be developed to identify the most useful part of the speech database and reduce the database to an acceptable size.
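The proposal does not describe how the reduction tool will decide which part of the database is most useful. One plausible approach, shown here purely as an assumption, has three steps. First, synthesize a representative text corpus. Second, log which database units the synthesizer selects. Third, keep the most frequently used units until a size budget for the mobile device is reached. The function `prune_unit_database`, the unit IDs and the per-unit byte sizes are all hypothetical. The greedy usage-per-byte ranking is a simple heuristic, not an optimal solution to what is essentially a knapsack problem.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def prune_unit_database(
    unit_sizes: Dict[str, int],
    selections: Iterable[List[str]],
    budget_bytes: int,
) -> Set[str]:
    """Keep the most frequently selected units that fit within budget_bytes.

    unit_sizes: hypothetical map from unit ID to its storage size in bytes.
    selections: for each sentence of a representative corpus, the unit IDs
                the synthesizer selected (assumed to be logged offline).
    """
    if any(size <= 0 for size in unit_sizes.values()):
        raise ValueError("unit sizes must be positive")
    usage = Counter()
    for units in selections:
        usage.update(units)
    # Rank by usage per byte so small, frequently used units come first;
    # ties are broken by unit ID to keep the result deterministic.
    ranked = sorted(unit_sizes, key=lambda u: (-usage[u] / unit_sizes[u], u))
    kept, total = set(), 0
    for unit in ranked:
        if usage[unit] == 0:
            break  # never selected on the corpus: all remaining units are dropped
        size = unit_sizes[unit]
        if total + size <= budget_bytes:
            kept.add(unit)
            total += size
    return kept
```

In practice, the budget would come from the storage available on the target device. The corpus should also resemble the texts users actually listen to, such as news, messages and e-mails, because any unit never selected on the corpus is dropped.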
- Developing a library that supports the Microsoft Speech API (SAPI).
Supporting the Microsoft Speech API makes it possible to integrate the text-to-speech engine into the Microsoft Windows operating system. With this support, the engine can read window titles and menu items aloud and guide the user from one program to another. This is crucial for people who cannot interact with a computer visually.
Yuriy Matveev, Dr.Habil.Sc.Ing., Professor, scientific adviser, Saint Petersburg National Research University of Information Technologies, Mechanics and Optics
Andrey Talanov, PhD, technical adviser, Speech Technology Center Ltd
Pavel Chistikov, 1st year postgraduate student, developer, Saint Petersburg National Research University of Information Technologies, Mechanics and Optics
Yuriy Matveev (matveev@mail.ifmo.ru)
Andrey Talanov (andre@speechpro.com)
Pavel Chistikov (chistikov@speechpro.com)
Total required time: 24 weeks