Speech synthesis for human-computer interaction

Background and Motivation: 

Voice-based human-computer interaction is not yet widespread, but the situation is changing, and many applications have already been developed to complement visual interfaces. It is difficult to imagine life without computers: many people use them throughout the day. The strain on their eyes is high, and it can be reduced by engaging other senses, such as hearing. A computer interface that communicates with the user by voice is also useful for visually impaired people, because it allows them to use a computer with little or no reliance on sight. Integrating text-to-speech technology into mobile devices makes it possible to use them as hands-free assistants that provide navigation and read e-mails, news and even books aloud while, for instance, you are driving to work.

Project Summary: 

The project is based on the existing Russian text-to-speech system developed by Speech Technology Center Ltd. The main problems solved by the application will be the following:

  1. Adaptation for visually impaired people.

People who cannot read text visually and rely on a text-to-speech system usually prefer to listen at up to four times the normal speech rate. To make this possible, a special speech rate modification algorithm must be developed that keeps the speech intelligible at such rates.
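As a rough illustration of what speech rate modification involves, here is a minimal sketch of plain overlap-add (OLA) time compression in Python. It is not the project's algorithm, which the proposal leaves to be developed; the function names, the frame length and the hop size are all assumptions chosen for the example. Plain OLA is only a baseline: at high rates it tends to produce audible artifacts, which is one reason more careful methods (for example waveform-similarity variants such as WSOLA) are usually preferred.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_speedup(samples, rate, frame_len=400, synth_hop=100):
    """Time-compress `samples` by roughly `rate` using plain overlap-add.

    Frames are read every rate * synth_hop samples and written every
    synth_hop samples, so the duration shrinks while the local waveform
    (and therefore the pitch) inside each frame is left unchanged.
    All parameter defaults are illustrative assumptions.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    analysis_hop = synth_hop * rate
    window = hann(frame_len)
    n_frames = max(1, int((len(samples) - frame_len) / analysis_hop) + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # sum of window weights, used for normalization
    for k in range(n_frames):
        src = int(round(k * analysis_hop))  # read position in the input
        dst = k * synth_hop                 # write position in the output
        for i in range(frame_len):
            if src + i >= len(samples):
                break
            out[dst + i] += samples[src + i] * window[i]
            norm[dst + i] += window[i]
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


if __name__ == "__main__":
    # One second of a 200 Hz tone at 16 kHz, compressed about four times.
    tone = [math.sin(2.0 * math.pi * 200.0 * t / 16000.0) for t in range(16000)]
    fast = ola_speedup(tone, rate=4.0)
    print(len(tone), len(fast))
```

With `rate=1.0` the sketch reproduces the input (apart from the very first sample, where the window weight is zero), which is a convenient sanity check before trying higher rates.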

  2. Porting the system to the Windows Mobile© platform.

More and more mobile devices are used all over the world, so porting a text-to-speech system to a mobile platform is an important task. On the one hand, it improves human-computer interaction; on the other hand, it enables such devices to read aloud news, books, text messages, e-mails and even the name of the street you are walking along.

In our project the system will be ported to the Windows Mobile© platform. To do this, the computational requirements and the size of the speech database must be reduced. For this reason, the source code will be optimized, and a special tool will be developed to identify the most frequently used part of the speech database and to reduce the database to an appropriate size.
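One simple way to think about such a database reduction tool is frequency-based pruning: synthesize a large text corpus, log which database units were selected, and keep only the units that account for most of the selections. The sketch below assumes that idea; the proposal does not specify the actual method, and `select_units`, the `usage_log` format and the `coverage` threshold are hypothetical names introduced for illustration.

```python
from collections import Counter


def select_units(usage_log, coverage=0.95):
    """Return the most frequently used unit IDs that together account for
    at least `coverage` of all selections recorded in `usage_log`.

    `usage_log` is assumed to be an iterable of unit IDs, one entry per
    time a unit was chosen during synthesis of a test corpus.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    counts = Counter(usage_log)
    total = sum(counts.values())
    kept, covered = [], 0
    for unit_id, n in counts.most_common():
        if covered >= coverage * total:
            break  # the kept units already reach the target coverage
        kept.append(unit_id)
        covered += n
    return kept


if __name__ == "__main__":
    # Toy log: unit "a" was selected 90 times, "b" 8 times, "c" twice.
    log = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
    print(select_units(log, coverage=0.95))
```

A pure frequency criterion ignores how much quality is lost when a rare unit is missing, so in practice the threshold would need to be tuned against listening tests or an objective quality measure.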

  3. Developing a library to support the Microsoft Speech API.

Support for the Microsoft Speech API makes it possible to integrate the text-to-speech engine into the Microsoft operating system. The engine can then read window titles and menu items aloud and guide the user from one program to another. This is crucial for people who cannot rely on sight to interact with a computer.

Project goals and future research directions: 

Goals:

  1. Developing a speech rate modification algorithm that keeps a high level of intelligibility while changing the speech rate up to four times;
  2. Optimizing the text-to-speech application and reducing the database size;
  3. Porting the software to the Windows Mobile© operating system;
  4. Supporting the Microsoft Speech API interface.

Future research directions:

  1. Developing a data-driven module for intonation modeling to replace the rule-based one used in the current system;
  2. Improving the quality of the synthesized speech by implementing a hybrid approach based on hidden Markov models and unit selection techniques.

List of team members and their organizations: 

Yuriy Matveev, Dr.Habil.Sc.Ing., Professor, scientific adviser, Saint Petersburg National Research University of Information Technologies, Mechanics and Optics

Andrey Talanov, PhD, technical adviser, Speech Technology Center Ltd

Pavel Chistikov, 1st year postgraduate student, developer, Saint Petersburg National Research University of Information Technologies, Mechanics and Optics

Yuriy Matveev (matveev@mail.ifmo.ru)

Andrey Talanov (andre@speechpro.com)

Pavel Chistikov (chistikov@speechpro.com)

On hold
Project Timeline and Expected Deliverables: 
  • Overview of speech rate modification research and publications (2 weeks)
  • Development of a speech rate modification algorithm for Russian for changing the speech rate up to four times (6 weeks)
  • Porting the software to Windows Mobile© and optimizing the computational complexity of the algorithms and the database size (10 weeks)
  • Supporting the Microsoft Speech API interface (2 weeks)
  • Preparing ready-to-use library packages for Windows and Windows CE platforms (2 weeks)
  • Report at FRUCT conference. Future proposals (2 weeks)

Total required time: 24 weeks

Final deadline: 
Friday, September 7, 2012