
 

Speech Synthesis

There is a discussion of new speech synthesis algorithm ideas in the Wiki discussion area.

There is a do-it-yourself speech synthesis course, but, as the name implies, you may also try to do it on your own with a little help from ICISLT. Some parts of a standard speech synthesis system are very complicated, so you may want to avoid attempting them as a one-person project. For example, the signal processing used to adjust the pitch of a segment of speech is not only complex, but also noticeably degrades the quality of the speech if it is not done just right. A workaround, which takes time and effort but is not complicated, is simply to record enough speech in your own voice that, for each segment to be synthesized, a recorded sample can be found whose original pitch is already close to the desired pitch.
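As a rough illustration of that workaround, the sketch below picks, from the recorded samples of a given phone, the one whose pitch is already closest to the target, so that no pitch modification is needed. The `Candidate` record, its field names, and the choice to compare pitches on a logarithmic scale are assumptions made for this example, not part of any ICISLT tool.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One recorded sample of a phone (hypothetical record layout)."""
    phone: str         # phone label, e.g. "AH"
    start_s: float     # start time of the sample in the recording, in seconds
    mean_f0_hz: float  # average pitch of the sample, in Hz


def closest_pitch(candidates: List[Candidate], phone: str,
                  target_f0_hz: float) -> Optional[Candidate]:
    """Return the sample of `phone` whose pitch is nearest to the target.

    Distance is measured in octaves (log2 of the frequency ratio), an
    assumption based on pitch perception being roughly logarithmic.
    Returns None if the phone was never recorded.
    """
    matches = [c for c in candidates if c.phone == phone]
    if not matches:
        return None
    return min(matches,
               key=lambda c: abs(math.log2(c.mean_f0_hz / target_f0_hz)))


if __name__ == "__main__":
    recorded = [
        Candidate("AH", 0.0, 110.0),
        Candidate("AH", 1.0, 150.0),
        Candidate("IY", 2.0, 120.0),
    ]
    print(closest_pitch(recorded, "AH", 120.0))
```

The more speech you record, the more candidates each phone has, and the smaller the remaining pitch mismatch is likely to be.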

The steps in building a simplified speech synthesis system for your own voice are as follows:

  1. Record a large amount of speech with a known script.

    • You will need at least 10 hours of recorded data; 100 hours is recommended (for example, read several novels out loud, choosing books whose text is available in electronic form).

    • Given a phonetic dictionary (for some languages the spelling is phonetic), you can train a speaker-dependent recognizer from scratch even in a language in which no large vocabulary speech recognizer currently exists.

  2. Train a speaker-dependent speech recognizer using this data.

    • Although there are automated procedures (e.g. Sphinx Trainer) for training a speech recognizer from scratch, some skill and experience are needed to make sure the process is successful.

  3. Diagnose errors in the alignment computed between the recorded speech and the script.

    • ICISLT is developing tools to aid in this diagnostic process.

  4. Develop a program to decide which pronunciation to use when more than one pronunciation is consistent with the text for which speech is to be synthesized.

  5. Develop a program that, given a phonetic sequence to be synthesized, finds the best matching sequence of segments in the recorded speech.

  6. Develop a program to smoothly concatenate the chosen segments.
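Steps 5 and 6 can be sketched in a few lines. The example below is only a minimal illustration under stated assumptions: the recorded speech is represented as a flat list of phone labels, "best matching" is taken to mean "fewest joins" (a greedy search for the longest contiguous runs), and "smoothly concatenate" is taken to mean a linear crossfade over a few samples. The function names are hypothetical, and a real system would also weigh pitch, duration, and spectral continuity at each join.

```python
from typing import List, Sequence, Tuple


def longest_match(corpus: Sequence[str], target: Sequence[str],
                  pos: int) -> Tuple[int, int]:
    """Return (corpus_start, length) of the longest corpus run matching target[pos:]."""
    best_start, best_len = -1, 0
    for start in range(len(corpus)):
        n = 0
        while (start + n < len(corpus) and pos + n < len(target)
               and corpus[start + n] == target[pos + n]):
            n += 1
        if n > best_len:
            best_start, best_len = start, n
    return best_start, best_len


def select_segments(corpus: Sequence[str],
                    target: Sequence[str]) -> List[Tuple[int, int]]:
    """Step 5 (greedy sketch): cover the target with long contiguous corpus runs."""
    spans = []
    pos = 0
    while pos < len(target):
        start, length = longest_match(corpus, target, pos)
        if length == 0:
            raise ValueError(f"phone {target[pos]!r} not found in corpus")
        spans.append((start, length))
        pos += length
    return spans


def crossfade_concat(chunks: Sequence[Sequence[float]],
                     overlap: int) -> List[float]:
    """Step 6 (sketch): join sample lists with a linear crossfade at each join."""
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(chunk))
        base = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming chunk
            out[base + i] = (1 - w) * out[base + i] + w * chunk[i]
        out.extend(chunk[n:])
    return out


if __name__ == "__main__":
    corpus = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
    print(select_segments(corpus, ["HH", "AH", "L", "D"]))
    print(crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2))
```

Preferring long contiguous runs keeps the number of joins small, which matters because each join is a place where the concatenation can become audible.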

If you contribute your recordings to our data corpus, ICISLT will help you with the other steps.

     
 

Copyright © 2005 James K. Baker