
TTS - How to go about it?

First off, the title: you might have heard the term 'TTS' a few times nowadays. If you were on a vacation to Mars and didn't understand what the title meant, it's high time you did. TTS stands for text-to-speech, and similarly, STT is speech-to-text. TTS is simply the artificial production of human speech. If you just want the API, skip a few paragraphs to reach the Magic.
 
Long before electronic signal processing was invented, there were those who tried to build machines to create human speech. Some early legends of the existence of "speaking heads" involved Gerbert of Aurillac (d. 1003 AD), Albertus Magnus (1198–1280), and Roger Bacon (1214–1294). If you want to read history, check out the article on Wikipedia.
 
The process of producing artificial speech is known as speech synthesis. A TTS engine generally comprises two parts: the front end and the back end. The front end has two main tasks. First, it converts raw text containing symbols, numbers, and abbreviations into the equivalent written words. Second, it assigns a phonetic transcription (a symbolic representation of sound) to each word, and divides and marks the text into prosodic units such as phrases, clauses, and sentences. These two processes are called text normalization and text-to-phoneme conversion, respectively. The back end, also called the synthesizer, then converts the symbolic representation into sound. In some systems, the back end also computes the target prosody (pitch, amplitude, duration, and so on), which is then imposed on the output speech. In what follows, both the front end and the back end are already taken care of; we just need to feed in the text and we're good to go.
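To make the two front-end steps a little more concrete, here is a toy sketch in Python. It is only an illustration under heavy assumptions: the symbol table, the digit names, and the tiny pronunciation lexicon (with ARPAbet-style labels) are all made up for this example, and real engines use far larger rule sets and dictionaries.

```python
import re

# Hypothetical lookup tables, invented for this illustration only.
SYMBOLS = {"&": "and", "%": "percent", "+": "plus", "@": "at"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
    "and": ["AE", "N", "D"],
}


def normalize(text):
    """Text normalization: spell out symbols and digits as written words."""
    pieces = []
    for ch in text:
        if ch in SYMBOLS:
            pieces.append(f" {SYMBOLS[ch]} ")
        elif ch in DIGITS:
            pieces.append(f" {DIGITS[ch]} ")
        else:
            pieces.append(ch)
    # Lowercase and collapse the extra whitespace introduced above.
    return " ".join("".join(pieces).lower().split())


def to_phonemes(text):
    """Text-to-phoneme conversion, with sentences as the prosodic units.

    Returns one list per sentence; each entry is (word, phonemes), where
    phonemes is None if the word is not in the toy lexicon.
    """
    units = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if words:
            units.append([(w, LEXICON.get(w)) for w in words])
    return units


if __name__ == "__main__":
    print(to_phonemes(normalize("Hello & world. Hello!")))
```

A back end would take the phoneme lists and turn them into audio; that part is deliberately left out here.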
 
Magic
 
I've often found myself in a position where I really need to have text spoken aloud, but I always fail to find a nice resource. Finally, I found this really cool library which Google uses for its translation service. It's not open source, but we can still use it in our own work; it works pretty well if you need 200 words or fewer. The following is how to use it.
 


 
This will simply speak 'hello world'. Change the value of the 'q' GET parameter to make it speak whatever you wish. If you're using a browser like Chrome, you can simply embed the page in an iframe too.
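If you build the request from a script rather than by hand, the text has to be URL-encoded before it goes into 'q'. Here is a minimal Python sketch of that, assuming a placeholder endpoint: `BASE_URL` is hypothetical and is not the real service address, so substitute the URL you actually use. The helper names `tts_url` and `chunk_words` are also my own, and the 200-word chunk size simply mirrors the rough limit mentioned earlier.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; replace it with the real service URL.
BASE_URL = "https://tts.example.invalid/speak"
MAX_WORDS = 200  # rough limit mentioned in the post


def chunk_words(text, max_words=MAX_WORDS):
    """Split text into pieces of at most max_words words each."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def tts_url(text, base_url=BASE_URL):
    """Build a GET URL that passes the text in the 'q' parameter."""
    return f"{base_url}?{urlencode({'q': text})}"


if __name__ == "__main__":
    for piece in chunk_words("hello world"):
        print(tts_url(piece))
```

Longer passages can be passed through `chunk_words` first, with one request made per piece.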
 
Code is poetry.   
  

More By  :  Anand Chowdhary
