Tuesday, March 3, 2015

Sending voice messages using TTS and Yowsup

Last time we saw how to install PicoTTS to make our Raspberry Pi speak. Using PicoTTS (or any other Text To Speech engine you like) and the Yowsup class for sending media files that we already used for pictures, we will be able to send voice messages from the RPi instead of just text answers.

In the future, if Whatsapp really enables audio calls (and Yowsup supports them), it would also be possible to integrate a speech recognition module, so we could ask the RPi something directly with just a call and hear the answer.

What a wonderful feat for such a small device!

To send voice messages through Whatsapp it is better to save some space. PicoTTS only creates an uncompressed WAV file, which is not the best audio format to send.

So we need to convert that file to a more suitable format. My choice is the classic mp3, but you could also use aac, wma, ogg or oga files as they are supported by Whatsapp.

There are many programs that can convert audio files. I decided to use SOX because it can also process the audio to produce more interesting voices.

Let's install it:

sudo apt-get install sox libsox-fmt-all

That's all we really need. I suggest checking the SOX website to see all the parameters that can be used to process the audio.

Now we just create a new function in our parser script:

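def SendVocal(received):
    # Remove leading spaces and append one, so no character sits right before the closing quote
    received=received.lstrip()+" "
    # Create the uncompressed audio file with PicoTTS (the script needs "import os" at the top)
    os.system("pico2wave -w voice.wav \""+received+"\"")
    # Convert the wav file to mp3 with SOX
    os.system("sox voice.wav -r 48k voice.mp3")
    try:
        stack = SendMediaStack(credential(), [(["390000000000", "voice.mp3"])])
        stack.start()  # assumption: the stack is started with start(), as in the Yowsup demo clients
    except ImageSent as e:
        if ("ERROR" in e.value): Answer("Error when trying to send vocal message")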

The first thing is to remove any leading spaces and add a space at the end of the text. This is needed because some characters placed right before the closing double quote ( " ) can produce an error in the shell when launching PicoTTS (for example, !" gives an error).
Next we invoke pico2wave to create the audio file. Remember that you can specify different languages for the engine.

After the file has been created we execute SOX to convert it to mp3. The rest of the function is almost identical to what we did for sending an image (the exception is called ImageSent for other media types too, so it's not an error).
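
If the quoting problem bothers you, one possible alternative is to skip the shell entirely and pass the arguments as a list. The following is just a sketch under that assumption, not the code used in this series: the names build_tts_commands and make_voice_file are invented for the example, and the -l option is the pico2wave switch for choosing the language.

```python
import subprocess

def build_tts_commands(text, lang="en-US", wav="voice.wav", mp3="voice.mp3"):
    """Return the pico2wave and sox argument lists without running them."""
    text = text.strip()
    # With an argument list there is no shell, so quotes and "!" need no escaping
    pico = ["pico2wave", "-l", lang, "-w", wav, text]
    # Same conversion as in SendVocal(): wav in, 48 kHz mp3 out
    sox = ["sox", wav, "-r", "48k", mp3]
    return [pico, sox]

def make_voice_file(text, lang="en-US"):
    """Run both commands; raises CalledProcessError if one of them fails."""
    for cmd in build_tts_commands(text, lang):
        subprocess.check_call(cmd)
    return "voice.mp3"
```

Calling make_voice_file("Ciao!", "it-IT") should leave a voice.mp3 file in the current folder, ready to be passed to SendMediaStack as before.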

Now, instead of using the Answer() function to send the message to our smartphone, we can use SendVocal() and receive a spoken answer.

If you like, you can also create a new command (e.g. speak or tell) and just tell the RPi what you would like it to say.
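
A command like that could be recognized with a few lines of Python. This is only a sketch: extract_speech_text is a hypothetical helper, and how it plugs into your parser script depends on how you wrote the parser.

```python
def extract_speech_text(message, keywords=("speak", "tell")):
    """Return the text after a leading keyword, or None if it is not a speak command."""
    # Split off the first word; the rest (if any) is the text to be spoken
    parts = message.strip().split(None, 1)
    if len(parts) == 2 and parts[0].lower() in keywords:
        return parts[1]
    return None
```

When the result is not None, it can be passed straight to SendVocal().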

We can now try to make the voice a bit more interesting... Change the command to call SOX as follows:

    os.system("sox voice.wav -r 48k voice.mp3 pitch -600")

This will change the pitch of the audio file, making it sound more like a male voice (PicoTTS has only female voices). SOX measures the shift in cents (hundredths of a semitone), so -600 lowers the voice by six semitones. This is the value that produces the voice I like most, but you can certainly try different ones.

Not bad, but let's see how to create a robotic voice. Change the SOX parameters again:

    os.system("sox voice.wav -r 48k voice.mp3 phaser 0.6 0.66 3 0.6 2 tremolo 50 80 echos 0.8 0.88 70 0.6 80 0.5 60 0.4")

Here is a nice female robotic voice. To make a male one, just lower the pitch as we did above.
Again, you are free to try different values and other parameters to find the perfect voice for your Raspberry Pi.

As I already wrote, you are free to use any TTS engine you like. Some engines can save directly to mp3, so if you do not need to process the voice you can skip the SOX step and also save some processing time.


  2. I am unfamiliar with text to speech technologies (TTS). It seems to me that the only benefit may be for someone who is visually impaired or someone driving in their car that needs to read a text message but does not want to risk looking away from the road. What are some common implications of this technology (TTS)? http://www.spiritdsp.com/products/voice-video-engine/

  3. Actually TTS is just another way you can interact with a computer. You pointed out two of the main benefits of this technology (probably the most useful ones), but you can also use it in other contexts.

    For example, instead of a simple tone (or whatever you wish to use) for an incoming message or for an email, you could just be informed by a voice. It would be just the same, but with a voice you could also get some more info (who wrote the message, for example).
    This could also be useful with children, and talking toys are quite widespread.

    This is not a must-have technology for interacting with a computer. It's just one way to do it.

  4. This solves the male voice limitation on PicoTTS! Thanks!

  5. Is there a way of interfacing with pico directly instead of calling pico2wave?
    Something like an SDK?

    1. Actually... yes, Pico has an API to use the libraries directly. I never checked this, but you can find some info in the library sources (look at the previous post). Under the folder pico_resources/docs you can find the manuals.

  6. Do you know if the Raspberry Pi can receive voice messages? I am trying to build a box for my 4 year old daughter to speak to me anytime she feels

    1. Using Yowsup it should, but I have never actually tried this. You could probably also find some alternatives to Yowsup/Whatsapp for this task by searching the internet...