Introduction: Introduction to Pocketsphinx for Voice Controlled Applications
Have you ever wanted to try creating a voice-activated application, but were concerned about not always having an Internet connection, or about what you say being spied on? Have you ever wanted to address a voice assistant by a name of your choosing?
Rest easy. In this Instructable I will guide you through installing, calibrating and configuring a powerful open source Speech-to-Text (STT) engine that you can use to build your own voice-controlled applications, which run without an Internet connection and are spyware-free! Additionally, I've included a voice assistant template program in my 'helperScripts' repository. (Short video demo below and in the last step.)
Step 1: Further Introduction and Parts
What you will need: an Ubuntu-based Linux machine with Internet access, audio out and audio in. I will be using my headphones (output) and a PS-Eye webcam (input) on my Linux Peppermint laptop. The instructions and tools should be generic enough that you can apply them to almost any machine fitting the above criteria. For input you could use just about any webcam or even your laptop microphone (though I would not recommend it). I will expect users to be somewhat experienced with using the terminal to move through directories and run scripts, and to understand some basic Python concepts.
If you enjoy my project, please vote for it in the Voice-Activation-Competition. If you complete it or use the STT to create something, please post about it or your other projects below. I'd love to hear about it! Thank you!
DISCLAIMER: I will not provide any support for this Instructable on Windows or OSX, nor on Raspberry Pis or other SBCs, though you can easily apply this tutorial to any of those platforms. I simply do not have the expertise to help you with the problems you might encounter on them.
I CAN help you troubleshoot issues with the scripts and programs in my Github, on the condition that you have already made your best effort to solve them yourself (i.e. Googled your problem). If you have questions about PocketSphinx itself, it would be best to take them to the forums: https://sourceforge.net/p/cmusphinx/discussion/ or to the project's Github: https://github.com/cmusphinx Thank you!
Step 2: Installing Python Pocketsphinx and Libraries
Firstly you will need to download the latest versions of pocketsphinx-python and my helperScripts from their Github pages. The Pocketsphinx and Sphinx libraries will be downloaded and placed within the Python library automatically. Create a new directory for your project, move into it and put the following commands into your terminal:
$ git clone --recursive https://github.com/cmusphinx/pocketsphinx-python.git
$ git clone https://github.com/malceore/helperScripts.git
Then we will need to make sure you have the packages needed to run the scripts and compile the source. You can use the following commands to install them:
$ sudo apt-get update
$ sudo apt-get install -y python git python-dev python-pip bison libasound2-dev libportaudio-dev python-pyaudio autoconf libtool automake gfortran g++ build-essential swig tree
Finally we will have to compile and install Pocketsphinx and Sphinxbase by running the following commands. Note that you will need sudo user privileges:
$ cd pocketsphinx-python/
$ cd sphinxbase/
$ ./autogen.sh && ./configure
$ sudo make install
$ cd ../pocketsphinx/
$ ./autogen.sh && ./configure
$ make clean all
$ sudo make install
$ cd ../ && sudo python setup.py install
You should now have both Pocketsphinx and Sphinxbase installed into your ‘/usr/local’ directory and be ready for the next steps! Great job!
Step 3: Testing and IO Calibrations
Now you can move back into the directory you created and into the ‘helperScripts’ directory that we cloned earlier. Here are a handful of scripts I've compiled to help you get up and running, which also work as examples to help you get started making your own voice-controlled applications!
Before we get started please make sure your audio input (Microphone) and output (Headphones) are working properly. After that we can run the audio_test.sh script:
$ chmod +x audio_test.sh
$ ./audio_test.sh
This script will print out all the audio sources available, attempt to record a snippet of audio from the default input device, and then play it back to you to see how it sounds. This is useful for sanity checks and first time calibration of new audio devices.
If you encounter errors or do not hear any audio, first check using another source that your audio IO is functioning correctly. You may have to reconfigure your ALSA drivers, which control your audio, before you can proceed. It may just be that your audio device is not being selected as the default, or that ALSA’s ‘amixer’ has the volume muted by default. Please see: https://wiki.archlinux.org/index.php/Advanced_Lin... and https://wiki.archlinux.org/index.php/Advanced_Lin...
Next we will test that the files needed are in order and that Python can use the library to transcribe audio. Please run:
$ python python_test.py
You should see “Test Successful” printed at the bottom to confirm that the test completed without error. You now have everything configured to begin making your own voice-controlled applications. You can find further tests and sample code in: ‘../pocketsphinx-python/pocketsphinx/swig/python/test/’
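For reference, a transcription test along these lines can be sketched in Python roughly as follows. This is not the contents of ‘python_test.py’; the function names and all four path arguments are placeholders of mine, and I am assuming the audio is raw 16 kHz, 16-bit mono PCM, which is what the default acoustic model expects.

```python
def iter_chunks(stream, chunk_size=1024):
    """Yield successive blocks of bytes from a binary stream until it is exhausted."""
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break
        yield buf


def transcribe_raw(audio_path, hmm_dir, lm_path, dict_path):
    """Decode a raw audio file and return the best hypothesis as text.

    All four arguments are placeholder paths: the audio file, the acoustic
    model directory, the language model file and the dictionary file.
    """
    # Imported here so iter_chunks() can be used without pocketsphinx installed.
    from pocketsphinx.pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', hmm_dir)
    config.set_string('-lm', lm_path)
    config.set_string('-dict', dict_path)
    decoder = Decoder(config)

    # Feed the file to the decoder one block at a time as a single utterance.
    decoder.start_utt()
    with open(audio_path, 'rb') as stream:
        for buf in iter_chunks(stream):
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    hyp = decoder.hyp()
    return hyp.hypstr if hyp is not None else ''
```

Calling transcribe_raw() with the paths of a test recording and the models shipped with the library should return the transcribed text, or an empty string if nothing was recognized.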
If you encounter errors about ‘jack server unavailable’, this is a common false-positive error generated by Pyaudio, Python’s audio library. If you encounter other Pyaudio-based errors, you can try running the ‘pyaudio_test.py’ script and listening to the file it records. Otherwise, I found the best option is to hit Google with the error message.
Step 4: Explanations and Generating Language Models
So what exactly is happening here?
Basically, this Python library interfaces with Pocketsphinx, an open source toolkit and STT engine written in C by researchers. It decodes the audio sent to it by file or stream, looking for phonemes (the basic sound units of speech) that match what is contained in its dictionary and language models. In this way it can perform keyword searches and transcriptions of spoken audio.
It is known that the accuracy of the base language model and dictionary is not great. This is because the engine has to consider thousands of words and phrases for each utterance given to it. The upside is that we can generate our own language models that narrow the search down drastically. I have included two alternative language models and dictionaries that you can use in ‘helperScripts/lang_model’. One only includes the words “hello” and “world,” while the other has a handful of words that are useful for automating your computer or building a virtual assistant. See ‘assistant.vocab’ for a full list.
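To make the dictionary idea concrete: a Pocketsphinx dictionary is a plain text file in which each line holds a word followed by its phonemes, with alternate pronunciations marked like "hello(2)". The sketch below parses text in that layout. The function name and the sample entries are my own illustration, not something taken from the library or from ‘helperScripts’.

```python
import re


def parse_dictionary(text):
    """Map each word to a list of pronunciations (each a list of phoneme symbols)."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Strip an alternate-pronunciation marker such as "(2)" from the word.
        word = re.sub(r'\(\d+\)$', '', parts[0])
        entries.setdefault(word, []).append(parts[1:])
    return entries


# Illustrative sample entries in the dictionary layout.
sample = """hello HH AH L OW
hello(2) HH EH L OW
world W ER L D
"""
pronunciations = parse_dictionary(sample)
print(pronunciations['hello'])
```

A smaller dictionary means fewer candidate words for the decoder to weigh against each stretch of audio.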
To use these or test your own, edit the ‘test_lm.py’ script in 'helperScripts', replacing the lines:
config.set_string('-lm', 'lang_models/assistant.lm')
config.set_string('-dict', 'lang_models/assistant.dic')
with the absolute paths of your own model files instead:
config.set_string('-lm', 'your_model_file_name')
config.set_string('-dict', 'your_dictionary_file_name')
To generate your own models you can create a text file with each phrase you want to listen for on a different line. Then you can upload your text file to: http://www.speech.cs.cmu.edu/tools/lmtool-new.htm...
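If you keep your phrases in a Python list, a small helper can write the one-phrase-per-line text file for you. This is only a sketch: the function names are mine, and collapsing whitespace and dropping duplicates are tidy-ups I chose, not something the tool is known to require.

```python
def build_corpus(phrases):
    """Return corpus text with one whitespace-normalized, de-duplicated phrase per line."""
    seen = set()
    lines = []
    for phrase in phrases:
        cleaned = ' '.join(phrase.split())  # trim and collapse repeated spaces
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            lines.append(cleaned)
    return '\n'.join(lines) + '\n'


def write_corpus(phrases, path):
    """Write the corpus text to `path`, ready to upload."""
    with open(path, 'w') as handle:
        handle.write(build_corpus(phrases))
```

For example, write_corpus(['open browser', 'lights on'], 'corpus.txt') produces a two-line file, where 'corpus.txt' is just a placeholder name.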
The tool should generate a ‘.lm’ and a ‘.dic’ file for you that you can download and use.
In ‘helperScripts’ I have included a very simple always-listening voice assistant template that you can copy and modify for your own needs. It is an endless loop that waits for the ambient volume to reach a certain threshold. When it does, it will listen for a ‘hotword’, which is currently set to “assistant.” If it believes that word was spoken, it will audibly ‘beep’ to alert the user that it is listening for ‘orders’. It will then listen for any of the keywords for another short duration and print out the word it thinks it heard.
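The flow described above (volume threshold, then hotword, then keyword) can be sketched as follows. This is a simplified illustration and not the template's actual code. The threshold of 500, the keyword list and the `recognize` callable, which stands in for the Pocketsphinx decoding step, are all assumptions of mine.

```python
import array
import math


def rms(chunk):
    """Root-mean-square amplitude of a chunk of 16-bit signed PCM bytes."""
    samples = array.array('h', chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / float(len(samples)))


def listen_loop(chunks, recognize, hotword='assistant',
                keywords=('lights', 'music'), threshold=500.0):
    """Yield each keyword that is heard directly after the hotword.

    `chunks` is any iterable of audio chunks, and `recognize` is any callable
    that maps a chunk to a word (a hypothetical stand-in for the decoder).
    """
    awaiting_order = False
    for chunk in chunks:
        if rms(chunk) < threshold:
            continue  # too quiet, keep waiting
        word = recognize(chunk)
        if awaiting_order:
            if word in keywords:
                yield word
            awaiting_order = False
        elif word == hotword:
            awaiting_order = True  # a real assistant would beep here
```

Passing the audio source and the recognizer in as arguments keeps the loop easy to try out with fake data before wiring it up to a microphone.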
You can easily modify the code so that instead of returning the word it could make a system call or run an application that you specify. Have fun with it!
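As a rough sketch of that idea, you could keep a table that maps each recognized word to a command and run it with Python's subprocess module. The table contents and the function name below are hypothetical. The single sample entry just starts a second Python interpreter so that the example runs anywhere.

```python
import subprocess
import sys

# Hypothetical keyword-to-command table; replace with programs on your system.
COMMANDS = {
    'hello': [sys.executable, '-c', 'print("hello world")'],
}


def run_for_keyword(word, commands=COMMANDS):
    """Run the command mapped to `word`; return its exit code, or None if unmapped."""
    command = commands.get(word)
    if command is None:
        return None
    # Passing a list (without shell=True) avoids shell interpretation of arguments.
    return subprocess.call(command)
```

In the assistant loop you would then call run_for_keyword() with the heard word instead of printing it.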
Step 5: Summary and Future Plans
I hope you enjoyed this tutorial. In it you have learned how to install Pocketsphinx, the open source STT toolset, and how to utilize some of its capabilities using Python. Please feel free to use the code in ‘helperScripts’ to build your own projects and show them off. I've attached a video, and linked another, of me making a small modification to the voice assistant script to control my web-app-controlled lights, which you can learn how to build in my other Instructable.
I wrote this because it took me a few months of on-and-off tinkering, reading and experimenting to get something usable, as there was little in the way of documentation. I would like to thank the teams and community that contribute to Pocketsphinx, as well as Github for hosting the necessary code and Stackoverflow for being a great resource. Here are some other resources and Instructables to help you on your way:
As for future ideas, I plan on building my own voice assistant, embedded in a web service that uses Pocketsphinx for STT. The idea is that multiple devices could connect to and disconnect from the same assistant at any time. This project will feature chatbot-like abilities, features for answering queries using a datasource, and a visually animated character, and it will be highly extensible so as to work with other projects such as homeassistant.io, MQTTS, etc. Please keep an eye out for a future Instructable on setting that up for yourself too!