
ofxSpeech

A Speech Recognition and Synthesis Addon for openFrameworks

oF + Mac Speech = ofxSpeech from latrokles on Vimeo.

ofxSpeech is a simple library to handle speech recognition and synthesis tasks in openFrameworks applications. It is a wrapper around the Carbon Speech APIs in Mac OS X, though there are plans to port it to Linux and Windows in the future (there is no timeframe for this yet).

The library requires oF 006 because it uses the new event system, and at the moment it only works on the Mac version of openFrameworks.

Features:

- speech recognition of predefined words (these can be added from a list in the program or loaded from a text file).
- start and stop the speech recognition process.
- speech synthesis with a variety of voices (those supported by your particular version of OS X).
- speech synthesis in spelling mode (character by character or digit by digit).
- pause, continue, and stop the speech synthesis process.

Documentation:

Installation

Usage

ofxSpeechRecognizer

To use the speech recognition component of ofxSpeech, you need an instance of ofxSpeechRecognizer in your testApp.h:

ofxSpeechRecognizer myRecognizer;

Then you will have to make a few small additions to your testApp.h and testApp.cpp.

In your testApp.h you must add the following method signature:

void speechRecognized(string & wordRecognized);

In your testApp.cpp you must add the implementation for the method above and add the following line in your testApp::setup():

ofAddListener(myRecognizer.speechRecognizedEvent, this, &testApp::speechRecognized);
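
What goes inside speechRecognized is up to your application. One possible way to organize it, sketched below in plain C++ so it can be tried without openFrameworks, is a lookup table from recognized phrase to action. CommandTable, add and dispatch are hypothetical names made up for this illustration; they are not part of ofxSpeech.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not part of ofxSpeech): maps a recognized phrase
// to an action to run when that phrase is heard.
class CommandTable {
public:
    // Registers (or replaces) the action for a phrase.
    void add(const std::string& phrase, std::function<void()> action) {
        assert(action && "action must be callable");
        actions_[phrase] = std::move(action);
    }

    // Runs the action registered for `phrase`.
    // Returns false if no action was registered for it.
    bool dispatch(const std::string& phrase) const {
        auto it = actions_.find(phrase);
        if (it == actions_.end()) return false;
        it->second();
        return true;
    }

private:
    std::map<std::string, std::function<void()>> actions_;
};
```

Assuming a CommandTable member named commands in testApp (again, a made-up name), the body of testApp::speechRecognized could then be a single call to commands.dispatch(wordRecognized).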

Once this is done you can use the recognizer as described by its API:

void initRecognizer();
Initializes the speech recognition system.

void loadDictionary(const std::vector<std::string> &wordsToRecognize);
Loads a list of words that you want the system to recognize. The list is passed as a vector of strings.

void loadDictionaryFromFile(std::string dictionaryFilename);
Same as above, but it loads the words from a text file located in your oF program’s data directory. You may define one word or sentence per line. Keep in mind that if you define a sentence, the speech recognition engine will only recognize that whole sentence, not the individual words in it (unless you have also defined those on separate lines).
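
To make the one-entry-per-line format concrete, here is a small standalone sketch that reads such a file into the kind of vector loadDictionary accepts. It is not how ofxSpeech reads the file internally; readPhrases and containsPhrase are hypothetical helpers, and the exact string comparison only mimics the "whole sentence or nothing" behavior described above (the real engine matches audio, not strings).

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of ofxSpeech): reads one phrase per line,
// skipping blank lines and stripping a trailing '\r' left by Windows
// line endings.
std::vector<std::string> readPhrases(std::istream& in) {
    std::vector<std::string> phrases;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) phrases.push_back(line);
    }
    return phrases;
}

// Illustration only: an entry matches as a whole, so "open" is not found
// just because "open the door" is in the list.
bool containsPhrase(const std::vector<std::string>& phrases,
                    const std::string& phrase) {
    return std::find(phrases.begin(), phrases.end(), phrase) != phrases.end();
}
```

With a file containing the lines "hello", "open the door" and "door", this yields three entries: "door" is present because it has its own line, while "open" is not.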

void startListening();
Tells the recognition system to start listening for words.

void stopListening();
Tells the recognition system to stop listening for words.

bool isListening();
Returns whether the recognition system is currently listening.
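
The three listening methods above combine naturally into a toggle, for example bound to a key press. The sketch below is written as a template so it compiles without openFrameworks; it only assumes a type with the documented startListening(), stopListening() and isListening() methods, and that isListening() returns true while listening. toggleListening and FakeRecognizer are hypothetical names introduced for this example.

```cpp
#include <cassert>

// Works with any type exposing the three documented methods
// (isListening, startListening, stopListening), such as ofxSpeechRecognizer.
// Returns the listening state reported after the toggle.
template <typename Recognizer>
bool toggleListening(Recognizer& recognizer) {
    if (recognizer.isListening()) {
        recognizer.stopListening();
    } else {
        recognizer.startListening();
    }
    return recognizer.isListening();
}

// Stand-in used only to exercise the helper without openFrameworks.
// It is NOT the real ofxSpeechRecognizer.
struct FakeRecognizer {
    bool listening = false;
    void startListening() { listening = true; }
    void stopListening() { listening = false; }
    bool isListening() { return listening; }
};
```

In an oF application you could call toggleListening(myRecognizer) from testApp::keyPressed(int key).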

ofxSpeechSynthesizer

The speech synthesizer is much more straightforward, as there are no callback functions of any kind. To use it, you simply need an instance of ofxSpeechSynthesizer in your testApp.h:

ofxSpeechSynthesizer mySynthesizer;

After that, you can just use the system as described by the following API:

void initSynthesizer(std::string voice="");
Initializes the speech synthesizer with a voice. If no voice is given, the synthesizer starts with the default system voice. You can use listVoices() to see the voices available on your system.

void selectVoice(std::string voice);
Selects a new voice for the system (not implemented yet, coming soon).

std::map<std::string, int> getListOfVoices();
Returns a map from each voice name to its index number in the system (you may never need to use this).
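
Because a std::map<std::string, int> iterates in alphabetical order of voice name, you may want the voices ordered by their index instead. The following standalone sketch shows one way to do that with a map of the same type. voicesByIndex is a hypothetical helper, not part of ofxSpeech, and any voice names and index values you feed it in a test are sample data rather than what your system will report.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ofxSpeech): returns the voice names
// ordered by ascending index value.
std::vector<std::string> voicesByIndex(const std::map<std::string, int>& voices) {
    // Flip each (name, index) entry into (index, name) so sorting orders by index.
    std::vector<std::pair<int, std::string>> ordered;
    for (const auto& entry : voices) {
        ordered.emplace_back(entry.second, entry.first);
    }
    std::sort(ordered.begin(), ordered.end());

    std::vector<std::string> names;
    for (const auto& entry : ordered) {
        names.push_back(entry.second);
    }
    return names;
}
```

In an oF application the argument would be the result of mySynthesizer.getListOfVoices().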

std::string getCurrentVoice();
Returns a string with the name of the currently selected voice.

void listVoices();
Displays a list of the available voices in the system. The list is written to standard output, so you will have to look at the output in Xcode’s console.

void speakPhrase(std::string phraseToSpeak);
Uses the established speech channel to speak a phrase or text.

void pauseSpeaking();
Pauses speaking immediately, even if it’s in the middle of a sentence or word. It can be resumed with continueSpeaking().

void stopSpeaking();
Stops speaking completely. It cannot be resumed, so whatever was in the speech channel is lost.

void continueSpeaking();
Continues speaking from the point at which pauseSpeaking() was called.

void setDigitByDigit(bool enabled);
If true, it will spell numbers in a digit-by-digit fashion (e.g. 100 would be “one zero zero”); otherwise it will speak the number as a whole (e.g. 100 would be “one hundred”).
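
To illustrate what digit-by-digit mode means for the spoken result, the standalone sketch below turns a number string into the words you would expect to hear. It only models the output described above; the actual conversion happens inside the system speech engine, and spellDigits is a hypothetical function, not part of ofxSpeech.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustration only: returns the digits of `number` as space-separated words.
// Characters that are not decimal digits are ignored.
std::string spellDigits(const std::string& number) {
    static const char* const names[] = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };
    std::string spoken;
    for (char c : number) {
        if (!std::isdigit(static_cast<unsigned char>(c))) continue;
        if (!spoken.empty()) spoken += ' ';
        spoken += names[c - '0'];
    }
    return spoken;
}
```

For the example above, spellDigits("100") returns "one zero zero".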

void setCharacterByCharacter(bool enabled);
Sets the synthesizer so that it spells out words, character by character. This setting automatically sets digit-by-digit spelling as well.

Download

ofxSpeech Github Repository