Speech recognition in ROS/Linux has traditionally been done using projects like CMU-Sphinx or Julius. But they lack an efficient vocabulary and are not stable, so reliable speech recognition was confined to Windows and Mac users. Initially I was using a Windows virtual machine inside Ubuntu to do speech processing, even though it was quite resource consuming.

A good alternative is to use the speech recognition built into Chrome by Google. The speech samples are sent to Google's servers for processing, and they return the recognized speech and a confidence value. It is quite easy to use, and it has the advantage of being speaker independent. The only disadvantage is the delay in detection: it normally takes about 3 seconds for the speech to be recognized. A simple Python script for speech recognition is shown below.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json, shlex, subprocess, os
print("talk something")
# Record 16 kHz FLAC from the default ALSA device; sox starts on sound and stops after 1.5 s of silence
os.system('sox -r 16000 -t alsa default recording.flac silence 1 0.1 1% 1 1.5 1%')
# POST the recording to the speech API and write the reply to stdout
cmd = 'wget -q -U "Mozilla/5.0" --post-file recording.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium"'
args = shlex.split(cmd)
output, error = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
if not error:
    # The reply is JSON, so parse it with json.loads instead of eval
    a = json.loads(output)
    # a = json.loads(open("data.txt").read())
    confidence = a['hypotheses'][0]['confidence']
    speech = a['hypotheses'][0]['utterance']
    print("you said: %s with %s confidence" % (speech, confidence))
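The script above indexes straight into the first hypothesis, which fails if nothing was recognized or the reply is empty. A slightly more defensive way to read the reply could look like the sketch below. It assumes the reply is JSON with the same `hypotheses`, `utterance` and `confidence` fields the script reads; the helper name `best_hypothesis` and the sample reply are made up for illustration.

```python
import json


def best_hypothesis(raw):
    """Return (utterance, confidence) for the first hypothesis, or None.

    `raw` is the reply body as text or bytes. The field names are the ones
    the script above reads; anything else about the reply is an assumption.
    """
    try:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        reply = json.loads(raw)
    except ValueError:
        # Empty, undecodable or non-JSON reply
        return None
    if not isinstance(reply, dict):
        return None
    hypotheses = reply.get("hypotheses") or []
    if not hypotheses:
        # Nothing was recognized
        return None
    first = hypotheses[0]
    # Treating a missing confidence as 0.0 is a choice made for this sketch
    return first.get("utterance", ""), first.get("confidence", 0.0)


# Made-up sample reply, for illustration only
sample = '{"hypotheses": [{"utterance": "hello robot", "confidence": 0.9}]}'
print(best_hypothesis(sample))
print(best_hypothesis('{"hypotheses": []}'))
```

In the script, the result of `best_hypothesis(output)` can be checked for `None` before printing, so a silent recording does not end in a traceback.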
I have also created a ROS package for speech recognition. It can be run by checking out the GitHub repo and running 'rosrun gspeech gspeech.py'. It publishes two topics: /speech and /confidence. The first carries the detected speech and the second carries the confidence level of the detection.
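To consume the two topics from another node, a listener along the following lines should be enough. This is only a sketch: the message types (`std_msgs/String` for /speech and `std_msgs/Float32` for /confidence) are assumptions and should be checked against the package, for example with 'rostopic info /confidence'. The node name `gspeech_listener` and the two formatting helpers are hypothetical.

```python
"""Sketch of a listener for the /speech and /confidence topics."""


def format_speech(text):
    """Build the log line for a message received on /speech."""
    return "heard: %s" % text.strip()


def format_confidence(value):
    """Build the log line for a message received on /confidence."""
    return "confidence: %.2f" % float(value)


def main():
    # ROS imports live inside main() so the helpers above can be used
    # and tested on a machine without ROS installed.
    import rospy
    from std_msgs.msg import String, Float32  # assumed message types

    rospy.init_node("gspeech_listener")  # hypothetical node name
    rospy.Subscriber("/speech", String,
                     lambda msg: rospy.loginfo(format_speech(msg.data)))
    rospy.Subscriber("/confidence", Float32,
                     lambda msg: rospy.loginfo(format_confidence(msg.data)))
    # Block here and let rospy dispatch the callbacks
    rospy.spin()
```

Calling `main()` from a script inside a ROS package starts the listener. The two topics arrive as separate messages, so this sketch logs them independently rather than trying to pair each utterance with its confidence.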
When I run 'rosrun gspeech gspeech.py', it starts to record audio and never stops.