Speech recognition in ROS/Linux has traditionally been done using projects like CMU-Sphinx or Julius. But they lack an efficient vocabulary and are not stable, so reliable speech recognition was confined to Windows/Mac users. Initially I was using a Windows virtual machine inside Ubuntu to do the speech processing, even though it was quite resource-consuming.

A good alternative is to use the speech recognition built into Chrome by Google. The speech samples are sent to Google's servers for processing, and they return the recognized speech along with a confidence value. It is quite easy to use, and it also has the advantage of being speaker-independent. The only disadvantage is the delay in detection: it normally takes about 3 seconds for the speech to be recognized.

A simple Python script for speech recognition is shown below.

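#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json, os, shlex, subprocess

print("talk something")
# Record from the default ALSA device at 16 kHz; sox starts when it hears sound
# and stops after 1.5 s of silence
os.system('sox -r 16000 -t alsa default recording.flac silence 1 0.1 1% 1 1.5 1%')
# POST the FLAC file to the speech API and write the response to stdout
cmd = 'wget -q -U "Mozilla/5.0" --post-file recording.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium"'
args = shlex.split(cmd)
output, error = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
if not error:
    # The response is JSON, so parse it with json.loads rather than eval
    a = json.loads(output)
    # a = json.load(open("data.txt"))  # parse a saved response instead
    speech = a['hypotheses'][0]['utterance']
    confidence = a['hypotheses'][0]['confidence']
    print("you said: %s with %s confidence" % (speech, confidence))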
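
The script above assumes the server always sends back at least one hypothesis. Here is a minimal sketch of a more defensive way to read the response. The function name best_hypothesis, the min_confidence parameter and the sample response are my own illustrations, and the 'utterance' key is an assumption about the response format (the script only reads 'hypotheses' and 'confidence').

```python
import json


def best_hypothesis(raw_response, min_confidence=0.0):
    """Return (utterance, confidence) for the first hypothesis, or None.

    Hypothetical helper: the 'utterance' key is an assumed field name.
    """
    try:
        data = json.loads(raw_response)
    except ValueError:
        return None  # empty or non-JSON body
    if not isinstance(data, dict):
        return None
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None  # nothing was recognized
    top = hypotheses[0]
    utterance = top.get("utterance")
    confidence = float(top.get("confidence", 0.0))
    if utterance is None or confidence < min_confidence:
        return None
    return utterance, confidence


# Made-up sample response, shaped like the fields the script reads
sample = '{"hypotheses": [{"utterance": "hello robot", "confidence": 0.92}]}'
print(best_hypothesis(sample))
```

Passing a min_confidence lets you throw away doubtful results instead of acting on them, which is handy when the recognized text is used to drive a robot.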


I have also created a ROS package for speech recognition. It can be run by checking out the GitHub repo and running 'rosrun gspeech gspeech.py'. It will publish two topics: /speech and /confidence. The first one is the detected speech, while the second is the confidence level of the detection.
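
To use the two topics from another node, something like the following rospy listener should work. It is only a sketch: the message types (String for /speech, Float32 for /confidence) are my assumptions, so check them with 'rostopic info /speech' and 'rostopic info /confidence' first. The LatestResult class and the gspeech_listener node name are made up for this example.

```python
class LatestResult(object):
    """Hypothetical helper that keeps the most recent speech and confidence."""

    def __init__(self):
        self.speech = None
        self.confidence = None

    def on_speech(self, msg):
        self.speech = msg.data

    def on_confidence(self, msg):
        self.confidence = msg.data

    def summary(self):
        return "heard %r (confidence %s)" % (self.speech, self.confidence)


def main():
    # Imported here so the helper above can be tried without ROS installed
    import rospy
    from std_msgs.msg import String, Float32  # ASSUMED message types

    latest = LatestResult()

    def speech_cb(msg):
        latest.on_speech(msg)
        rospy.loginfo(latest.summary())

    rospy.init_node("gspeech_listener")
    rospy.Subscriber("/speech", String, speech_cb)
    rospy.Subscriber("/confidence", Float32, latest.on_confidence)
    rospy.spin()  # block here and let the callbacks do the work


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("rospy not found; run this from a sourced ROS environment")
```

The two topics arrive as separate messages, so the logged confidence is simply the last one received, which may belong to the previous utterance depending on the publishing order.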