
This is an example of using the Microsoft Speech SDK 5.1 under Windows for command and control speech recognition in Python.

from win32com.client import constants
import win32com.client
import pythoncom

"""Sample code for using the Microsoft Speech SDK 5.1 via COM in Python.
    Requires that the SDK be installed; it's a free download from
            http://microsoft.com/speech
    and that MakePy has been used on it (in PythonWin,
    select Tools | COM MakePy Utility | Microsoft Speech Object Library 5.1).

    After running this, then saying "One", "Two", "Three" or "Four" should
    display "You said One" etc on the console. The recognition can be a bit
    shaky at first until you've trained it (via the Speech entry in the Windows
    Control Panel)."""
class SpeechRecognition:
    """ Initialize the speech recognition with the passed in list of words """
    def __init__(self, wordsToAdd):
        # For text-to-speech
        self.speaker = win32com.client.Dispatch("SAPI.SpVoice")
        # For speech recognition - first create a listener
        self.listener = win32com.client.Dispatch("SAPI.SpSharedRecognizer")
        # Then a recognition context
        self.context = self.listener.CreateRecoContext()
        # which has an associated grammar
        self.grammar = self.context.CreateGrammar()
        # Do not allow free word recognition - only command and control
        # recognizing the words in the grammar only
        self.grammar.DictationSetState(0)
        # Create a new rule for the grammar, that is top level (so it begins
        # a recognition) and dynamic (ie we can change it at runtime)
        self.wordsRule = self.grammar.Rules.Add("wordsRule",
                        constants.SRATopLevel + constants.SRADynamic, 0)
        # Clear the rule (not necessary first time, but if we're changing it
        # dynamically then it's useful)
        self.wordsRule.Clear()
        # And go through the list of words, adding each to the rule
        for word in wordsToAdd:
            self.wordsRule.InitialState.AddWordTransition(None, word)
        # Set the wordsRule to be active
        self.grammar.Rules.Commit()
        self.grammar.CmdSetRuleState("wordsRule", 1)
        # Commit the changes to the grammar
        self.grammar.Rules.Commit()
        # And add an event handler that's called back when recognition occurs
        self.eventHandler = ContextEvents(self.context)
        # Announce we've started using speech synthesis
        self.say("Started successfully")

    def say(self, phrase):
        """Speak a word or phrase"""
        self.speaker.Speak(phrase)


"""The callback class that handles the events raised by the speech object.
    See "Automation | SpSharedRecoContext (Events)" in the MS Speech SDK
    online help for documentation of the other events supported. """
class ContextEvents(win32com.client.getevents("SAPI.SpSharedRecoContext")):
    """Called when a word/phrase is successfully recognized  -
        ie it is found in a currently open grammar with a sufficiently high
        confidence"""
    def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
        newResult = win32com.client.Dispatch(Result)
        print("You said: %s" % newResult.PhraseInfo.GetText())
    
if __name__=='__main__':
    wordsToAdd = [ "One", "Two", "Three", "Four" ]
    speechReco = SpeechRecognition(wordsToAdd)
    while 1:
        pythoncom.PumpWaitingMessages()

Python is a natural choice for a speech recognition control application, since it's very easy to support user scripting.

I have a simple voice recognition application based on the above code that sits in the system tray and runs short chunks of Python script via exec when it recognizes a word. I've found the Windows Scripting Host especially useful, in particular its SendKeys method. For example:

shell = win32com.client.Dispatch("WScript.Shell")
shell.SendKeys("{PGUP}")

is mapped to saying "Page up", and so on. (The full code for the GUI version is on my web page at http://inigo.0catch.com; it uses wxWindows.)
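
As a rough illustration of the exec-based dispatch described above, the following sketch maps recognized phrases to short chunks of Python source. The names SCRIPTS, run_script_for and send_keys are hypothetical and are not taken from the GUI application; send_keys stands in for shell.SendKeys so that the sketch runs without Windows or the Speech SDK.

```python
# Hypothetical mapping from a spoken phrase to a short chunk of Python source.
SCRIPTS = {
    "Page up": "send_keys('{PGUP}')",
    "Page down": "send_keys('{PGDN}')",
}


def run_script_for(phrase, scripts, namespace):
    """Run the script registered for phrase; return True if one was found."""
    source = scripts.get(phrase)
    if source is None:
        return False
    # The script sees only the names placed in namespace.
    exec(source, namespace)
    return True


# Stand-in for shell.SendKeys: it just records the keys it was asked to send.
sent = []
namespace = {"send_keys": sent.append}
run_script_for("Page up", SCRIPTS, namespace)
print(sent)  # ['{PGUP}']
```

In the recipe, OnRecognition would be the natural place to call something like run_script_for with the recognized text. Because exec runs arbitrary code, the scripts should come only from a trusted user.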

7 comments

Dirk Krause 22 years, 2 months ago

Program runs, but aborts? Isn't there a loop missing? The program starts, a voice says 'Started successfully' and then the program ends.

Inigo Surguy (author) 22 years, 1 month ago

Now fixed. Yes, you're right. It worked fine running under PythonWin, but needed a message loop running from the command line. I've now added it.
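
For anyone running the recipe from the command line, the loop can also be written so that it does not spin the CPU. This is only a sketch: run_message_loop, should_stop and interval are names introduced here, and on Windows the pump argument is assumed to be pythoncom.PumpWaitingMessages. A counting stand-in is used below so that the example runs anywhere.

```python
import time


def run_message_loop(pump, should_stop, interval=0.05):
    """Call pump() until should_stop() returns True, sleeping between calls.

    Returns the number of times pump() was called.
    """
    calls = 0
    while not should_stop():
        pump()
        calls += 1
        time.sleep(interval)
    return calls


# Stand-in pump that only counts its calls. On Windows the recipe would pass
# pythoncom.PumpWaitingMessages instead (assumption: pywin32 is installed).
pumped = []
calls = run_message_loop(
    pump=lambda: pumped.append(1),
    should_stop=lambda: len(pumped) >= 3,
    interval=0.01,
)
print(calls)  # 3
```

With the recipe itself the call would look like run_message_loop(pythoncom.PumpWaitingMessages, lambda: False), which never returns; a short interval keeps recognition responsive.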

Andre Honsberg 16 years, 2 months ago

wxPython Version. Can you please tell me which version of wxPython you use? The one I installed keeps giving me errors in the time conversion. Maybe I have the wrong wxPython version or am using a newer one. Thanks in advance.

Andre

George LeCompte 16 years ago

Program aborts. Downloaded code and Microsoft Speech SDK 5.1 to same file. Attempted to run code.

Traceback (most recent call last):
  File "L:\SpeechRecognition\Speech.py", line 55, in
    class ContextEvents(win32com.client.getevents("SAPI.SpSharedRecoContext")):
TypeError: Error when calling the metaclass bases
    cannot create 'NoneType' instances

I'm using Python 2.5.1 on Windows XP system.

Any thoughts on speech recognition?

Michael 15 years, 7 months ago

I've made a somewhat cleaner module for working with Microsoft Speech Recognition, based on Surguy's example above.

The 'speech' module is available by typing 'easy_install speech' at the Windows command prompt (if you've got easy_install installed). You'll still need to do the Speech SDK install and run MakePY, but I'm trying to figure out how to eliminate the MakePY step :)

The project lives at http://pyspeech.googlecode.com and is on PyPI.

The code looks like this:

import speech

# a callback to run whenever a certain phrase is heard.
def command_callback(phrase, listener):
    speech.say("You said %s" % phrase) # speak out loud
listener1 = speech.listenfor(['some', 'various phrases', 'to listen for'],
        command_callback)

# a callback to run when any English is heard.
def dictation_callback(phrase, listener):
    if phrase == "stop please":
        speech.stoplistening(listener)
    else:
        print "Heard %s" % phrase
listener2 = speech.listenforanything(dictation_callback)

# Both listeners are running right now.  When the user says
# "stop please", listener2 will stop itself in the callback.
while speech.islistening(listener2):
    speech.pump_waiting_messages() # safe to call in a tight loop

# Turn off listener1 as well.
speech.stoplistening(listener1)

It works great -- I've successfully used the module to build a music robot that understands instructions like "Play me some Radiohead, any album."

Hope this helps! Feedback at pyspeech.googlecode.com would be appreciated.

Michael

Michael 15 years, 7 months ago

I've released speech v0.3.5, which simplifies the above code somewhat. You no longer call pump_waiting_messages(), as that is taken care of for you in the background. Just sleep for as long as you want the program to keep running, or do whatever other tasks you wish. Also, listeners now have stoplistening() and islistening() methods, so you no longer call module-level functions such as speech.islistening(listener).

The revised code:

import speech

# a callback to run whenever a certain phrase is heard.
def command_callback(phrase, listener):
    speech.say("You said %s" % phrase) # speak out loud
listener1 = speech.listenfor(['some', 'various phrases', 'to listen for'],
    command_callback)

# a callback to run when any English is heard.
def dictation_callback(phrase, listener):
    if phrase == "three":
        listener.stoplistening()
    else:
        print "Heard %s" % phrase
listener2 = speech.listenforanything(dictation_callback)

import time
while listener2.islistening():
    time.sleep(.5)

# Turn off listener1 as well.
listener1.stoplistening()

MOhammad Aliannejadi 10 years, 11 months ago

Thanks for the code. I have tried it on Windows 7 with Python 3.3 and pywin build 218, but I get this exception. Please help me with this, thanks.

class ContextEvents(win32com.client.getevents("SAPI.SpSharedRecoContext")):
TypeError: NoneType takes no arguments
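
A likely cause of this TypeError, and of the "cannot create 'NoneType' instances" error reported earlier in the thread, is that win32com.client.getevents returned None. It does so when no MakePy-generated support for the Speech Object Library is available, so the class statement ends up subclassing None. The sketch below turns that into a clearer error. require_events_class is a hypothetical helper, and a fake getevents is used so that the example runs without pywin32.

```python
def require_events_class(getevents, progid):
    """Return the event base class for progid, or raise a readable error."""
    base = getevents(progid)
    if base is None:
        raise RuntimeError(
            "No MakePy support found for %r. Run the COM MakePy utility on "
            "the Microsoft Speech Object Library and try again." % progid
        )
    return base


# Fake lookup that mimics a machine where MakePy has not been run.
def fake_getevents(progid):
    return None


try:
    require_events_class(fake_getevents, "SAPI.SpSharedRecoContext")
except RuntimeError as error:
    message = str(error)
    print(message)
```

On Windows the real call would be require_events_class(win32com.client.getevents, "SAPI.SpSharedRecoContext"). Calling win32com.client.gencache.EnsureDispatch("SAPI.SpSharedRecognizer") beforehand may generate the missing support automatically, though whether that resolves the problem on Python 3.3 with build 218 is untested here.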

Created by Inigo Surguy on Sun, 2 Dec 2001 (PSF)