More Complex User Input

This chapter is about analyzing user input, which is starting to feel a bit like artificial intelligence, haha.

When the user types a command, "open door" and "open the door" should mean the same thing, and now it is up to the program to work that out.

First, we need to look at how English sentences are put together:

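A sentence is made up of words
Words are separated by spaces
Words can be verbs, nouns, modifiers, numbers, and so on
The meaning of a sentence is governed by grammar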

So to analyze a sentence, we first split it into words, then work out the type of each word, and finally reassemble the words into a command.
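To make the three steps concrete, here is a minimal sketch in Python 3 (the book's code is Python 2). The vocabulary VOCAB and the function to_command are hypothetical, made up for illustration, and "reassembling" is simplified here to just dropping stop words; a real parser does more than that.

```python
# Hypothetical mini vocabulary, for illustration only.
VOCAB = {
    "open": "verb",
    "door": "noun",
    "the": "stop",
}

def to_command(sentence):
    """Split, classify each word, then rebuild a command without stop words."""
    words = sentence.split()                                # 1. split into words
    tagged = [(VOCAB.get(w, "unknown"), w) for w in words]  # 2. classify each word
    return tuple(w for t, w in tagged if t != "stop")       # 3. reassemble, dropping stop words

print(to_command("open door"))      # ('open', 'door')
print(to_command("open the door"))  # ('open', 'door')
```

Both phrasings end up as the same command, which is exactly the behaviour the chapter is after.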

Getting user input and splitting it into words

stuff = raw_input('> ')  # Python 2; in Python 3 this is input('> ')

words = stuff.split()  # returns a list of the words
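Calling split() with no argument is more forgiving than splitting on a single space, which matters for hand-typed input. A short Python 3 illustration; the input strings are made up:

```python
# str.split() with no argument splits on runs of any whitespace
# (spaces, tabs, newlines) and ignores leading/trailing whitespace.
stuff = "  open   the\tdoor \n"
words = stuff.split()
print(words)  # ['open', 'the', 'door']

# With an explicit " " separator, consecutive spaces produce empty strings.
print("open  door".split(" "))  # ['open', '', 'door']
```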

Analyzing word types

Use a (type, word) tuple to store each word together with its type:

first_word = ('direction','north')

second_word = ('verb','go')

sentence = [first_word,second_word]
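Because every element has the same (type, word) shape, the list can be unpacked directly in a for loop. A small Python 3 sketch reusing the sentence above; words_of_type is a hypothetical helper, not part of the book's code:

```python
first_word = ('direction', 'north')
second_word = ('verb', 'go')
sentence = [first_word, second_word]

# Each element is a (type, word) pair, so it can be unpacked in the loop header.
for word_type, word in sentence:
    print(word_type, word)

# Hypothetical helper: pick out only the words of a given type.
def words_of_type(sentence, wanted):
    return [word for word_type, word in sentence if word_type == wanted]

print(words_of_type(sentence, 'verb'))  # ['go']
```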

Unit tests

The book provides the following test cases:

from nose.tools import *
from EX48 import lexicon

def test_directions():
    assert_equal(lexicon.scan("north"),[('direction','north')])

    result = lexicon.scan("north south east")

    assert_equal(result,[('direction','north'),
                        ('direction','south'),
                        ('direction','east')])

def test_verbs():
    assert_equal(lexicon.scan("go"),[('verb','go')])

    result = lexicon.scan("go kill eat")

    assert_equal(result,[('verb','go'),
                        ('verb','kill'),
                        ('verb','eat')])


def test_stops():
    assert_equal(lexicon.scan("the"),[('stop','the')])

    result = lexicon.scan("the in of")

    assert_equal(result, [('stop','the'),
                        ('stop','in'),
                        ('stop','of')])

def test_nouns():
    assert_equal(lexicon.scan("bear"),[('noun','bear')])

    result = lexicon.scan("bear princess")

    assert_equal(result, [('noun','bear'),
                        ('noun','princess')])

def test_numbers():
    assert_equal(lexicon.scan('1234'),[('number',1234)])

    result = lexicon.scan("3 91234")

    assert_equal(result,[('number',3),
                        ('number',91234)])

def test_errors():
    assert_equal(lexicon.scan('ASDFADFASDF'),[('error','ASDFADFASDF')])
    result = lexicon.scan("bear IAS princess")

    assert_equal(result,[('noun','bear'),
                        ('error','IAS'),
                        ('noun','princess')])
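assert_equal compares its two arguments for equality with ==. Two lists of tuples are equal only when they have the same length and every pair matches element by element, in order and by type, which is why test_numbers requires scan to return the integer 1234 rather than the string '1234'. A quick check in plain Python 3:

```python
expected = [('number', 1234)]

print([('number', 1234)] == expected)    # True
print([('number', '1234')] == expected)  # False: '1234' is a str, not an int

# Order matters too: the same pairs in a different order are not equal.
print([('noun', 'princess'), ('noun', 'bear')] ==
      [('noun', 'bear'), ('noun', 'princess')])  # False
```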

Write the lexicon scanner based on these test cases.

Reading the assert_equal calls, we can see that:

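lexicon has a scan function that takes a string argument
The word types are 'direction', 'number', 'noun', 'stop', 'verb', and 'error'
I add one more type, named 'unkown', to collect words that are not in the predefined vocabulary
scan returns a list whose elements are (type, word) tuples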

The lexicon scanner

There should be a predefined list that stores common words and the type each one represents.

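After getting the user's input, split it into words, compare each word against the predefined vocabulary lists to find its type, and return a list of (type, word) tuples.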

def scan(stuff):
    sentence = []
    # Predefined vocabulary lists, one per word type
    directions = ['north','south','east']
    verbs = ['go','kill','eat']
    stops = ['in','of','the']
    nouns = ['bear','princess']
    numbers = [3,91234,1234]
    errors = ['IAS','ASDFADFASDF']
    words = stuff.split()

    for word in words:
        if word in directions:
            sentence.append(('direction',word))
        elif word in verbs:
            sentence.append(('verb',word))
        elif word in stops:
            sentence.append(('stop',word))
        elif word in nouns:
            sentence.append(('noun',word))
        elif word in errors:
            sentence.append(('error',word))
        # Check isdigit() first: int('open') would raise ValueError
        elif word.isdigit() and int(word) in numbers:
            sentence.append(('number',int(word)))
        else:
            sentence.append(('unkown',word))
    return sentence
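Checking several lists one after another works, but the predefined table can also be held in a single dict mapping each word to its type, so one lookup replaces the if/elif chain. Below is a sketch in Python 3 under that assumption; the name LEXICON and the use of str.isdecimal() for numbers are my choices, not the book's. It also labels every unrecognized word 'error', which satisfies test_errors without a hard-coded errors list.

```python
# Word -> type table; the entries mirror the word lists in scan() above.
LEXICON = {
    'north': 'direction', 'south': 'direction', 'east': 'direction',
    'go': 'verb', 'kill': 'verb', 'eat': 'verb',
    'in': 'stop', 'of': 'stop', 'the': 'stop',
    'bear': 'noun', 'princess': 'noun',
}

def scan(stuff):
    sentence = []
    for word in stuff.split():
        if word.isdecimal():
            # Any run of decimal digits is a number; store it as an int, as the tests expect.
            sentence.append(('number', int(word)))
        else:
            # One dict lookup; words missing from the table are reported as 'error'.
            sentence.append((LEXICON.get(word, 'error'), word))
    return sentence
```

Adding a new word then means adding one dict entry instead of editing a list and an elif branch.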

Run nosetests:

damao@damao:~/Documents/ex48$ nosetests
.........
----------------------------------------------------------------------
Ran 9 tests in 0.005s

OK

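This scanner can be improved: instead of recognizing only the numbers in a hard-coded list, try converting every word with int() and fall back to the vocabulary lists when that raises ValueError.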

def scan(stuff):
    sentence = []
    directions = ['north','south','east']
    verbs = ['go','kill','eat']
    stops = ['in','of','the']
    nouns = ['bear','princess']
    errors = ['IAS','ASDFADFASDF']
    words = stuff.split()

    for word in words:
        try:
            # Any word that int() can convert is treated as a number
            intword = int(word)
            sentence.append(('number',intword))
        except ValueError:
            # Not a number: look the word up in the vocabulary lists
            if word in directions:
                sentence.append(('direction',word))
            elif word in verbs:
                sentence.append(('verb',word))
            elif word in stops:
                sentence.append(('stop',word))
            elif word in nouns:
                sentence.append(('noun',word))
            elif word in errors:
                sentence.append(('error',word))
            else:
                sentence.append(('unkown',word))
    return sentence


if __name__ == '__main__':
    # Only run these demos when the file is executed directly,
    # not when the tests import the module
    print(scan("go north"))
    print(scan("kill the princess"))
    print(scan("eat the bear"))
    print(scan("open the door and smack the bear in the nose"))
    print(scan("open 1234 door"))

Output when the file is run on its own:

damao@damao:~/Documents/ex48/EX48$ python lexicon.py
[('verb', 'go'), ('direction', 'north')]
[('verb', 'kill'), ('stop', 'the'), ('noun', 'princess')]
[('verb', 'eat'), ('stop', 'the'), ('noun', 'bear')]
[('unkown', 'open'), ('stop', 'the'), ('unkown', 'door'), ('unkown', 'and'), ('unkown', 'smack'), ('stop', 'the'), ('noun', 'bear'), ('stop', 'in'), ('stop', 'the'), ('unkown', 'nose')]
[('unkown', 'open'), ('number', 1234), ('unkown', 'door')]

It outputs the list of tuples as expected.
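The improved version relies on int() raising ValueError for anything that is not an integer literal, so trying the conversion first and catching the error is enough to separate numbers from words. Pulled out into a hypothetical helper, to_number (Python 3, not part of the book's code), the behaviour looks like this. Note that int() also accepts a leading sign, so '-3' counts as a number too.

```python
def to_number(word):
    """Return word as an int if it is an integer literal, otherwise None."""
    try:
        return int(word)
    except ValueError:
        # 'door', '12a', '3.5' and similar strings are not integer literals
        return None

for w in ['1234', '-3', 'door', '12a', '3.5']:
    print(w, to_number(w))
```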

Use the skeleton directory to generate this as a new project named EX48.