Building a Dialogue System with Rasa (Part 2)

Author: zqh_zy | Published 2018-01-12 15:37 | Read 1376 times

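    Building on the natural language understanding (NLU) model trained in the previous part, this part uses rasa core's online learning to train the dialogue policy. Before training the dialogue module, two files need to be prepared: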

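    • domain.yml # defines all intents, slots, and entities of the dialogue, plus the actions the system can take
    • story.md # dialogue training data; used at first to pre-train the dialogue model, then extended with stories generated during online learning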

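    Preparing the domain file

    The domain file should reflect the actual use case. For task-oriented dialogue, template-based response generation (NLG) is simple and practical; rasa core supports templates well, and they are defined in the domain file too. The domain file for this project is: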

    mobile_domain.yml:

    slots:
      item:
        type: text
      time:
        type: text
      phone_number:
        type: text
      price:
        type: text
    
    intents:
      - greet
      - confirm
      - goodbye
      - thanks
      - inform_item
      - inform_package
      - inform_time
      - request_management
      - request_search
      - deny
      - inform_current_phone
      - inform_other_phone
    
    entities:
      - item
      - time
      - phone_number
      - price
    
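    templates:
      # Greeting; the bot introduces itself as 小热.
      utter_greet:
        - "您好!,我是机器人小热,很高兴为您服务。"
        - "你好!,我是小热,可以帮您办理流量套餐,话费查询等业务。"
        - "hi!,人家是小热,有什么可以帮您吗。"
      # Farewell.
      utter_goodbye:
        - "再见,为您服务很开心"
        - "Bye, 下次再见"
      # Fallback: ask the user to repeat.
      utter_default:
        - "您说什么"
        - "您能再说一遍吗,我没听清"
      # Reply to thanks.
      utter_thanks:
        - "不用谢"
        - "我应该做的"
        - "您开心我就开心"
      # Ask whether the user needs anything else.
      utter_ask_morehelp:
        - "还有什么能帮您吗"
        - "您还想干什么"
      # Ask which time period or month to query.
      utter_ask_time:
        - "你想查哪个时间段的"
        - "你想查几月份的"
      # List the available data packages and ask which one the user wants.
      utter_ask_package:
        - "我们现在支持办理流量套餐:套餐一:二十元包月三十兆;套餐二:四十元包月八十兆,请问您需要哪个?"
        - "我们有如下套餐供您选择:套餐一:二十元包月三十兆;套餐二:四十元包月八十兆,请问您需要哪个?"
      # Confirm that {item} has been set up; {item} is filled from the item slot.
      utter_ack_management:
        - "已经为您办理好了{item}"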
    
    actions:
      - utter_greet
      - utter_goodbye
      - utter_default
      - utter_thanks
      - utter_ask_morehelp
      - utter_ask_time
      - utter_ask_package
      - utter_ack_management
      - bot.ActionSearchConsume
    
    

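    rasa core also supports custom actions for querying a database or calling external APIs. One example is bot.ActionSearchConsume here, implemented in bot.py, which simulates a database lookup: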

    from rasa_core.actions import Action


    class ActionSearchConsume(Action):
        def name(self):
            # Name used to refer to this action in stories.
            return 'action_search_consume'

        def run(self, dispatcher, tracker, domain):
            # extract_item is a helper defined outside this excerpt.
            item = tracker.get_slot("item")
            item = extract_item(item)
            if item is None:
                # Unsupported item: explain what the bot can look up.
                dispatcher.utter_message("您好,我现在只会查话费和流量")
                dispatcher.utter_message("你可以这样问我:“帮我查话费”")
                return []

            time = tracker.get_slot("time")
            if time is None:
                # Time slot not filled yet: ask which month to query.
                dispatcher.utter_message("您想查询哪个月的消费?")
                return []
            # Query the database here using item and time as the key;
            # the time value may need to be normalized first.
            dispatcher.utter_message("好,请稍等")
            if item == "流量":
                dispatcher.utter_message("您好,您{}共使用{}二百八十兆,剩余三十兆。".format(time, item))
            else:
                dispatcher.utter_message("您好,您{}共消费二十八元。".format(time))
            return []
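
The branching in this action (unsupported item, missing time, successful lookup) can be exercised without rasa core installed by passing in stand-in objects. The sketch below assumes exactly that: `FakeDispatcher` and `FakeTracker` are hypothetical stand-ins that only implement the two methods the action calls, and `extract_item` is a guessed implementation, since the original helper is not shown in the post.

```python
class FakeDispatcher:
    """Stand-in that records messages instead of sending them to a user."""

    def __init__(self):
        self.messages = []

    def utter_message(self, text):
        self.messages.append(text)


class FakeTracker:
    """Stand-in that serves slot values from a plain dict."""

    def __init__(self, slots):
        self.slots = slots

    def get_slot(self, key):
        return self.slots.get(key)


def extract_item(item):
    """Hypothetical normalizer: map a raw slot value to '流量', '话费' or None."""
    if item is None:
        return None
    if "流量" in item:
        return "流量"
    if "话费" in item or "消费" in item:
        return "话费"
    return None


def search_consume(dispatcher, tracker):
    """Same branching as ActionSearchConsume.run, without the rasa core base class."""
    item = extract_item(tracker.get_slot("item"))
    if item is None:
        dispatcher.utter_message("您好,我现在只会查话费和流量")
        dispatcher.utter_message("你可以这样问我:“帮我查话费”")
        return []
    time = tracker.get_slot("time")
    if time is None:
        dispatcher.utter_message("您想查询哪个月的消费?")
        return []
    dispatcher.utter_message("好,请稍等")
    if item == "流量":
        dispatcher.utter_message("您好,您{}共使用{}二百八十兆,剩余三十兆。".format(time, item))
    else:
        dispatcher.utter_message("您好,您{}共消费二十八元。".format(time))
    return []


dispatcher = FakeDispatcher()
search_consume(dispatcher, FakeTracker({"item": "消费", "time": "上个月"}))
print(dispatcher.messages)
```

Keeping the slot checks in a plain function like this makes each branch testable in isolation; the real action class would then only delegate to it.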
    
    

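    Some of these utter_message calls could equally be defined as templates and actions in the domain file; the demo mixes both styles to cover more of the framework's usage.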

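    Preparing the story file

    The story file contains a variety of dialogue flows and is used to train the dialogue management model. Since real dialogue data is scarce, the initial data usually has to be collected some other way (for example, from the logs of a system built with traditional dialogue management). Here the stories are generated by chatting with the bot and teaching it what to say when. Preparing, annotating, and refining data is clearly labor-intensive work.

    An example story: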

    ## Story 12132
    * greet
        - utter_greet
    * request_search{"time": "上个月", "item": "消费"}
        - action_search_consume
        - utter_ask_morehelp
    * deny
        - utter_goodbye
        - export
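
To make the structure of a story concrete: a `##` line names the story, each `*` line is a user turn (an intent with optional entities written as JSON), and the `-` lines under it are the actions the bot takes in response. The sketch below parses the example above into that structure. It is a simplified reader written for this illustration, not rasa core's own story parser, and `parse_story` is a hypothetical name.

```python
import json
import re

STORY = """\
## Story 12132
* greet
    - utter_greet
* request_search{"time": "上个月", "item": "消费"}
    - action_search_consume
    - utter_ask_morehelp
* deny
    - utter_goodbye
    - export
"""

# "* intent" optionally followed by a JSON object of entities.
INTENT_RE = re.compile(r"^\*\s*(\w+)\s*(\{.*\})?\s*$")


def parse_story(text):
    """Return (story_name, turns); each turn has an intent, entities and actions."""
    name, turns = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            name = line[2:].strip()
        elif line.startswith("*"):
            match = INTENT_RE.match(line)
            if match is None:
                raise ValueError("malformed user line: " + line)
            entities = json.loads(match.group(2)) if match.group(2) else {}
            turns.append({"intent": match.group(1), "entities": entities, "actions": []})
        elif line.startswith("-"):
            if not turns:
                raise ValueError("action before any user line: " + line)
            turns[-1]["actions"].append(line[1:].strip())
    return name, turns


name, turns = parse_story(STORY)
for turn in turns:
    print(turn["intent"], turn["entities"], turn["actions"])
```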
    

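    See the official documentation for the story format. Next, starting from the initial stories, we try out online (interactive) learning and generate more dialogue data.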

    online learning

    Run the bot.py script:

    python bot.py online_train
    

    This again calls rasa core's Python API; the code is in the project repository. The output shows the Keras network it uses (a single-layer LSTM):

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    masking_1 (Masking)          (None, 2, 31)             0         
    _________________________________________________________________
    lstm_1 (LSTM)                (None, 32)                8192      
    _________________________________________________________________
    dense_1 (Dense)              (None, 11)                363       
    _________________________________________________________________
    activation_1 (Activation)    (None, 11)                0         
    =================================================================
    Total params: 8,555
    Trainable params: 8,555
    Non-trainable params: 0
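
The parameter counts in this summary can be checked by hand from the shapes: the input has 31 features per step, the LSTM has 32 units, and the output layer scores 11 actions. The sketch below uses the standard parameter formulas for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias) and a dense layer; the function names are made up for this illustration.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias vector."""
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output unit."""
    return input_dim * units + units


features, hidden, num_actions = 31, 32, 11
lstm = lstm_params(features, hidden)
dense = dense_params(hidden, num_actions)
print(lstm, dense, lstm + dense)  # 8192 363 8555
```

The masking and activation layers have no weights, which is why they report 0 parameters.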
    
    

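    The teaching session then begins. In the excerpt below, the user greets the bot (您好, "hello") and then asks to check last month's phone bill (帮我看看上个月的话费); when the bot proposes utter_ask_time, the trainer corrects it to action_search_consume: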

    
    Bot loaded. Type a message and press enter : 
    您好
    ------
    Chat history:
            bot did:        None
            bot did:        action_listen
            user said:      您好
            whose intent is:       greet
    
    we currently have slots: item: None, phone_number: None, price: None, time: None
    ------
    The bot wants to [utter_greet] due to the intent. Is this correct?
    
            1.      Yes
            2.      No, intent is right but the action is wrong
            3.      The intent is wrong
            0.      Export current conversations as stories and quit
    1
    你好!,我是小热,可以帮您办理流量套餐,话费查询等业务。
    ------
    Chat history:
    
            bot did:        action_listen
            user did:       greet
            bot did:        utter_greet
    we currently have slots: item: None, phone_number: None, price: None, time: None
    ------
    The bot wants to [action_listen]. Is this correct?
            1.      Yes.
            2.      No, the action is wrong.
            0.      Export current conversations as stories and quit
    1
    Next user input:
    帮我看看上个月的话费
    ------
    Chat history:
    
            bot did:        utter_greet
            bot did:        action_listen
            user said:      帮我看看上个月的话费
            whose intent is:       request_search
            with time:      上个月
            with item:      消费
    we currently have slots: item: 消费, phone_number: None, price: None, time: 上个月
    
    ------
    The bot wants to [utter_ask_time] due to the intent. Is this correct?
    
            1.      Yes
            2.      No, intent is right but the action is wrong
            3.      The intent is wrong
            0.      Export current conversations as stories and quit
    2
    ------
    Chat history:
    
            bot did:        utter_greet
    
            bot did:        action_listen
    
            user said:      帮我看看上个月的话费
    
                     whose intent is:       request_search
    
            with time:      上个月
    
            with item:      消费
    
    we currently have slots: item: 消费, phone_number: None, price: None, time: 上个月
    
    ------
    what is the next action for the bot?
    
             0                           action_listen    0.05
             1                          action_restart    0.00
             2                             utter_greet    0.01
             3                           utter_goodbye    0.02
             4                           utter_default    0.00
             5                            utter_thanks    0.02
             6                      utter_ask_morehelp    0.04
             7                          utter_ask_time    0.68
             8                       utter_ask_package    0.04
             9                    utter_ack_management    0.01
            10                   action_search_consume    0.12
    10
    thanks! The bot will now [action_search_consume]
     -----------
    好,请稍等
    您好,您上个月共消费二十八元。
    ------
    ...
    ...
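
The correction step in this transcript boils down to simple bookkeeping: the policy proposes the highest-scoring action, the trainer either accepts it or picks another one, and the accepted pair becomes a new training example. The sketch below illustrates only that bookkeeping, using the probabilities printed above. The state representation and function names are assumptions made for this example; this is not rasa core's internal implementation.

```python
# Action probabilities copied from the transcript above.
action_probs = {
    "action_listen": 0.05,
    "action_restart": 0.00,
    "utter_greet": 0.01,
    "utter_goodbye": 0.02,
    "utter_default": 0.00,
    "utter_thanks": 0.02,
    "utter_ask_morehelp": 0.04,
    "utter_ask_time": 0.68,
    "utter_ask_package": 0.04,
    "utter_ack_management": 0.01,
    "action_search_consume": 0.12,
}


def predicted_action(probs):
    """The action the bot proposes: the one with the highest probability."""
    return max(probs, key=probs.get)


def record_turn(training_examples, state, probs, chosen=None):
    """Store the trainer's choice for this state, defaulting to the prediction."""
    action = chosen if chosen is not None else predicted_action(probs)
    training_examples.append((state, action))
    return action


examples = []
# Hypothetical state summary: the latest intent plus the filled slots.
state = ("request_search", (("item", "消费"), ("time", "上个月")))
print(predicted_action(action_probs))
print(record_turn(examples, state, action_probs, chosen="action_search_consume"))
```

Here the bot proposes utter_ask_time even though the time slot is already filled, so the trainer's correction is exactly the kind of example the initial stories were missing.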
    

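    Because training happens online, the model is updated in real time. When the conversation ends, choose export (option 0) to write the stories to a file.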

    The bot wants to [action_listen]. Is this correct?
    
            1.      Yes.
            2.      No, the action is wrong.
            0.      Export current conversations as stories and quit
    0
    File to export to (if file exists, this will append the stories) [stories.md]: 
    INFO:rasa_core.policies.online_policy_trainer:Stories got exported to '/home/zqh/mygit/rasa_chatbot/stories.md'.
    
    

    The newly generated stories.md can be selectively merged into data/mobile_story.md, and the updated mobile_story.md used to retrain the model:

    python bot.py train-dialogue
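
The selective merge that precedes this retraining step can be done by hand, but a small script avoids duplicating stories that are already in the training file. The sketch below works on the file contents as strings and assumes each story starts with a `## ` header at the beginning of a line; `split_stories` and `merge_stories` are hypothetical helpers, not part of the project or of rasa core.

```python
def split_stories(text):
    """Split story markdown into {title: block}, one block per '## ' header."""
    stories, title, lines = {}, None, []
    for line in text.splitlines():
        if line.startswith("## "):
            if title is not None:
                stories[title] = "\n".join(lines).rstrip()
            title, lines = line[3:].strip(), [line]
        elif title is not None:
            lines.append(line)
    if title is not None:
        stories[title] = "\n".join(lines).rstrip()
    return stories


def merge_stories(existing_text, exported_text, keep):
    """Append the exported stories named in `keep` that are not already present."""
    existing = split_stories(existing_text)
    exported = split_stories(exported_text)
    parts = [existing_text.rstrip("\n")] if existing_text.strip() else []
    for title in keep:
        if title in exported and title not in existing:
            parts.append(exported[title])
    return "\n\n".join(parts) + "\n"


existing = "## Story 1\n* greet\n    - utter_greet\n"
exported = (
    "## Story 1\n* greet\n    - utter_greet\n\n"
    "## Story 2\n* deny\n    - utter_goodbye\n"
)
print(merge_stories(existing, exported, keep=["Story 1", "Story 2"]))
```

In practice the two strings would be read from stories.md and data/mobile_story.md with an explicit UTF-8 encoding, and the result written back to data/mobile_story.md.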
    

    This produces the dialogue management model under the models directory:

    models/
    ├── dialogue
    │   ├── domain.json
    │   ├── domain.yml
    │   ├── featurizer.json
    │   ├── keras_arch.json
    │   ├── keras_policy.json
    │   ├── keras_weights.h5
    │   ├── memorized_turns.json
    │   └── policy_metadata.json
    └── ivr
        ├── demo
        ├── entity_extractor.dat
        ├── entity_synonyms.json
        ├── intent_classifier.pkl
        ├── metadata.json
        └── training_data.json
    
    

    Testing dialogue management

    Although the dialogue data is limited and the model is unstable, the trained model can still be tested:

    python bot.py run
    

    A sample test run:

    Bot loaded. Type a message and press enter : 
    我想查一下我的话费
    您想查询哪个月的消费?
    四月的
    好,请稍等
    您好,您四月共消费二十八元。
    您还想干什么
    

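    More complex dialogues will of course need more training stories, but this completes the demonstration of the whole process.

    Summary

    Building a dialogue system with rasa is fairly convenient, though the quality of the system clearly depends heavily on the quality and quantity of the data. The dialogue management part of this project still has much room for improvement, such as resetting the state after a task completes and distinguishing simple chitchat from task requests to improve the user experience.

    Original article; please credit the source when reposting.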


      Reader comments

      • 8eca3d96e923: The command python bot.py online_train does not work for me. How can I fix it? The error output is:
        Traceback (most recent call last):
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_nlu\model.py", line 66, in load
        data = utils.read_json_file(metadata_file)
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_nlu\utils\__init__.py", line 208, in read_json_file
        content = read_file(filename)
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_nlu\utils\__init__.py", line 202, in read_file
        with io.open(filename, encoding=encoding) as f:
        FileNotFoundError: [Errno 2] No such file or directory: 'models/ivr/demo\\metadata.json'

        During handling of the above exception, another exception occurred:

        Traceback (most recent call last):
        File "bot.py", line 120, in <module>
        interpreter=RasaNLUInterpreter("models/ivr/demo"),
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_core\interpreter.py", line 221, in __init__
        self._load_interpreter()
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_core\interpreter.py", line 237, in _load_interpreter
        self.interpreter = Interpreter.load(self.model_directory)
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_nlu\model.py", line 270, in load
        model_metadata = Metadata.load(model_dir)
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_nlu\model.py", line 71, in load
        "from '{}'. {}".format(abspath, e))
        rasa_nlu.model.InvalidProjectError: Failed to load model metadata from 'D:\基于rasa的对话系统4\_rasa_chatbot-master\models\ivr\demo\metadata.json'. [Errno 2] No such file or directory: 'models/ivr/demo\\metadata.json'
        8eca3d96e923: @zqh_zy After some rough modifications I get the following error:
        Using TensorFlow backend.
        Traceback (most recent call last):
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\pykwalify\core.py", line 76, in __init__
        self.source = yaml.load(stream)
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\ruamel\yaml\main.py", line 637, in load
        loader = Loader(stream, version, preserve_quotes=preserve_quotes)
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\ruamel\yaml\loader.py", line 46, in __init__
        Reader.__init__(self, stream, loader=self)
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\ruamel\yaml\reader.py", line 80, in __init__
        self.stream = stream # type: Any # as .read is called
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\ruamel\yaml\reader.py", line 125, in stream
        self.determine_encoding()
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\ruamel\yaml\reader.py", line 169, in determine_encoding
        self.update_raw()
        File "C:\Users\Cf\AppData\Local\Programs\Python\Python36\lib\site-packages\ruamel\yaml\reader.py", line 278, in update_raw
        data = self.stream.read(size)
        UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 418: illegal multibyte sequence

        During handling of the above exception, another exception occurred:
        8eca3d96e923: @zqh_zy For the NLU model from the previous part, the official site gives this command:
        python -m rasa_nlu.train -c nlu_config.yml --data data/nlu_data.md -o models
        --fixed_model_name nlu --project current --verbose
        which I changed to:
        python -m rasa_nlu.train --config mobile_nlu_model_config.json --data data/mobile_nlu_data.json --path models
        For the dialogue model, the official command is:
        python -m rasa_core.train -d domain.yml -s data/stories.md -o models/current/dialogue --epochs 200
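        How should I adapt this one?
        zqh_zy: @无情Array The rasa core API has changed and I have not had time to update the code yet. For now, you could check the rasa core documentation and try adapting it yourself.
      • qianzeng: Hello, could you share the source code? I could not find your email on GitHub; mine is z_orochimaru@outlook.com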

      Article link: https://www.haomeiwen.com/subject/tmyfoxtx.html