Human-Machine Dialogue Systems (1)

Author: zidea | Published 2019-07-20 21:20

Human-Machine Dialogue

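Natural language processing (NLP) is one of the major application areas of machine learning. In recent years both Google and Baidu have announced major advances in machine translation, and whenever I open Bing to search for something, I also like to chat with Microsoft's chatbot for a while.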

import nltk
from nltk.stem.lancaster import LancasterStemmer

import numpy
import tflearn
import tensorflow
import random
import json

with open("intents.json") as file:
    data = json.load(file)
print(data)

Preparing the Data

{'intents': [{'tag': 'greeting', 'patterns': ['Hi', 'How are you', 'Is anyone there?', 'Hello', 'Good day', 'Whats up'], 'response': ['Hello!', 'Good to see you again', 'Hi there, how can i help?'], 'context_set': ''}]}

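In this data format, patterns holds the inputs: a collection of example sentences that a user might say. response holds the replies the chatbot returns for those inputs. We use this data to train the chatbot model. At first glance it may look like nothing more than looking up an answer by matching the input, but it is not: once trained, the chatbot can give a fitting reply even to input that does not appear in the list.
Note that every intent carries a tag. The chatbot classifies what the user says and decides which tag the input belongs to.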
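To make the structure concrete, here is a minimal sketch that walks the intents and collects the example patterns and candidate replies under each tag. The sample is inlined instead of being read from intents.json, and the helper name index_by_tag is hypothetical, chosen only for this illustration.

```python
import json

# Inline copy of the sample shown above; the article itself reads it from intents.json.
RAW = """
{"intents": [{"tag": "greeting",
              "patterns": ["Hi", "How are you", "Is anyone there?", "Hello", "Good day", "Whats up"],
              "response": ["Hello!", "Good to see you again", "Hi there, how can i help?"],
              "context_set": ""}]}
"""


def index_by_tag(data):
    """Map each tag to its example patterns and candidate replies (hypothetical helper)."""
    index = {}
    for intent in data["intents"]:
        index[intent["tag"]] = {
            "patterns": list(intent["patterns"]),
            "response": list(intent["response"]),
        }
    return index


data = json.loads(RAW)
by_tag = index_by_tag(data)
for tag, entry in by_tag.items():
    print(tag, len(entry["patterns"]), "patterns,", len(entry["response"]), "replies")
```

Grouping by tag mirrors how the data is meant to be used: the model predicts a tag, and the reply is then taken from that tag's response list.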

Setting Up the Development Environment

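Because tflearn has some problems on Python 3.7 at the time of writing, we use Anaconda to create a clean Python 3.6 environment for developing our application.
After installing Anaconda from the official website, run the following command on the command line: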

conda create -n chatbot python=3.6

Then activate the Anaconda environment so that development happens under Python 3.6:

conda activate chatbot

Next, install the required dependencies. The first is nltk, a natural language processing toolkit:

pip install nltk

We also need to install TensorFlow and tflearn. tflearn is a high-level API built on top of TensorFlow that makes it easier for developers to build machine learning systems.
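Before moving on, it can help to confirm that the activated environment actually sees the packages. The following is a small sketch using only the standard library. The package list is taken from the imports used in this article, and the helper name missing_packages is hypothetical.

```python
import importlib.util
import sys

# Top-level packages imported in this article.
REQUIRED = ["nltk", "numpy", "tensorflow", "tflearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

The check only looks packages up without importing them, so it stays fast even when TensorFlow is installed.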

Starting Development

import nltk
from nltk.stem.lancaster import LancasterStemmer

# Stemmer that reduces words to their stems
stemmer = LancasterStemmer()

import numpy
import tflearn
import tensorflow
import random
import json
import pickle

# Load the training data from the JSON file and print it
with open("intents.json") as file:
    data = json.load(file)
    print(data)

First we load the data from the JSON file and print it.
The next step is to work out which tag each pattern belongs to.

words = []
labels = []
docs = []

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        wrds = nltk.word_tokenize(pattern)
        print(wrds)

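First we use the tokenizer provided by nltk to turn each pattern (sentence) into a list of words. word_tokenize depends on NLTK's Punkt tokenizer data; if it is missing, download it once with nltk.download('punkt') (newer NLTK releases ask for 'punkt_tab' instead).
Output: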

['Hi']
['How', 'are', 'you']
['Is', 'anyone', 'there', '?']
['Hello']
['Good', 'day']
['Whats', 'up']
['cya']
['see', 'you', 'later']
['Goodbye']
['I', 'am', 'Leaving']
['Have', 'a', 'Good', 'day']
['how', 'old']
['how', 'old', 'is', 'tim']
['Goodbye']
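To show what tokenization does without needing NLTK or its data files, here is a rough stand-in based on a regular expression. This is not how nltk.word_tokenize is implemented, and the function name simple_tokenize is hypothetical. It only reproduces the behaviour visible in the output above, where words and punctuation marks become separate tokens.

```python
import re


def simple_tokenize(sentence):
    """Split a sentence into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


for pattern in ["Hi", "How are you", "Is anyone there?"]:
    print(simple_tokenize(pattern))
```

For these simple sentences the result matches the listing above. The two approaches diverge on harder input such as contractions, which NLTK handles with dedicated rules.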

words.extend(wrds)

Next we put all the extracted words into the words list. A quick note on the difference between append and extend:
list.append(object) adds a single object to the end of the list.

l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3]

l1.append(l2)
print(l1)

Output:

[1, 2, 3, 4, 5, [1, 2, 3]]

list.extend(sequence) adds each element of a sequence to the end of the list.

l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3]

l1.extend(l2)
print(l1)

Output:

[1, 2, 3, 4, 5, 1, 2, 3]

Next, save the tag data in labels:

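words = []   # every token from every pattern
labels = []  # unique intent tags
docs = []    # the original pattern sentences

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        # Split the sentence into tokens and collect them
        wrds = nltk.word_tokenize(pattern)
        words.extend(wrds)
        docs.append(pattern)

    # Record each tag only once
    if intent["tag"] not in labels:
        labels.append(intent["tag"])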

With the code above, the sentences of each intent are stored in docs, the individual words in words, and the tags in labels.
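The stated goal was to know which tag each pattern belongs to, yet docs on its own does not record that. One possible way to keep the association, sketched here under assumptions, is a parallel list of tags. The name doc_tags is hypothetical, the "goodbye" intent is made up for the sample, and str.split stands in for nltk.word_tokenize so that the snippet runs without NLTK.

```python
# Small inline sample in the same shape as intents.json.
# The "goodbye" tag name is an assumption made for this illustration.
data = {
    "intents": [
        {"tag": "greeting", "patterns": ["Hi", "How are you"]},
        {"tag": "goodbye", "patterns": ["cya", "see you later"]},
    ]
}

words, labels, docs = [], [], []
doc_tags = []  # hypothetical parallel list: doc_tags[i] is the tag of docs[i]

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        wrds = pattern.split()  # stand-in for nltk.word_tokenize
        words.extend(wrds)
        docs.append(pattern)
        doc_tags.append(intent["tag"])

    if intent["tag"] not in labels:
        labels.append(intent["tag"])

for sentence, tag in zip(docs, doc_tags):
    print(f"{tag}: {sentence}")
```

Keeping docs and doc_tags index-aligned means each sentence can later be paired with its label when training data is built.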

Article link: https://www.haomeiwen.com/subject/twdskctx.html