
2019-10-15 Dialogue Systems Survey

Author: 布口袋_天晴了 | Published 2019-10-15 14:11

1. Jianfeng Gao (高建峰), Microsoft: natural language processing, information retrieval, machine learning, deep learning

Selected papers (the numbers are per-paper citation counts, with publication year):

| # | Paper | Citations | Year |
| --- | --- | --- | --- |
| 1 | A user simulator for task-completion dialogues | 43 | 2016 |
| 2 | End-to-end joint learning of natural language understanding and dialogue manager | 39 | 2017 |
| 3 | Deep reinforcement learning for dialogue generation | 459 | 2016 |
| 4 | Towards end-to-end reinforcement learning of dialogue agents for information access | 121 | 2016 |
| 5 | End-to-end task-completion neural dialogue systems | 120 | 2017 |
| 6 | Efficient exploration for dialogue policy learning with BBQ networks & replay buffer spiking | 76 | 2016 |
| 7 | Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning | 52 | 2017 |

1. A user simulator for task-completion dialogues

Abstract:
Despite widespread interest in reinforcement learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress. First, reinforcement learners typically require interaction with the environment, so conventional dialogue corpora cannot be used directly. Second, each task presents specific challenges, requiring a separate corpus of task-specific annotated data. Third, collecting and annotating human-machine or human-human conversations for task-oriented dialogues requires extensive domain knowledge. Because building an appropriate dataset can be both financially costly and time-consuming, one popular approach is to build a user simulator based upon a corpus of example dialogues. Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator. Dialogue agents trained on these simulators can serve as an effective starting point. Once agents master the simulator, they may be deployed in a real environment to interact with humans, and continue to be trained online. To ease empirical algorithmic comparisons in dialogues, this paper introduces a new, publicly available simulation framework, where our simulator, designed for the movie-booking domain, leverages both rules and collected data. The simulator supports two tasks: movie ticket booking and movie seeking. Finally, we demonstrate several agents and detail the procedure to add and test your own agent in the proposed framework.
User simulation for task-completion dialogues
1. Although reinforcement learning is widely used in task-oriented dialogue systems, several obstacles stand in the way.
2. Reinforcement learners need to interact with the environment, so conventional dialogue corpora cannot be used directly.
3. Each task requires its own corpus of task-specific annotated data.
4. Collecting task-oriented human-human or human-machine dialogues requires extensive domain knowledge.
5. Building a suitable dataset is costly in both money and time.
6. A practical data-collection alternative: build a user simulator from a corpus of example dialogues.
7. The reinforcement learning agent is then trained online as it interacts with the simulator.
8. Once the agent masters the simulator, it can be deployed in a real environment to interact with humans and continue training online.
9. Main contribution of this paper: a rule- and data-driven simulator for movie ticket booking and movie seeking (a toy interaction loop is sketched after the link below).
* Source code: https://github.com/MiuLab/UserSimulator
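To make the simulator-in-the-loop idea concrete, here is a minimal sketch of an agenda-style user simulator talking to a stand-in agent policy. The slot names, dialogue acts, and both classes are hypothetical simplifications, not the released framework's API.

```python
import random

# Toy agenda-style user simulator for movie-ticket booking (illustrative only).

class UserSimulator:
    def __init__(self, goal):
        self.goal = goal                 # e.g. {"moviename": ..., "date": ...}
        self.remaining = dict(goal)      # slots the user still has to convey

    def respond(self, agent_action):
        """Map an agent act to a user act, consuming the agenda."""
        if agent_action["act"] == "request":
            slot = agent_action["slot"]
            self.remaining.pop(slot, None)
            return {"act": "inform", "slot": slot,
                    "value": self.goal.get(slot, "dont_care")}
        if agent_action["act"] == "book" and not self.remaining:
            return {"act": "thanks"}     # all constraints conveyed: success
        return {"act": "deny"}

class RandomAgent:
    """Stand-in policy: request unfilled slots in random order, then book."""
    def __init__(self, slots):
        self.unfilled = list(slots)
        random.shuffle(self.unfilled)

    def act(self):
        if self.unfilled:
            return {"act": "request", "slot": self.unfilled.pop()}
        return {"act": "book"}

goal = {"moviename": "Deadpool", "date": "tomorrow", "numberofpeople": "2"}
user, agent = UserSimulator(goal), RandomAgent(goal)
for turn in range(10):
    agent_action = agent.act()
    user_action = user.respond(agent_action)
    print(turn, agent_action, "->", user_action)
    if user_action["act"] == "thanks":   # episode ends with task success
        break
```

An RL agent would take the place of `RandomAgent`, using the success or failure of each simulated episode as its reward signal.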

2. End-to-end joint learning of natural language understanding and dialogue manager

Abstract:
Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system actions in response to a current user utterance. Conventional approaches aggregate separate models of natural language understanding (NLU) and system action prediction (SAP) into a pipeline that is sensitive to the noisy outputs of error-prone NLU. To address these issues, we propose an end-to-end deep recurrent neural network with limited contextual dialogue memory, jointly training NLU and SAP on DSTC4 multi-domain human-human dialogues. Experiments show that our proposed model significantly outperforms the state-of-the-art pipeline models for both NLU and SAP, which indicates that our joint model is capable of mitigating the effects of noisy NLU outputs, and that the NLU model can be refined by error flows backpropagating from the extra supervised signals of system actions.
1. A dialogue system predicts the next system action from the current user utterance.
2. Two essential components of a dialogue system: natural language understanding (NLU) and dialogue policy learning.
3. The conventional approach chains separately trained NLU and system action prediction (SAP) models into a pipeline.
4. But the NLU component is error-prone, and its noisy output propagates downstream.
5. Main contribution of this paper: an end-to-end deep recurrent neural network with limited contextual dialogue memory that trains NLU and SAP jointly (a minimal joint model is sketched after the link below).
* Source code: https://github.com/XuesongYang/end2end_dialog
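The following is a minimal sketch of the joint idea, not the authors' DSTC4 model: one shared utterance encoder feeding two supervised heads, NLU (reduced here to intent classification) and SAP. All sizes and names are made up; training on a random batch simply shows that SAP gradients also flow back into the shared encoder and can refine NLU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointNLUSAP(nn.Module):
    def __init__(self, vocab_size, n_intents, n_actions, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)  # shared encoder
        self.intent_head = nn.Linear(dim, n_intents)        # NLU head
        self.action_head = nn.Linear(dim, n_actions)        # SAP head

    def forward(self, tokens):
        outputs, _ = self.encoder(self.embed(tokens))
        last = outputs[:, -1]                               # final hidden state
        return self.intent_head(last), self.action_head(last)

model = JointNLUSAP(vocab_size=1000, n_intents=10, n_actions=20)
tokens = torch.randint(0, 1000, (4, 12))                    # 4 utterances, 12 tokens
intent_logits, action_logits = model(tokens)
loss = F.cross_entropy(intent_logits, torch.randint(0, 10, (4,))) \
     + F.cross_entropy(action_logits, torch.randint(0, 20, (4,)))
loss.backward()  # the extra SAP supervision backpropagates into the encoder
```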

3. Deep Reinforcement Learning for Dialogue Generation

Abstract:
Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be short-sighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need which led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity, coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity and length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
Dialogue generation with reinforcement learning
1. Single-turn models consider only the current user utterance; multi-turn dialogue must also account for what the user said earlier.
2. Conversations jump around yet remain coherent; reinforcement learning lets the dialogue system foster more sustained conversations.
3. Main contribution of this paper: shows how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue, yielding a system capable of sustained conversation (a sketch of the composite reward follows below).
[Figure: an example of a sustained dialogue]
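Below is a hedged sketch of the paper's composite reward r = λ1·r1 + λ2·r2 + λ3·r3, using the weights the paper reports (0.25, 0.25, 0.5). The seq2seq likelihoods and the turn similarity are passed in as plain numbers here; in the real model they come from forward and backward seq2seq scores.

```python
import math

def ease_of_answering(p_dull_given_action):
    """r1: penalize actions that make dull replies ("I don't know") likely."""
    return -sum(math.log(p) for p in p_dull_given_action) / len(p_dull_given_action)

def information_flow(turn_similarity):
    """r2: penalize semantic repetition of the agent's previous turn."""
    return -math.log(max(turn_similarity, 1e-8))

def semantic_coherence(p_forward, p_backward):
    """r3: mutual-information style score tying response to its context."""
    return math.log(p_forward) + math.log(p_backward)

def reward(p_dull, sim, p_fwd, p_bwd, l1=0.25, l2=0.25, l3=0.5):
    return (l1 * ease_of_answering(p_dull)
            + l2 * information_flow(sim)
            + l3 * semantic_coherence(p_fwd, p_bwd))

# Two candidate responses: a generic, repetitive one vs. a more engaging one.
print(reward(p_dull=[0.30, 0.20], sim=0.6, p_fwd=0.05, p_bwd=0.04))  # lower
print(reward(p_dull=[0.02, 0.01], sim=0.1, p_fwd=0.20, p_bwd=0.15))  # higher
```

Policy gradient (REINFORCE) then pushes the generator toward responses with higher composite reward, which is what sustains the simulated conversations.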

4. Towards end-to-end reinforcement learning of dialogue agents for information access

Abstract:
This paper proposes KB-InfoBot, a multi-turn dialogue agent which helps users search Knowledge Bases (KBs) without composing complicated queries. Such goal-oriented dialogue agents typically need to interact with an external database to access real-world knowledge. Previous systems achieved this by issuing a symbolic query to the KB to retrieve entries based on their attributes. However, such symbolic operations break the differentiability of the system and prevent end-to-end training of neural dialogue agents. In this paper, we address this limitation by replacing symbolic queries with an induced "soft" posterior distribution over the KB that indicates which entities the user is interested in. Integrating the soft retrieval process with a reinforcement learner leads to higher task success rate and reward in both simulations and against real users. We also present a fully neural end-to-end agent, trained entirely from user feedback, and discuss its application towards personalized dialogue agents.
1. A multi-turn dialogue agent helps users search a knowledge base (KB) without composing complicated queries.
2. Earlier systems answer by issuing a symbolic query to the KB and retrieving entries by their attributes. However, such symbolic operations break the differentiability of the system and prevent end-to-end training of neural dialogue agents.
3. In other words, the query-and-retrieve step sits outside the neural network's training loop; it relies directly on database query statements.
4. Main contributions of this paper:
Tie the retrieval step into the reinforcement learner by replacing the symbolic query with a "soft" posterior over KB entities (sketched after the link below).
Train the model entirely from user feedback.
Source code is available at: https://github.com/MiuLab/KB-InfoBot
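Here is a toy numpy sketch of the "soft" KB lookup idea. Instead of a hard symbolic query, the agent keeps a belief distribution over each slot's value and induces a differentiable posterior over KB rows; the three-row movie KB and the belief numbers below are invented for illustration.

```python
import numpy as np

kb = [  # each row is an entity with two slot values
    {"actor": "bill murray", "genre": "comedy"},
    {"actor": "bill murray", "genre": "drama"},
    {"actor": "tom hanks",   "genre": "comedy"},
]

belief = {  # agent's current uncertainty about the user's constraints
    "actor": {"bill murray": 0.8, "tom hanks": 0.2},
    "genre": {"comedy": 0.7, "drama": 0.3},
}

def soft_posterior(kb, belief):
    """p(row) is proportional to the product over slots of p(row's value)."""
    scores = np.array([
        np.prod([belief[slot][row[slot]] for slot in belief])
        for row in kb
    ])
    return scores / scores.sum()

# Every row keeps some probability mass, so gradients can flow through the
# lookup; a hard query would return only the exact matches.
print(soft_posterior(kb, belief))   # approx. [0.596, 0.255, 0.149]
```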

5. End-to-end task-completion neural dialogue systems

Abstract:
One of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges. For example, downstream modules are affected by earlier modules, and the performance of the entire system is not robust to the accumulated errors. This paper presents a novel end-to-end learning framework for task-completion dialogue systems to tackle such issues. Our neural dialogue system can directly interact with a structured database to assist users in accessing information and accomplishing certain tasks. The reinforcement learning based dialogue manager offers robust capabilities to handle noise caused by other components of the dialogue system. Our experiments in a movie-ticket booking domain show that our end-to-end system not only outperforms modularized dialogue system baselines for both objective and subjective evaluation, but also is robust to noise, as demonstrated by several systematic experiments with different error granularities and rates specific to the language understanding module.
1. Modularized task-completion dialogue systems train a separate model per module, which creates problems for the system as a whole.
2. For example, errors from an earlier module accumulate in the modules downstream.
Main contributions of this paper:
Experiments in the movie-ticket booking domain.
A dialogue system managed by reinforcement learning, robust to noise from the other components; robustness is measured by injecting NLU errors at different granularities and rates (a toy noise-injection sketch follows).
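The following is a hypothetical sketch of the kind of error model such robustness experiments use: corrupt the simulated user's semantic frame at configurable rates before it reaches the RL dialogue manager, at either intent or slot granularity. The frame layout, intents, and slot values are invented for illustration.

```python
import random

def corrupt_frame(frame, intents, slot_values, intent_err=0.1, slot_err=0.1):
    noisy = {"intent": frame["intent"], "slots": dict(frame["slots"])}
    if random.random() < intent_err:             # intent-level error
        noisy["intent"] = random.choice(intents)
    for slot in list(noisy["slots"]):            # slot-level errors
        r = random.random()
        if r < slot_err / 2:                     # wrong value substituted
            noisy["slots"][slot] = random.choice(slot_values[slot])
        elif r < slot_err:                       # slot dropped entirely
            del noisy["slots"][slot]
    return noisy

frame = {"intent": "request_ticket",
         "slots": {"moviename": "Deadpool", "date": "tomorrow"}}
intents = ["request_ticket", "inform", "confirm"]
slot_values = {"moviename": ["Deadpool", "Zootopia"],
               "date": ["today", "tomorrow"]}
print(corrupt_frame(frame, intents, slot_values, intent_err=0.3, slot_err=0.3))
```

Sweeping `intent_err` and `slot_err` then shows how gracefully the end-to-end policy degrades compared with a pipeline.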

6. Efficient exploration for dialogue policy learning with BBQ networks & replay buffer spiking

Abstract:
When rewards are sparse and action spaces large, Q-learning with greedy exploration can be inefficient. This poses problems for otherwise promising applications such as task-oriented dialogue systems, where the primary reward signal, indicating successful completion of a task, requires a complex sequence of appropriate actions. Under these circumstances, a randomly exploring agent might never stumble upon a successful outcome in reasonable time. We present two techniques that significantly improve the efficiency of exploration for deep Q-learning agents in dialogue systems. First, we introduce an exploration technique based on Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network, demonstrating marked improvement over common approaches such as greedy and Boltzmann exploration. Second, we show that spiking the replay buffer with experiences from a small number of successful episodes, as are easy to harvest for dialogue tasks, can make Q-learning feasible when it might otherwise fail.
1. When rewards are sparse and the action space is large, Q-learning with greedy exploration can be inefficient.
2. The primary reward signal, indicating successful completion of a task, requires a complex sequence of appropriate actions.
3. Under these circumstances, a randomly exploring agent may never stumble upon a successful outcome in reasonable time.
4. Main contributions of this paper:
Two techniques that improve exploration efficiency for deep Q-learning dialogue agents: Thompson-sampling exploration via a Bayes-by-Backprop network, and spiking the replay buffer with a few successful episodes (the latter is sketched below).
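Replay buffer spiking is simple enough to show in a few lines. Before Q-learning starts, seed the experience replay buffer with transitions harvested from a few successful dialogues (e.g. produced by a simple rule-based agent), so the sparse success reward is visible from the very first minibatch. The episode generator below is a placeholder, not the paper's code.

```python
import random
from collections import deque

buffer = deque(maxlen=10000)   # standard experience replay buffer

def harvest_successful_episode(length=3):
    """Placeholder for a rule-based agent that completes the task."""
    return [(f"s{t}", f"a{t}",
             1.0 if t == length - 1 else 0.0,    # success reward at the end
             f"s{t + 1}", t == length - 1)       # (s, a, r, s', done)
            for t in range(length)]

for _ in range(100):                             # the "spike": a few successes
    buffer.extend(harvest_successful_episode())

# Ordinary DQN training would now interleave exploration with minibatches
# sampled from this pre-seeded buffer:
minibatch = random.sample(list(buffer), 32)
print(len(buffer), minibatch[0])
```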

7. Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning

Abstract:
Building a dialogue agent to fulfill complex tasks, such as travel planning, is challenging because the agent has to learn to collectively complete multiple subtasks. For example, the agent needs to reserve a hotel and book a flight so that enough time is left for the commute between arrival and hotel check-in. This paper addresses this challenge by formulating the task in the mathematical framework of options over Markov Decision Processes (MDPs), and proposing a hierarchical deep reinforcement learning approach to learning a dialogue manager that operates at different temporal scales. The dialogue manager consists of: (1) a top-level dialogue policy that selects among subtasks or options, (2) a low-level dialogue policy that selects primitive actions to complete the subtask given by the top-level policy, and (3) a global state tracker that helps ensure all cross-subtask constraints are satisfied. Experiments on a travel planning task with simulated and real users show that our approach leads to significant improvements over three baselines, two based on handcrafted rules and the other based on flat deep reinforcement learning.
Main contributions of this paper:
Targets travel-planning dialogue systems.
Proposes a hierarchical deep reinforcement learning approach to learn a dialogue manager that operates at different temporal scales (a two-level skeleton is sketched below).
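An illustrative skeleton of the two-level dialogue manager follows; it is not the authors' code. A top-level policy picks a subtask (an option), a low-level policy picks primitive actions inside it, and a global state tracker carries information across subtasks so cross-subtask constraints (e.g. flight arrival before hotel check-in) can be checked. Subtask names and actions are hard-coded stand-ins for learned policies.

```python
import random

SUBTASKS = {
    "book_flight": ["request_date", "request_destination", "confirm_flight"],
    "book_hotel":  ["request_checkin", "request_nights", "confirm_hotel"],
}

class GlobalStateTracker:
    def __init__(self):
        self.filled = set()                      # slots shared across subtasks

    def update(self, action):
        self.filled.add(action)

    def constraints_ok(self):
        # placeholder for real checks such as arrival time < hotel check-in
        return {"confirm_flight", "confirm_hotel"} <= self.filled

def top_level_policy(done):
    """Option selection; a learned policy would replace random choice."""
    remaining = [s for s in SUBTASKS if s not in done]
    return random.choice(remaining) if remaining else None

def low_level_policy(subtask, step):
    """Primitive action selection within the current option."""
    return SUBTASKS[subtask][step]

tracker, done = GlobalStateTracker(), set()
while (subtask := top_level_policy(done)) is not None:
    for step in range(len(SUBTASKS[subtask])):   # intrinsic sub-episode
        action = low_level_policy(subtask, step)
        tracker.update(action)
        print(subtask, "->", action)
    done.add(subtask)
print("cross-subtask constraints satisfied:", tracker.constraints_ok())
```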

8. Summary

Limited scope: movie-search dialogues, movie-ticket booking dialogues, and travel dialogues each solve one specific problem in one specific domain. The goal of dialogue systems is to serve people's lives intelligently, and far too many problems in dialogue systems remain unsolved. Reinforcement learning is the direction in which dialogue systems are heading.
