![](https://img.haomeiwen.com/i12371088/4830d9d7316f2bea.png)
The quality of generated responses is evaluated with different metrics from three aspects: relevance, diversity, and novelty.
Human Evaluation.
- Average Score and Best@1
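The table reports the human judges' average scores and Best@1 ratios. The former is the mean of the scores given by the five judges; the latter is the proportion of judges who consider the corresponding response the best among the four systems.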
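To make the two numbers concrete, here is a minimal Python sketch of how they could be computed. It is not the paper's evaluation code: the data layout, the function names `average_scores` and `best_at_1`, and the toy ratings are hypothetical. It also assumes every judge picks exactly one best system per example, so the Best@1 ratios of the four systems sum to 1.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical layout (an assumption, not from the paper):
#   ratings[system][i][j] = score judge j gave to `system`'s response on example i
#   best_picks[i][j]      = system that judge j considered best on example i


def average_scores(ratings: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Mean score per system over all examples and all judges."""
    return {
        system: mean(score for per_example in rows for score in per_example)
        for system, rows in ratings.items()
    }


def best_at_1(best_picks: List[List[str]], systems: List[str]) -> Dict[str, float]:
    """Fraction of (example, judge) votes in which each system was judged best."""
    votes = [pick for per_example in best_picks for pick in per_example]
    total = len(votes)
    return {s: sum(1 for v in votes if v == s) / total for s in systems}


# Toy data, made up for illustration: four systems, two examples, five judges.
SYSTEMS = ["A", "B", "C", "D"]
ratings = {
    "A": [[5, 4, 4, 5, 4], [3, 4, 4, 3, 4]],
    "B": [[3, 3, 2, 3, 3], [2, 2, 3, 2, 2]],
    "C": [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]],
    "D": [[2, 2, 2, 2, 2], [3, 3, 3, 3, 3]],
}
best_picks = [
    ["A", "A", "B", "A", "D"],
    ["A", "B", "A", "A", "A"],
]

print(average_scores(ratings))
print(best_at_1(best_picks, SYSTEMS))
```

With this toy data, system A averages 4.0 and is picked as best in 7 of the 10 votes (Best@1 = 0.7). Averaging per example over the five judges and then over examples gives the same result as the flat mean here, because every example has the same number of judges.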
ACL 2019 Target-Guided Open-Domain Conversation
![](https://img.haomeiwen.com/i12371088/49d62296788a13ec.png)