The quality of generated responses is evaluated with different metrics from three aspects: relevance, diversity, and novelty.
Human Evaluation.
- Average Score and Best@1
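The table reports the human judges' average score and the Best@1 ratio. The former is the mean score over the five judges; the latter is the fraction of judges who considered the corresponding response the best among the four systems.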
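The two human-evaluation numbers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the system names, and the ratings are all made up, and the tie rule (every system tied for a judge's top score counts as that judge's best) is an assumption, since the text does not say how ties are handled.

```python
from statistics import mean


def average_score(scores):
    """Mean of the per-judge scores given to one system."""
    return mean(scores)


def best_at_1(ratings):
    """Fraction of judges who rated each system's response highest.

    `ratings` maps a system name to a list of scores, one per judge,
    with judges in the same order for every system.
    Assumption: if several systems tie for a judge's top score,
    each of them counts as that judge's best.
    """
    systems = list(ratings)
    n_judges = len(ratings[systems[0]])
    wins = {s: 0 for s in systems}
    for j in range(n_judges):
        top = max(ratings[s][j] for s in systems)
        for s in systems:
            if ratings[s][j] == top:
                wins[s] += 1
    return {s: wins[s] / n_judges for s in systems}


# Hypothetical ratings: five judges, four systems (not data from the paper).
ratings = {
    "sys_a": [3, 4, 2, 5, 3],
    "sys_b": [4, 3, 3, 4, 2],
    "sys_c": [2, 2, 1, 3, 4],
    "sys_d": [1, 1, 2, 2, 1],
}

averages = {s: average_score(v) for s, v in ratings.items()}
best = best_at_1(ratings)
print(averages)
print(best)
```

With untied ratings, the Best@1 ratios across systems sum to 1; under the tie assumption used here they can sum to more than 1.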