New Datasets, New Evaluation Measures and an Improved Unsupervised Method
ATE: aspect term extraction
ABSA: aspect-based sentiment analysis
The paper assumes that a search engine has collected user reviews of a particular target entity.
An ABSA system mainly comprises three subtasks:
1) aspect term extraction 2) aspect term sentiment estimation (a classification task) 3) aspect aggregation
The paper's main focus is aspect term extraction (ATE).
The paper's contributions:
1) Past datasets had problems: they come from a single domain, contain reviews of only a few target entities, or lack annotations of aspect terms. The paper therefore provides three new datasets (restaurants, laptops, hotels), with gold annotations of all aspect term occurrences and measured inter-annotator agreement.
2) Not all commonly used evaluation measures are satisfactory. For example, precision, recall, and F-measure computed over distinct aspect terms give frequent and infrequent aspect terms equal weight, yet frequently discussed aspect terms should matter more. The paper proposes weighted variants of precision and recall (a minimal sketch follows this list).
3) An improved unsupervised ATE method.
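To make the weighting idea concrete, here is a minimal sketch of frequency-weighted precision and recall over a ranked list of distinct aspect terms. The weighting scheme (1/rank for precision, gold frequency for recall) and all names are illustrative assumptions, not the paper's exact definitions:

```python
def weighted_pr(ranked_terms, gold_freq):
    """Illustrative frequency-weighted precision/recall.

    ranked_terms: distinct aspect terms returned by a method, best first.
    gold_freq: dict mapping each gold aspect term to its corpus frequency.
    """
    # Precision: weight rank i by 1/i, so mistakes near the top cost more.
    num = sum(1.0 / i for i, t in enumerate(ranked_terms, 1) if t in gold_freq)
    den = sum(1.0 / i for i in range(1, len(ranked_terms) + 1))
    w_precision = num / den if den else 0.0

    # Recall: weight each retrieved gold term by its frequency, so missing
    # a frequently discussed aspect term hurts more than missing a rare one.
    found = sum(gold_freq[t] for t in ranked_terms if t in gold_freq)
    total = sum(gold_freq.values())
    w_recall = found / total if total else 0.0
    return w_precision, w_recall


if __name__ == "__main__":
    gold = {"battery life": 40, "screen": 25, "price": 5}
    print(weighted_pr(["battery life", "keyboard", "price"], gold))
```

Under this scheme, two methods that find the same number of correct distinct terms can still score very differently, depending on whether the correct terms are the frequently discussed ones.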
A note on inter-annotator agreement, see: https://corpuslinguisticmethods.wordpress.com/2014/01/15/what-is-inter-annotator-agreement/
Inter-annotator agreement is a measure of how well two (or more) annotators can make the same annotation decision for a certain category.
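As an illustration of one common chance-corrected agreement measure (not necessarily the one used in the paper), here is a minimal Cohen's kappa sketch for two annotators tagging the same tokens; the labels and data are made up:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both independently pick the same label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy example: token-level aspect-term tags from two annotators.
a = ["ASPECT", "O", "O", "ASPECT", "O"]
b = ["ASPECT", "O", "ASPECT", "ASPECT", "O"]
print(cohens_kappa(a, b))  # ~0.615; 1.0 = perfect, 0 = chance level
```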
Aspect term extraction methods:
1) a baseline dubbed FREQ, which returns the most frequent distinct nouns and noun phrases
2) the method of Hu and Liu, which adds pruning mechanisms to the baseline and discovers more aspect terms (dubbed H&L)
3) an extension of H&L with an extra pruning step (dubbed H&L+W2V)
4) a similar extension of the baseline (dubbed FREQ+W2V)
All four methods are unsupervised.
FREQ baseline: returns the most frequent distinct nouns and noun phrases of the reviews, ranked by decreasing frequency (a minimal sketch follows).
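A minimal sketch of such a frequency baseline using NLTK's POS tagger. For simplicity it counts only single nouns; the actual baseline also extracts noun phrases:

```python
from collections import Counter
import nltk  # assumes the punkt and averaged_perceptron_tagger data are installed

def freq_baseline(sentences, top_n=10):
    """Rank distinct nouns by corpus frequency as candidate aspect terms.
    Simplification: single nouns only, no noun-phrase chunking."""
    counts = Counter()
    for sent in sentences:
        tokens = nltk.word_tokenize(sent.lower())
        for word, tag in nltk.pos_tag(tokens):
            if tag.startswith("NN"):  # NN, NNS, NNP, NNPS
                counts[word] += 1
    return [term for term, _ in counts.most_common(top_n)]

reviews = ["The battery life was good.", "Great screen, decent battery."]
print(freq_baseline(reviews))
```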
H&L's method: it first extracts distinct nouns and noun phrases as candidate aspect terms, then forms longer candidate aspect terms by concatenating pairs and triples of candidates that co-occur in the same sentence. All candidates are ranked by decreasing p-support, the number of sentences containing the candidate, excluding sentences where it occurs only as a subterm of a longer candidate; e.g., given the candidates "battery life" and "battery", the sentence "The battery life was good" counts toward the p-support of "battery life" but not of "battery". The method then self-corrects by pruning: first, multi-word distinct aspect terms that appear in "non-compact" form are discarded, e.g. "battery life screen" appears in non-compact form in "battery life is way better than screen"; second, if a candidate distinct aspect term t has p-support smaller than 3 and t is subsumed by another candidate distinct aspect term t′, then t is discarded. Next, a set of "opinion adjectives" is built: for each sentence and each candidate distinct aspect term t occurring in it, the adjective nearest to t in the sentence is added to the set. The sentences are then re-scanned: if a sentence contains no candidate aspect term but contains an opinion adjective, the noun nearest to that opinion adjective is added to the candidate distinct aspect terms. (A sketch of the p-support ranking and the subsumption pruning follows.)
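A hedged sketch of the p-support computation and the subsumption pruning rule as described above (candidate extraction is assumed to have happened already; the substring test for subsumption and the threshold of 3 follow the description, but details of the original implementation may differ):

```python
def p_support(candidates, sentences):
    """p-support of a candidate: number of sentences containing it,
    excluding sentences where it appears only inside a longer candidate
    (e.g. 'battery' inside 'battery life')."""
    support = {c: 0 for c in candidates}
    for sent in sentences:
        sent = sent.lower()
        hits = [c for c in candidates if c in sent]
        for c in hits:
            # Skip c if a longer candidate in the same sentence subsumes it.
            if any(c != o and c in o for o in hits):
                continue
            support[c] += 1
    return support

def prune_subsumed(candidates, support, min_support=3):
    """Discard a candidate with p-support below min_support if it is
    subsumed by (is a substring of) another candidate."""
    return [c for c in candidates
            if not (support[c] < min_support and
                    any(c != o and c in o for o in candidates))]

sents = ["The battery life was good.", "Battery life is way better than screen."]
cands = ["battery", "battery life", "screen"]
sup = p_support(cands, sents)
print(sup)                          # battery gets 0: always subsumed
print(prune_subsumed(cands, sup))   # ['battery life', 'screen']
```

The plain substring test here ignores word boundaries; a real implementation would match tokens, but the sketch shows the counting and pruning logic.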
H&L+W2V: the input additionally includes continuous vector space representations of words (embeddings trained with a neural network). The extra pruning step works as follows: take the ten most frequent candidate distinct aspect terms and compute the centroid of their vectors, called the domain centroid; similarly, compute the centroid of the 20 most frequent words of the Brown Corpus (news category), excluding stop words and words shorter than 3 characters, called the common language centroid. Any candidate distinct aspect term whose vector is closer to the common language centroid than to the domain centroid is discarded. The intuition: words close to the common language centroid are common words, whereas words close to the domain centroid denote domain-specific concepts and are more likely to be aspect terms. (Sketched below.)
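A sketch of the centroid-distance pruning, assuming pre-trained word vectors are already available as a dict from word to numpy array (loading real embeddings, e.g. via gensim, is left out). Cosine similarity is assumed as the distance measure, and all names are illustrative:

```python
import numpy as np

def centroid(terms, vectors):
    """Mean vector of the terms that have an embedding."""
    vecs = [vectors[t] for t in terms if t in vectors]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def centroid_prune(candidates, domain_top, common_top, vectors):
    """Keep a candidate only if its vector is at least as close (by cosine
    similarity) to the domain centroid (built from the most frequent
    candidates) as to the common-language centroid (built from frequent
    Brown Corpus words)."""
    dom = centroid(domain_top, vectors)
    com = centroid(common_top, vectors)
    return [c for c in candidates
            if c in vectors
            and cosine(vectors[c], dom) >= cosine(vectors[c], com)]
```

Multi-word candidates would need a vector of their own (e.g. the average of their word vectors); the sketch assumes every candidate has an entry in `vectors`.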
FREQ+W2V: adds to FREQ the same centroid-distance pruning step as in H&L+W2V.
Experimental results: