Paper link: http://pdfs.semanticscholar.org/893a/9ea38da739af53d4cb8ec5d0e722b0e6c6e4.pdf
Task addressed in this paper: extract aspect and opinion terms/phrases from each review.
In aspect-based sentiment analysis, the core component is to extract aspects or features of a product/service from a review, along with the opinions being expressed. For example, in the review "The battery life is great but the screen is dim", the aspect terms are "battery life" and "screen", and the opinion terms are "great" and "dim".
Previous approaches fall mainly into two categories:
The first starts from a seed set and uses syntactic rules together with associations between aspects and opinions to iteratively accumulate aspect terms and opinion terms. This approach relies heavily on manually defined rules and is restricted to specific part-of-speech patterns, e.g., assuming that opinion words are adjectives.
The second treats the task as sequence labeling with classifiers such as CRFs and HMMs, relying on feature engineering, lexicons, and annotated datasets. This approach requires extensive effort in designing hand-crafted features, and a CRF/HMM only combines those features linearly.
Deep-learning approaches to sentiment analysis fall into two categories: sentence-level sentiment prediction and phrase/word-level sentiment prediction.
The approach of this paper:
It consists of two parts: a recursive neural network built on the dependency tree of each sentence, which learns a high-level representation of each word in its context, and a conditional random field (CRF) that takes these representations as input and learns the mapping from high-level features to labels, since CRFs have proven to be effective for this kind of sequence tagging problem.
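As a rough sketch of this two-stage pipeline (the function and object names below are my own illustration, not from the paper):

```python
def extract_terms(sentence, dep_parser, dtrnn, crf):
    """Hypothetical end-to-end flow: dependency parse -> per-word hidden
    vectors from the tree-structured recursive NN -> CRF tag decoding."""
    tree = dep_parser(sentence)        # dependency tree of the sentence
    h = dtrnn.hidden_states(tree)      # one high-level vector per word
    tags = crf.viterbi_decode(h)       # e.g. aspect / opinion / other labels
    return list(zip(sentence.split(), tags))
```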
The most closely related work is [1], which uses a standard recurrent neural network; it relies heavily on the quality of the word embeddings and, moreover, does not take the dependency structure of the sentence into account.
Tagging categories: (given as a figure in the original post)
The tree structure used for RNNs generally takes one of two forms: a constituency tree or a dependency tree. In a constituency tree, all words are leaf nodes, each internal node represents a phrase or part of the sentence, and the root node represents the entire sentence. In a dependency tree, each node represents a single word and is linked to other nodes through dependency relations.
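For concreteness, a dependency tree can be produced by an off-the-shelf parser; the snippet below uses spaCy (my choice for illustration, not something prescribed by the paper) to print each word together with its head and dependency relation:

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The battery life is great")

# Each token is a node of the dependency tree; token.head is its parent
# and token.dep_ is the dependency relation label (e.g. nsubj, amod).
for token in doc:
    print(f"{token.text:10s} --{token.dep_}--> {token.head.text}")
```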
Each node n is associated with a word w, and each dependency relation r is associated with a transformation matrix.
These relation-specific transformation matrices are learned during training and connect the word embedding $x_w$ of each node to a hidden vector $h_n$. The hidden states are computed bottom-up, starting at the leaves. For a leaf node $n$:

$$h_n = f(W_v x_{w_n} + b)$$

For an internal node $n$ with children $\mathcal{K}_n$, the children's hidden states are folded in through the matrices of their dependency relations:

$$h_n = f\Big(W_v x_{w_n} + b + \sum_{k \in \mathcal{K}_n} W_{r_{nk}} h_k\Big)$$

where $W_v$ maps a word embedding into the hidden space, $W_{r_{nk}}$ is the matrix associated with the dependency relation between $n$ and its child $k$, and $f$ is a nonlinear activation. The resulting hidden states are then fed to a CRF.
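A minimal sketch of this bottom-up computation, assuming a shared matrix $W_v$ for word embeddings, one matrix per dependency relation, and tanh as the nonlinearity (sizes, initialization, and names are illustrative, not taken from the paper):

```python
import numpy as np
from dataclasses import dataclass, field

d_w, d_h = 50, 50                       # embedding / hidden sizes (illustrative)
Wv = np.random.randn(d_h, d_w) * 0.01   # maps a word embedding into hidden space
b = np.zeros(d_h)
W_rel = {}                              # one matrix per dependency relation

@dataclass
class Node:
    embedding: np.ndarray                         # word embedding x_w
    children: list = field(default_factory=list)  # list of (relation, Node) pairs

def rel_matrix(rel):
    # lazily create the relation-specific matrix W_r (learned during training)
    if rel not in W_rel:
        W_rel[rel] = np.random.randn(d_h, d_h) * 0.01
    return W_rel[rel]

def hidden_state(node):
    # leaf nodes use only their own embedding; internal nodes also add the
    # transformed hidden states of their children, one matrix per relation
    total = Wv @ node.embedding + b
    for rel, child in node.children:
        total += rel_matrix(rel) @ hidden_state(child)
    return np.tanh(total)
```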
In a linear-chain CRF, which is employed in this paper, there are two different cliques: the unary clique (U), representing input-output connections, and the pairwise clique (P), representing connections between adjacent outputs.
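Written out (notation mine; the paper's exact parameterization may differ), the linear-chain CRF defines the probability of a tag sequence $y$ given the per-word hidden vectors $h$ through a unary potential at each position and a pairwise potential for each pair of adjacent tags:

$$p(y \mid h) = \frac{1}{Z(h)} \exp\Big( \sum_{i=1}^{T} \psi_U(y_i, h_i) + \sum_{i=1}^{T-1} \psi_P(y_i, y_{i+1}) \Big)$$

where $Z(h)$ sums the same exponentiated score over all possible tag sequences.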
[1] Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings (Liu et al., EMNLP 2015).