Entity Linking: Notes on Paper Approaches

Author: BoringFantasy | Published 2020-10-21 22:43


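Entity Linking, also called Entity Normalization or Concept Linking, usually refers to mapping the concept and entity mentions found in natural-language text, typically scientific literature, to a unique ID in an ontology or in a database such as Wikipedia. From reading recent Entity Linking methods, the pipeline can be summarized in the following steps:
1. Represent the tokens, usually with models such as word2vec or GloVe; some work instead uses models such as BERT or ELMo, which adjust token embeddings based on context.
2. Build the Entity and Mention embeddings from the token embeddings, using weighted sums, attention, convolution, or similar methods.
3. Use an embedding-similarity measure such as cosine similarity, or a neural network, to rank the Entity embeddings most similar to each Mention.
4. In some papers, use external information, such as Entity embeddings from a knowledge graph, other contextual embeddings, or embeddings of the Entity description, to re-rank and improve the ranking of the most similar Entities for each Mention.
This problem closely resembles the query-to-key matching problem that retrieval systems solve. Below I summarize the approaches of the Entity Linking papers I have read recently, for reference in moving the project forward.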
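The four steps above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the token vectors, the entity IDs `E1` and `E2`, and the function names are all hypothetical, and unweighted averaging stands in for the weighted-sum, attention, or convolution pooling that the papers actually use. Re-ranking (step 4) is left out.

```python
import math

def average(vectors):
    """Step 2 (simplified): pool token vectors by unweighted averaging."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def cosine(u, v):
    """Step 3: cosine similarity between two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_entities(mention_tokens, entity_tokens, token_vectors, top_n=3):
    """Return the top_n (entity_id, similarity) pairs for one mention."""
    mention_vec = average([token_vectors[t] for t in mention_tokens])
    scored = []
    for entity_id, tokens in entity_tokens.items():
        entity_vec = average([token_vectors[t] for t in tokens])
        scored.append((entity_id, cosine(mention_vec, entity_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Step 1 (stand-in): hypothetical static token vectors instead of word2vec/GloVe.
TOKEN_VECTORS = {
    "heart": [1.0, 0.0, 0.0],
    "cardiac": [0.9, 0.1, 0.0],
    "attack": [0.0, 1.0, 0.0],
    "infarction": [0.1, 0.9, 0.0],
    "lung": [0.0, 0.0, 1.0],
    "cancer": [0.0, 0.2, 0.8],
}
# Hypothetical entity names, tokenized; E1 and E2 are made-up IDs.
ENTITY_TOKENS = {
    "E1": ["cardiac", "infarction"],
    "E2": ["lung", "cancer"],
}

ranking = rank_entities(["heart", "attack"], ENTITY_TOKENS, TOKEN_VECTORS)
print(ranking)
```

With these toy vectors the mention "heart attack" ranks `E1` above `E2`, because its averaged embedding points in the same direction as the averaged embedding of "cardiac infarction".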

Key contributions and points worth borrowing are annotated in bold.

I. Efficient One-Pass End-to-End Entity Linking for Questions. EMNLP 2020.

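1. Proposes a BERT-based bi-encoder model (the ELQ model) that combines mention detection and entity linking.
2. Aims to:
2.1 identify the mention boundaries of entities in a given question;
2.2 link each mention to its corresponding Wikipedia entity.
3. Method steps:
3.1 The entity encoder builds an embedding for each entity from its Wikipedia description (consider building entity embeddings from a Concept's description or from the Ontology).
3.2 The question encoder builds token-level embeddings for the input question.
3.3 The token embeddings from the question encoder are used to decide the mention boundaries, and the embedding of each mention candidate is the average of the embeddings of the tokens it contains.
3.4 Entity linking is performed with the inner product of the entity and mention embeddings.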
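Steps 3.3 and 3.4 can be illustrated with a small sketch. It assumes the token-level embeddings and entity embeddings have already been produced by the two encoders; the vectors, the entity IDs `ENT_A` and `ENT_B`, the `max_len` cap on span length, and all function names are hypothetical. How boundaries are scored is not shown here, only the pooling and the inner-product linking.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mean_pool(token_embeddings, start, end):
    """Step 3.3: mention embedding = average of token embeddings start..end (inclusive)."""
    span = token_embeddings[start:end + 1]
    dim = len(span[0])
    return [sum(vec[d] for vec in span) / len(span) for d in range(dim)]

def candidate_spans(num_tokens, max_len):
    """Enumerate every (start, end) span of at most max_len tokens."""
    return [(i, j)
            for i in range(num_tokens)
            for j in range(i, min(i + max_len, num_tokens))]

def link_span(token_embeddings, span, entity_embeddings):
    """Step 3.4: pick the entity whose embedding has the largest inner product."""
    y = mean_pool(token_embeddings, *span)
    return max(entity_embeddings, key=lambda e: dot(entity_embeddings[e], y))

# Hypothetical 2-d token embeddings for a four-token question.
TOKENS = [[0.0, 0.1], [0.1, 0.0], [0.8, 0.2], [0.9, 0.6]]
ENTITIES = {"ENT_A": [1.0, 0.5], "ENT_B": [-0.5, 1.0]}

spans = candidate_spans(len(TOKENS), max_len=2)
best = link_span(TOKENS, (2, 3), ENTITIES)
print(spans)
print(best)
```

Because both sides are plain vectors, the entity embeddings can be computed once offline and only the question needs to be encoded at query time, which is the usual motivation for a bi-encoder.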

(Figure: Biencoder)

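4. ELQ Model.
4.1 The question embedding is the BERT output for the tokens $[q_1 \dots q_n]$. Each entity embedding is initialized from the title and description of that entity's Wikipedia page.
(Figure: ELQ Model)
4.2 $W_{start}^T$ and $W_{end}^T$ are learned vectors that score a token as the start or end position of a Mention, and $W^T_{mention}$ gives the weight of each word contained in the Mention. $p([i,j])$ is then the probability that the span is a Mention, where $i$ and $j$ are the indices of the words in the question that serve as the start and end positions of the Mention.
(Figure: Mention Detection)
4.3 $s(e,[i,j])$ computes the similarity between the Entity embedding $x_e$ and the Mention embedding $y_{i,j}$. $p(e|[i,j])$ applies a softmax over these scores, and the Mention-Entity scoring function $s(e,[i,j])$ is optimized by training the neural network.
(Figure: Entity Disambiguation)
4.4 Two loss functions are built: $L_{MD}$ is a binary cross-entropy loss over the probabilities of all Mention candidates, and $L_{ED}$ measures the Entity Linking loss. The total loss is the sum of the two.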
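The quantities in 4.2 to 4.4 can be sketched as follows. The notes above do not spell out the exact functional forms, so this sketch makes assumptions: $p([i,j])$ is taken to be a sigmoid over the sum of the start score, the end score, and the per-token mention scores; $L_{MD}$ is averaged over candidate spans; and $L_{ED}$ is the negative log-likelihood of the gold entity. All function names and toy vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mention_probability(q, i, j, w_start, w_end, w_mention):
    """p([i,j]); ASSUMED form: sigmoid(start score + end score + sum of token scores)."""
    score = dot(w_start, q[i]) + dot(w_end, q[j])
    score += sum(dot(w_mention, q[t]) for t in range(i, j + 1))
    return sigmoid(score)

def entity_distribution(y, entity_embeddings):
    """p(e|[i,j]): softmax over s(e,[i,j]) = x_e . y_{i,j}."""
    scores = {e: dot(x, y) for e, x in entity_embeddings.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {e: math.exp(s - top) for e, s in scores.items()}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}

def md_loss(probs, labels, eps=1e-12):
    """L_MD: binary cross-entropy, averaged over candidate spans (assumed)."""
    terms = [l * math.log(p + eps) + (1 - l) * math.log(1 - p + eps)
             for p, l in zip(probs, labels)]
    return -sum(terms) / len(terms)

def ed_loss(distribution, gold_entity):
    """L_ED: negative log-likelihood of the gold entity (assumed)."""
    return -math.log(distribution[gold_entity])

# Toy question of two tokens with 2-d embeddings.
q = [[1.0, 0.0], [0.0, 1.0]]
p_span = mention_probability(q, 0, 1, [2.0, -2.0], [-1.0, 1.0], [0.5, 0.5])
dist = entity_distribution([0.5, 0.5], {"ENT_A": [1.0, 1.0], "ENT_B": [0.0, 0.0]})
total_loss = md_loss([p_span], [1]) + ed_loss(dist, "ENT_A")
print(p_span, dist, total_loss)
```

Summing the two losses means one backward pass updates both the mention-detection vectors and the encoders that produce the embeddings used for disambiguation.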
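5. Trick
5.1 In practice, Wikipedia contains too many Entities for the full softmax to be computed efficiently, so a threshold parameter $\gamma$ is applied to $p([i,j])$ to filter the Mention candidates, and the softmax is then computed only over the 10 nearest Entities of each retained Mention.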
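A minimal sketch of this trick is shown below. It assumes that "restricting by $\gamma$" means keeping spans with $p([i,j]) \ge \gamma$, and it finds the nearest entities with a brute-force scan purely for illustration; a real system would use a nearest-neighbour index so that not every entity has to be scored. The default `gamma=0.5`, the function name, and the toy data are hypothetical.

```python
import heapq
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def prune_and_score(span_probs, span_embeddings, entity_embeddings, gamma=0.5, k=10):
    """Keep spans with p([i,j]) >= gamma, then softmax over each span's k nearest entities."""
    results = {}
    for span, prob in span_probs.items():
        if prob < gamma:  # ASSUMPTION: gamma acts as a lower bound on p([i,j])
            continue
        y = span_embeddings[span]
        # Brute-force inner products; stands in for a nearest-neighbour index.
        scores = {e: dot(x, y) for e, x in entity_embeddings.items()}
        nearest = heapq.nlargest(k, scores, key=scores.get)
        top = scores[nearest[0]]
        exps = {e: math.exp(scores[e] - top) for e in nearest}
        total = sum(exps.values())
        results[span] = [(e, exps[e] / total) for e in nearest]
    return results

# Toy data: 12 hypothetical entities, two candidate spans.
ENTITIES = {f"E{i}": [0.1 * i, 0.0] for i in range(12)}
SPAN_PROBS = {(0, 1): 0.9, (2, 2): 0.1}
SPAN_EMBEDDINGS = {(0, 1): [1.0, 0.0], (2, 2): [0.0, 1.0]}

linked = prune_and_score(SPAN_PROBS, SPAN_EMBEDDINGS, ENTITIES)
print(linked)
```

The low-probability span `(2, 2)` is dropped before any entity scoring happens, and the surviving span gets a distribution over 10 entities instead of all 12.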

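6. Points worth borrowing:
6.1 Entity embeddings are initialized from each Entity's Wikipedia title and description. Following this idea, we could likewise strengthen Entity embeddings with a Concept's description or other external information such as definitions, or add the Ontology's tree structure to the embedding. Mention embeddings could also be strengthened with the sentence that contains the Mention, but the sentence quality has to be ensured, especially with models such as BERT or ELMo.
6.2 Design a scoring function and use a neural network to learn the linking between Mentions and Entities.
6.3 The final output of the Python package can be the top-N Entities most similar to each Mention.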