NER: NER Datasets and SOTA Models

Author: 陶_306c | Published 2021-04-14 17:00

    CoNLL2003

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| LUKE (Yamada et al., 2020) | 94.3 | LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention | Official |
| CNN Large + fine-tune (Baevski et al., 2019) | 93.5 | Cloze-driven Pretraining of Self-attention Networks | |
| RNN-CRF+Flair | 93.47 | Improved Differentiable Architecture Search for Language Modeling and Named Entity Recognition | |
| CrossWeigh + Flair (Wang et al., 2019) | 93.43 | CrossWeigh: Training Named Entity Tagger from Imperfect Annotations | Official |
| LSTM-CRF+ELMo+BERT+Flair | 93.38 | Neural Architectures for Nested NER through Linearization | Official |
| Flair embeddings (Akbik et al., 2018) | 93.09 | Contextual String Embeddings for Sequence Labeling | Flair framework |
| BERT Large (Devlin et al., 2018) | 92.8 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | |
| CVT + Multi-Task (Clark et al., 2018) | 92.61 | Semi-Supervised Sequence Modeling with Cross-View Training | Official |
| BERT Base (Devlin et al., 2018) | 92.4 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | |

    CoNLL++

    This is a cleaner version of the CoNLL 2003 NER task, where about 5% of instances in the test set are corrected due to mislabelling. The training set is left untouched. Models are evaluated based on span-based F1 on the test set.

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| CrossWeigh + Flair (Wang et al., 2019) | 94.28 | CrossWeigh: Training Named Entity Tagger from Imperfect Annotations | Official |
| Flair embeddings (Akbik et al., 2018) | 93.89 | Contextual String Embeddings for Sequence Labeling | Flair framework |
| BiLSTM-CRF+ELMo (Peters et al., 2018) | 93.42 | Deep contextualized word representations | AllenNLP Project, AllenNLP GitHub |
| Ma and Hovy (2016) | 91.87 | End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF | |
| LSTM-CRF (Lample et al., 2016) | 91.47 | Neural Architectures for Named Entity Recognition | |
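
Span-based F1, the metric used for these leaderboards, counts a predicted entity as correct only if both its boundaries and its type exactly match a gold entity; precision and recall are computed over such exact matches and F1 is their harmonic mean. Below is a minimal sketch of that computation; the function name and data layout are illustrative assumptions, not any benchmark's official scorer.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, entity_type), e.g. (0, 2, "PER").
Span = Tuple[int, int, str]

def span_f1(gold: List[List[Span]], pred: List[List[Span]]) -> float:
    """Exact-match span F1 over a list of sentences.

    A predicted span counts as correct only if both its boundaries and
    its entity type match a gold span in the same sentence.
    """
    n_correct = n_pred = n_gold = 0
    for gold_spans, pred_spans in zip(gold, pred):
        gold_set, pred_set = set(gold_spans), set(pred_spans)
        n_correct += len(gold_set & pred_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: one correct entity and one boundary error -> F1 = 0.5.
gold = [[(0, 2, "PER"), (5, 6, "LOC")]]
pred = [[(0, 2, "PER"), (4, 6, "LOC")]]
print(round(span_f1(gold, pred), 2))  # 0.5
```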

    Ontonotes v5 (English)

    OntoNotes 5.0 consists of 1,745k English, 900k Chinese, and 300k Arabic tokens of text data, drawn from a wide range of sources: telephone conversations, newswire, broadcast news, broadcast conversation, and weblogs. Entities are annotated with 18 categories, including PERSON, ORGANIZATION, and LOCATION.
    OntoNotes annotates entities with inline markup; the idea is simple, essentially XML, for example:

    <ENAMEX TYPE="ORG">Disney</ENAMEX> is a global brand .
    The tags mark out the span of each named entity, and the TYPE attribute records its category.
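
As a hedged illustration (not part of the official OntoNotes tooling), this kind of inline ENAMEX markup can be converted to token-level BIO labels with a small regex-based parser; the function and regex below are my own sketch and assume whitespace-tokenized text as in the example above.

```python
import re

# Matches <ENAMEX TYPE="ORG">Disney</ENAMEX>-style annotations.
ENAMEX_RE = re.compile(r'<ENAMEX TYPE="([^"]+)">(.*?)</ENAMEX>')

def enamex_to_bio(text: str):
    """Convert inline ENAMEX markup to (token, BIO-tag) pairs."""
    tokens, tags = [], []
    pos = 0
    for m in ENAMEX_RE.finditer(text):
        # Tokens before the entity are outside any span.
        for tok in text[pos:m.start()].split():
            tokens.append(tok); tags.append("O")
        ent_type, ent_text = m.group(1), m.group(2)
        for i, tok in enumerate(ent_text.split()):
            tokens.append(tok)
            tags.append(("B-" if i == 0 else "I-") + ent_type)
        pos = m.end()
    for tok in text[pos:].split():
        tokens.append(tok); tags.append("O")
    return list(zip(tokens, tags))

print(enamex_to_bio('<ENAMEX TYPE="ORG">Disney</ENAMEX> is a global brand .'))
# [('Disney', 'B-ORG'), ('is', 'O'), ('a', 'O'), ('global', 'O'), ('brand', 'O'), ('.', 'O')]
```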

    The OntoNotes corpus v5 is a richly annotated corpus with several layers of annotation, including named entities, coreference, part of speech, word sense, propositions, and syntactic parse trees. These annotations cover a large number of tokens, a broad cross-section of domains, and 3 languages (English, Arabic, and Chinese). The NER dataset (of interest here) includes 18 tags, consisting of 11 types (PERSON, ORGANIZATION, etc.) and 7 values (DATE, PERCENT, etc.), and contains 2 million tokens. The common data split used for NER is defined in Pradhan et al. (2013).
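
For reference, the 18 categories break down into the 11 name types and 7 value types mentioned above. The exact label strings below are taken from the OntoNotes 5.0 annotation guidelines as I understand them, not from this post, so treat them as an assumption.

```python
# 11 name types (proper-name-like entities) in OntoNotes 5.0 NER.
NAME_TYPES = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC",
    "PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE",
]

# 7 value types (numeric / temporal expressions).
VALUE_TYPES = [
    "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
]

assert len(NAME_TYPES) + len(VALUE_TYPES) == 18
```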

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Flair embeddings (Akbik et al., 2018) | 89.71 | Contextual String Embeddings for Sequence Labeling | Official |
| CVT + Multi-Task (Clark et al., 2018) | 88.81 | Semi-Supervised Sequence Modeling with Cross-View Training | Official |
| Bi-LSTM-CRF + Lexical Features (Ghaddar and Langlais, 2018) | 87.95 | Robust Lexical Features for Improved Neural Network Named-Entity Recognition | Official |
| BiLSTM-CRF (Strubell et al., 2017) | 86.99 | Fast and Accurate Entity Recognition with Iterated Dilated Convolutions | Official |
| Iterated Dilated CNN (Strubell et al., 2017) | 86.84 | Fast and Accurate Entity Recognition with Iterated Dilated Convolutions | Official |
| Chiu and Nichols (2016) | 86.28 | Named entity recognition with bidirectional LSTM-CNNs | |
| Joint Model (Durrett and Klein, 2014) | 84.04 | A Joint Model for Entity Analysis: Coreference, Typing, and Linking | |
| Averaged Perceptron (Ratinov and Roth, 2009) | 83.45 | Design Challenges and Misconceptions in Named Entity Recognition (scores reported in Durrett and Klein, 2014) | Official |

    Reposted from: https://github.com/sebastianruder/NLP-progress/blob/master/english/named_entity_recognition.md
