Processing New Data with an NER Model and Computing Accuracy

Author: 陶_306c | Published: 2021-04-15 15:58

1. Preprocessing the new data

# -*- coding: utf-8 -*-

import os
import nltk

# Requires the NLTK tokenizer and POS-tagger data (install once via nltk.download).
out_dir = "/Users/Shared/CRF_4_NER/CRF_TEST"

sentence = "Venezuelan opposition leader and self-proclaimed interim president Juan Guaidó said Thursday he will return to his country by Monday, and that a dialogue with President Nicolas Maduro won't be possible without discussing elections."
# sentence = "Real Madrid's season on the brink after 3-0 Barcelona defeat"
# sentence = "British artist David Hockney is known as a voracious smoker, but the habit got him into a scrape in Amsterdam on Wednesday."
# sentence = "India is waiting for the release of a pilot who has been in Pakistani custody since he was shot down over Kashmir on Wednesday, a goodwill gesture which could defuse the gravest crisis in the disputed border region in years."
# sentence = "Instead, President Donald Trump's second meeting with North Korean despot Kim Jong Un ended in a most uncharacteristic fashion for a showman commander in chief: fizzle."
# sentence = "And in a press conference at the Civic Leadership Academy in Queens, de Blasio said the program is already working."
# sentence = "The United States is a founding member of the United Nations, World Bank, International Monetary Fund."

words = nltk.word_tokenize(sentence)  # tokenize the sentence
print(words)
postags = nltk.pos_tag(words)  # POS-tag each token
print(postags)

# Write one token per line as "word POS O"; the trailing O is a placeholder label column
with open(os.path.join(out_dir, "NER_predict.data"), 'w', encoding='utf-8') as f:
    for token, pos in postags:
        f.write(token + ' ' + pos + ' O\n')

print("write successfully!")
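To make the file layout concrete, here is a minimal sketch of the same formatting step without NLTK, so it runs anywhere. The helper name `to_crf_lines` and the hand-written POS tags are illustrative assumptions, not part of the original script.

```python
def to_crf_lines(tagged_tokens, placeholder="O"):
    """Format (token, POS) pairs as 'token POS O' lines, one token per line."""
    return ["%s %s %s" % (token, pos, placeholder) for token, pos in tagged_tokens]


# Hand-written tags standing in for the output of nltk.pos_tag (assumption)
sample = [("Juan", "NNP"), ("Guaidó", "NNP"), ("said", "VBD")]
lines = to_crf_lines(sample)
print("\n".join(lines))
```

Each line carries the token, its POS tag, and a placeholder label, which is the layout the script above writes to NER_predict.data.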

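Next, run named entity recognition on the new data to see how well the model performs on it. Use the model to predict on the preprocessed data; the prediction output is saved as predict.txt.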
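The post does not show the prediction command itself. The sketch below assumes the model was trained with CRF++, which the directory name and the column layout suggest but the post never states, and that its `crf_test` binary is on the PATH. The file names `model` and `NER_predict.data` are placeholders for your own paths.

```python
import os
import shutil
import subprocess


def build_crf_test_cmd(model_path, data_path):
    """Build the CRF++ prediction command: crf_test -m MODEL DATA."""
    return ["crf_test", "-m", model_path, data_path]


# Placeholder file names (assumption); adjust to your own setup
cmd = build_crf_test_cmd("model", "NER_predict.data")
print(" ".join(cmd))

# Run only when the binary and both input files are actually present
if shutil.which("crf_test") and all(os.path.exists(p) for p in cmd[2:]):
    with open("predict.txt", "w", encoding="utf-8") as out:
        # crf_test prints the tagged tokens to stdout; redirect them to predict.txt
        subprocess.run(cmd, stdout=out, check=True)
```

With a different CRF toolkit the command differs; only the resulting one-token-per-line file with the predicted tag in the last column matters for the steps that follow.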

# Read the prediction file predict.txt
with open("/predict.txt", 'r', encoding='utf-8') as f:
    sents = [line.strip() for line in f if line.strip()]

# First column is the token, last column is the predicted tag
tokens = []
tags = []

for sent in sents:
    cols = sent.split()
    tokens.append(cols[0])
    tags.append(cols[-1])

# print(tokens)
# print(tags)

ner_type_dict = {'PER': 'PERSON: ',
                 'LOC': 'LOCATION: ',
                 'ORG': 'ORGANIZATION: ',
                 'MISC': 'MISC: '
                }

# Print the entities recognized by the model.
# O tags are kept in the sequence so that they still separate neighbouring entities.
print("NER results:")
for i, tag in enumerate(tags):
    if tag.startswith('B-'):
        ner_type = tag.split('-', 1)[1]
        end = i + 1
        # Extend the entity over the following I- tags of the same type
        while end < len(tags) and tags[end] == 'I-' + ner_type:
            end += 1

        label = ner_type_dict.get(ner_type, ner_type + ': ')
        print(label + ' '.join(tokens[i:end]))
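The BIO grouping can also be packaged as a function that returns the entities instead of printing them. This is a minimal sketch; the name `extract_entities` and the sample tokens and tags are made up for illustration and are not model output.

```python
def extract_entities(tokens, tags):
    """Group BIO tags into (entity_type, entity_text) pairs."""
    entities = []
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):
            ner_type = tags[i][2:]
            end = i + 1
            # Extend over I- tags of the same type; O or another type ends the span
            while end < len(tags) and tags[end] == "I-" + ner_type:
                end += 1
            entities.append((ner_type, " ".join(tokens[i:end])))
            i = end
        else:
            i += 1
    return entities


# Made-up example sequence (assumption), not real predictions
tokens = ["President", "Nicolas", "Maduro", "visited", "New", "York", "."]
tags = ["O", "B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
entities = extract_entities(tokens, tags)
print(entities)  # [('PER', 'Nicolas Maduro'), ('LOC', 'New York')]
```

Working on the full tag sequence, rather than on a list with the O tags removed, keeps an I- tag that follows an O from being attached to an earlier entity.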

2. Computing prediction accuracy with a Python script

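# -*- coding: utf-8 -*-

# Each non-empty line of result.txt ends with two tag columns:
# the reference tag and the predicted tag
with open("/result.txt", "r", encoding="utf-8") as f:
    sents = [line.strip() for line in f if line.strip()]

total = len(sents)
print(total)

# Count the tokens whose last two columns agree
count = 0
for sent in sents:
    cols = sent.split()
    if cols[-1] == cols[-2]:
        count += 1

# Token-level accuracy; guard against an empty file
print("Accuracy: %.4f" % (count / total if total else 0.0))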
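The same computation is shown below as a function on a few inline lines, so the arithmetic can be checked by hand. The sample lines are invented, and treating the second-to-last column as the reference tag is an assumption about the file layout. The `skip_o` option is an addition not in the original script: because most tokens are tagged O, overall token accuracy can look high even when entities are missed, so it can help to also look at accuracy on the non-O tokens.

```python
def token_accuracy(lines, skip_o=False):
    """Fraction of lines whose last two columns (reference, predicted) match."""
    total = 0
    correct = 0
    for line in lines:
        cols = line.split()
        if len(cols) < 2:
            continue  # skip blank or malformed lines
        gold, pred = cols[-2], cols[-1]
        if skip_o and gold == "O":
            continue  # optionally ignore tokens whose reference tag is O
        total += 1
        if gold == pred:
            correct += 1
    return correct / total if total else 0.0


# Invented sample: token, POS, reference tag, predicted tag
sample = [
    "Juan NNP B-PER B-PER",
    "Guaidó NNP I-PER O",
    "said VBD O O",
    "Thursday NNP O O",
]
overall = token_accuracy(sample)
entity_only = token_accuracy(sample, skip_o=True)
print("Accuracy: %.4f" % overall)                # 3 of 4 tokens match
print("Accuracy (non-O): %.4f" % entity_only)    # 1 of 2 entity tokens match
```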

