How to Build a Vocabulary


Author: 思君颜如玉 | Published: 2018-08-01 20:10

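from collections import Counter

# Assumes `train` is a table (e.g. a pandas DataFrame) whose 'word_seg'
# column holds whitespace-separated tokens.
all_words = []
for item in train['word_seg']:
    for word in item.split():
        all_words.append(word)

# Count word frequencies and keep the 40,000 most frequent words.
voc_info = Counter(all_words)
voc = [word for word, _ in voc_info.most_common(40000)]

# Index 0 is reserved for unknown words; known words get 1..len(voc).
voc_index = {'unk': 0}
voc_index.update(zip(voc, range(1, len(voc) + 1)))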
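The snippet above only builds the word-to-index mapping. As a minimal sketch of how such a mapping might then be used, the example below wraps the same Counter-based approach in a function and encodes a sentence, sending out-of-vocabulary words to index 0. The names `build_vocab`, `encode`, and `sample_texts` are hypothetical and not part of the original post. One small difference from the original is also an assumption of this sketch: if the corpus itself contains the token 'unk', it is skipped so that index 0 stays reserved.

```python
from collections import Counter


def build_vocab(texts, max_size=40000, unk_token='unk'):
    """Map the max_size most frequent whitespace-separated tokens to
    consecutive ids starting at 1; id 0 is reserved for unk_token."""
    counts = Counter(word for text in texts for word in text.split())
    vocab = {unk_token: 0}
    for word, _ in counts.most_common(max_size):
        # Skip a literal 'unk' token in the data so id 0 is not overwritten.
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, unk_token='unk'):
    """Convert a whitespace-separated string to a list of ids,
    falling back to the unknown id for words missing from vocab."""
    unk_id = vocab[unk_token]
    return [vocab.get(word, unk_id) for word in text.split()]


# Hypothetical toy corpus standing in for train['word_seg'].
sample_texts = ["12 7 7 30", "7 12 99"]
vocab = build_vocab(sample_texts, max_size=2)
encoded = encode("7 12 30 1000", vocab)
print(vocab)    # {'unk': 0, '7': 1, '12': 2}
print(encoded)  # [1, 2, 0, 0]
```

Reserving index 0 for unknown words means that any word dropped by the frequency cutoff, or never seen during training, still maps to a valid index at encoding time.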


Article link: https://www.haomeiwen.com/subject/qsarvftx.html