Lowercasing words: word.lower()
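A minimal sketch of the lowercasing step applied to a list of tokens. The token list is made up for illustration. `str.casefold()` is a standard Python string method, shown here as a stricter alternative that can matter for non-English text.

```python
# Lowercase every token so that "Action" and "action" count as the same word.
tokens = ["Action", "AUTOMATION", "Lovely", "lying"]
lowered = [word.lower() for word in tokens]
print(lowered)  # ['action', 'automation', 'lovely', 'lying']

# str.casefold() is intended for caseless matching and goes further than lower():
# German "ß" is left alone by lower() but casefolds to "ss".
print("Straße".lower(), "Straße".casefold())  # straße strasse
```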
Stemming
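NLTK provides two stemmers, Porter and Lancaster, which apply different rule sets. The difference shows on an irregular form such as lying: the Porter stemmer maps it to lie, while the Lancaster stemmer leaves it as lying.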
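To make the idea of stemming concrete before looking at NLTK, here is a deliberately naive suffix-stripping sketch. The suffix list and the minimum stem length are hypothetical choices for illustration only; this is not the Porter or Lancaster algorithm, and unlike Porter it cannot turn lying into lie.

```python
# Hypothetical suffix list; NOT the rules used by Porter or Lancaster.
SUFFIXES = ["ation", "ing", "ion", "ly", "es", "s"]
# Try longer suffixes first (sorted() is stable, so ties keep list order).
ORDERED = sorted(SUFFIXES, key=len, reverse=True)
MIN_STEM = 3  # hypothetical: never leave a stem shorter than 3 characters

def naive_stem(word: str) -> str:
    """Strip the first matching suffix if the remaining stem is long enough."""
    for suffix in ORDERED:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word

for w in ["action", "automation", "lovely", "lying"]:
    print(w, "->", naive_stem(w))
# action -> act
# automation -> autom
# lovely -> love
# lying -> lying
```

Because the sketch only chops characters off the end, "lying" survives untouched: mapping it to "lie" needs a rule that rewrites the stem, which is the kind of case the paragraph above is about.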
import nltk

# Stemmers bundled with NLTK (no extra data download needed)
porter = nltk.PorterStemmer()
lancaster = nltk.LancasterStemmer()

# Words to process
words = ["action", "automation", "lovely", "lying"]

# Stem with Porter
for word in words:
    print(porter.stem(word))

print("-------------------------------------------")

# Stem with Lancaster
for word in words:
    print(lancaster.stem(word))
[Figure: output of the two stemmers]
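Each tool has its strengths: Lancaster is generally the more aggressive of the two, cutting words down further, while Porter is more conservative.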
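Most of Porter's rules only fire when the remaining stem is long enough, measured by m in the form [C](VC)^m[V] from Porter's original description (C is a run of consonants, V a run of vowels, and y counts as a vowel when it follows a consonant). The sketch below computes only that measure; it is an illustration of the definition, not NLTK's implementation.

```python
def is_consonant(word: str, i: int) -> bool:
    """a, e, i, o, u are vowels; 'y' is a vowel only when preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem: str) -> int:
    """Return m in [C](VC)^m[V]: the number of vowel-to-consonant transitions."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

for stem in ["tree", "trouble", "private", "act", "autom"]:
    print(stem, measure(stem))
# tree 0
# trouble 1
# private 2
# act 1
# autom 2
```

For example, Porter's original rule for deleting "ion" requires m > 1 (plus a stem ending in s or t), so "act" (m = 1) does not qualify and "action" keeps its suffix under those rules.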
Lemmatization
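Lemmatization uses NLTK's WordNetLemmatizer(). It can look weak at first, but largely because lemmatize() treats every word as a noun unless a pos argument is passed (for example pos="a" for adjectives).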
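To see why the part of speech matters, here is a toy lemmatizer in the spirit of WordNet's lookup: check an exception table, then a lexicon of base forms, then a regular plural rule. The tables `EXCEPTIONS` and `LEXICON` and the function `toy_lemmatize` are hypothetical and tiny; WordNet's real exception files and lexicon are far larger and its rules more complete.

```python
# Hypothetical exception tables, loosely modeled on WordNet's noun.exc / adj.exc.
EXCEPTIONS = {
    "n": {"women": "woman", "men": "man"},
    "a": {"best": "good", "better": "good"},
}
# Hypothetical lexicon of known base forms per part of speech.
LEXICON = {
    "n": {"sword", "woman", "man", "best"},
    "a": {"good"},
}

def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Return a base form for word, treating it as a noun unless pos says otherwise."""
    if word in EXCEPTIONS.get(pos, {}):
        return EXCEPTIONS[pos][word]
    if word in LEXICON.get(pos, set()):
        return word
    # Regular noun plural: drop a trailing "s" if the result is a known noun.
    if pos == "n" and word.endswith("s") and word[:-1] in LEXICON["n"]:
        return word[:-1]
    return word  # unknown words come back unchanged

for w in ["swords", "women", "men", "best"]:
    print(w, "->", toy_lemmatize(w))
print("best (adjective) ->", toy_lemmatize("best", pos="a"))
# swords -> sword
# women -> woman
# men -> man
# best -> best
# best (adjective) -> good
```

As a noun, "best" is already a base form, so it comes back unchanged; only when it is treated as an adjective does the exception table map it to "good".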
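import nltk

nltk.download("wordnet")  # one-time download of the WordNet data the lemmatizer needs
wnl = nltk.WordNetLemmatizer()
# Use a list rather than a set so the words print in a predictable order
words = ["swords", "women", "men", "best"]
for word in words:
    # lemmatize() treats each word as a noun unless pos is given
    print(wnl.lemmatize(word))
# Pass pos="a" to lemmatize "best" as an adjective instead
print(wnl.lemmatize("best", pos="a"))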
[Figure: lemmatization results]