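1. Deleting elements from a list while looping over it by index: each pop() makes the list shorter, but the loop still counts up to the original length, so the index eventually runs past the end (IndexError: list index out of range). In addition, the element that slides into position i after a pop is never fully checked.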
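import nltk  # word_tokenize needs NLTK's Punkt tokenizer data, downloaded once with nltk.download()

english_punctuations = [',', '.', ':', ';', '?', '(', ')', '[', ']', '&', '!', '*', '@', '#', '$', '%', '...']
sentence = '''@justinbieber Thank u Justin for this amazing'''
line = nltk.word_tokenize(sentence)

# range(len(line)) is evaluated once, before any pop(), so i keeps
# counting up to the ORIGINAL length even after the list has shrunk.
for i in range(len(line)):
    if line[i] == '@':
        line[i+1] = 'name'      # the token after '@' is a user name
    if line[i] == '%':
        line[i] = 'percentage'
    if line[i].isdigit():
        line[i] = 'number'
    if line[i] in english_punctuations:
        line.pop(i)             # the list gets shorter here...
    if len(line[i]) < 2:
        line.pop(i)             # ...and here
    line[i] = line[i].lower()   # once i >= len(line): IndexError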
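One way to avoid the problem is to never delete from the list you are indexing: walk over the tokens and build a new list, simply skipping the unwanted ones. Below is a minimal sketch, assuming the same replacement rules as the loop above; the function name clean_tokens is hypothetical, and the tokens are passed in directly so the example runs without NLTK data.

```python
# Punctuation tokens to drop (same entries as english_punctuations above).
ENGLISH_PUNCTUATIONS = {',', '.', ':', ';', '?', '(', ')', '[', ']',
                        '&', '!', '*', '@', '#', '$', '%', '...'}


def clean_tokens(tokens):
    """Return a new, cleaned token list; the input list is never modified."""
    cleaned = []
    mention_pending = False  # True right after an '@' token
    for tok in tokens:
        if mention_pending:
            tok = 'name'       # the token after '@' is a user name
            mention_pending = False
        if tok == '@':
            mention_pending = True
        if tok == '%':
            tok = 'percentage'
        elif tok.isdigit():
            tok = 'number'
        if tok in ENGLISH_PUNCTUATIONS:
            continue           # skip instead of pop(): no indices shift
        if len(tok) < 2:
            continue           # drop one-character tokens
        cleaned.append(tok.lower())
    return cleaned


tokens = ['@', 'justinbieber', 'Thank', 'u', 'Justin', 'for', 'this', 'amazing']
print(clean_tokens(tokens))  # ['name', 'thank', 'justin', 'for', 'this', 'amazing']
```

With NLTK installed, the same function can be called as clean_tokens(nltk.word_tokenize(sentence)). Because nothing is removed from the list being iterated, a trailing '@' or a run of consecutive punctuation can no longer cause an IndexError or make the loop skip a token. If the list really must be edited in place, iterating over the indices in reverse (for i in reversed(range(len(line)))) is another common workaround, since popping at position i does not move the elements before it.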