Detecting Specific Words or Phrases in Scientific Abstracts (Self-Study, Day 43)

Author: 天明豆豆 | Published 2020-03-24 17:37

Detecting specific words or phrases in scientific abstracts

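The technique from the previous article can be reused to detect a word or phrase in scientific abstracts. More generally, this example is suitable for very simple text mining, comparable to the "Find" tool in Microsoft Word.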

import re
import urllib.request

# Word to be searched; the flag makes the search case-insensitive
keyword = re.compile('schistosoma', re.IGNORECASE)

# List of PMIDs where we want to search the word
pmids = ['18235848', '22607149', '22405002', '21630672']

for pmid in pmids:
    url = 'http://www.ncbi.nlm.nih.gov/pubmed?term=%s' % pmid
    handler = urllib.request.urlopen(url)
    html = handler.read().decode('utf-8', errors='replace')
    # Extract the title and the abstract from the page source
    title_regexp = re.compile('<h1>.{5,400}</h1>')
    title = title_regexp.search(html)
    abstract_regexp = re.compile('<h3>Abstract</h3><p>.{20,3000}</p></div>')
    abstract = abstract_regexp.search(html)
    if title is None or abstract is None:
        # The page layout did not match the patterns; skip this PMID
        continue
    word = keyword.search(abstract.group())
    if word:
        # Display the title and where the keyword was found
        print(title.group())
        print(word.group(), word.start(), word.end())
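Because the script depends on the live PubMed page layout, which may not match what the regular expressions expect, it can help to try the extraction step offline first. The following is a minimal sketch under stated assumptions: `sample_html` is an invented HTML fragment (not real PubMed output) and `find_keyword` is a hypothetical helper name. It applies patterns of the same shape and returns None instead of raising an error when a pattern does not match.

```python
import re

# Invented HTML fragment that mimics the layout the patterns expect
sample_html = ('<h1>Sample study of Schistosoma mansoni</h1>'
               '<div><h3>Abstract</h3><p>We examined schistosoma infection '
               'in a hypothetical cohort.</p></div>')

title_regexp = re.compile('<h1>.{5,400}</h1>')
abstract_regexp = re.compile('<h3>Abstract</h3><p>.{20,3000}</p></div>')
keyword = re.compile('schistosoma', re.IGNORECASE)


def find_keyword(html):
    """Return (title, match), or None if the layout does not fit
    the patterns or the keyword is absent from the abstract."""
    title = title_regexp.search(html)
    abstract = abstract_regexp.search(html)
    if title is None or abstract is None:
        return None
    match = keyword.search(abstract.group())
    if match is None:
        return None
    return title.group(), match


result = find_keyword(sample_html)
if result:
    title, match = result
    print(title)
    # Offsets are relative to the matched abstract block, tags included
    print(match.group(), match.start(), match.end())
```

Note that the reported positions count from the start of the matched abstract block, including its HTML tags, not from the start of the visible abstract text.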

To find every match of the word in the text, use the finditer() method:

import re
import urllib.request

# Word to be searched
keyword = re.compile('schistosoma')

# List of PMIDs where we want to search the word
pmids = ['18235648', '22607149', '22405002', '21630672']

for pmid in pmids:
    url = 'http://www.ncbi.nlm.nih.gov/pubmed?term=%s' % pmid
    handler = urllib.request.urlopen(url)
    html = handler.read().decode('utf-8', errors='replace')
    title_regexp = re.compile('<h1>.{5,400}</h1>')
    title = title_regexp.search(html)
    abstract_regexp = re.compile('<h3>Abstract</h3><p>.{20,3000}</p></div>')
    abstract = abstract_regexp.search(html)
    if title is None or abstract is None:
        # The page layout did not match the patterns; skip this PMID
        continue
    # finditer() returns an iterator, which is always truthy,
    # so collect the matches in a list before testing for hits
    words = list(keyword.finditer(abstract.group()))
    if words:
        # Display the title and where the keyword was found
        print(title.group())
        for word in words:
            print(word.group(), word.start(), word.end())
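One detail of finditer() is worth seeing in isolation: it returns an iterator, and an iterator object is truthy even when it yields no matches, so hits have to be detected by consuming it. A small sketch on invented sample strings (not real abstracts):

```python
import re

keyword = re.compile('schistosoma', re.IGNORECASE)

# Invented sample texts, for illustration only
with_hits = "Schistosoma mansoni and schistosoma japonicum"
without_hits = "No parasite is mentioned here"

for text in (with_hits, without_hits):
    # Consume the iterator into a list so emptiness can be tested
    spans = [(m.group(), m.start(), m.end()) for m in keyword.finditer(text)]
    # The bare iterator is truthy in both cases; only the list tells them apart
    print(bool(keyword.finditer(text)), spans)
```

For the second string the iterator still evaluates as true while the list of spans is empty, which is why testing the list is the safer check.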

Article link: https://www.haomeiwen.com/subject/iscayhtx.html
