Using Biopython to browse the Nucleotide database and download fas

Author: javaLi | Published: 2019-03-30 11:20 | Read 27 times

Browsing the nucleotide database

1. Search for records with the Biopython module

from Bio import Entrez,SeqIO,Medline
Entrez.email=""  # enter your email address here
handle=Entrez.esearch(db="nucleotide",term='CRT[Gene Name] AND "Plasmodium falciparum"[Organism]')  # search the nucleotide database, restricted by gene name and organism
rec_list=Entrez.read(handle)  # read the IDs of the matching records (20 by default)

The result contains the IDs of the matching records, the total number of matches, and so on.

Figure: the rec_list result
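
The search term used above joins field-tagged clauses with AND. As a minimal sketch, such a term could be assembled from field/value pairs as shown below. The helper name `build_term` is hypothetical (it is not part of Biopython), and it only covers the simple AND case used in this article.

```python
def build_term(fields):
    """Join {field: value} pairs into an Entrez query string.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    clauses = []
    for field, value in fields.items():
        # Quote multi-word values so they are searched as a phrase.
        if " " in value:
            value = f'"{value}"'
        clauses.append(f"{value}[{field}]")
    return " AND ".join(clauses)


term = build_term({"Gene Name": "CRT", "Organism": "Plasmodium falciparum"})
print(term)
```

The resulting string can be passed as the `term` argument of `Entrez.esearch`.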

2. View the first record

if int(rec_list['RetMax'])<int(rec_list['Count']):    # the values are strings, so compare them as integers; search again to get every match
    handle = Entrez.esearch(db="nucleotide", term='CRT[Gene Name] AND "Plasmodium falciparum"[Organism]',retmax=rec_list['Count'])
    rec_list=Entrez.read(handle)
id_list=rec_list['IdList']
hdl=Entrez.efetch(db='nucleotide',id=id_list,rettype='gb')  # download the records in GenBank format
recs=list(SeqIO.parse(hdl,'gb'))  # parse the downloaded records
rec=recs[0]  # look at the first record found

Figure: contents of the first record
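
Step 2 passes every ID to a single `efetch` call. When a search matches many records, one option is to request them in smaller batches. The sketch below is an assumption about how that could be organised, not something from the original workflow: `fetch_in_batches` is a hypothetical helper, the batch size of 100 is arbitrary, and the actual download is supplied as a callable so the splitting logic can be tried without network access.

```python
def fetch_in_batches(id_list, fetch, batch_size=100):
    """Call fetch(batch) on consecutive slices of id_list and collect the results.

    fetch is any callable that accepts a list of IDs, for example
    lambda ids: Entrez.efetch(db='nucleotide', id=ids, rettype='gb').
    batch_size=100 is an arbitrary illustrative value.
    """
    results = []
    for start in range(0, len(id_list), batch_size):
        batch = id_list[start:start + batch_size]
        results.append(fetch(batch))
    return results


# Demonstration with a stand-in fetch function that only reports batch sizes.
sizes = fetch_in_batches([str(i) for i in range(250)], fetch=len)
print(sizes)  # [100, 100, 50]
```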

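3. Find the first record on NCBI and compare it with the download

Figure: the record on NCBI

Check whether the title matches:

rec.description  # title of the record

Figure: record title

4. Look up the article on PubMed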

refs=rec.annotations['references']
for ref in refs:
    if ref.pubmed_id!="":  # keep only references that have a PubMed ID; one was found, which matches NCBI
        handl=Entrez.efetch(db='pubmed',id=[ref.pubmed_id],rettype='medline',retmode='text')
        records=Medline.parse(handl)
        for med_rec in records:
            print(med_rec)

Figure: PubMed content

The output shows the PubMed title, authors, abstract, and so on.
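
Printing the whole record is verbose. Records produced by `Medline.parse` behave like dictionaries keyed by MEDLINE tags, so the fields mentioned above can be picked out individually. A minimal sketch follows; `summarize` is a hypothetical helper, and the example record is made up to show the output shape.

```python
def summarize(med_rec):
    """Return a short text summary of one MEDLINE record.

    Uses the MEDLINE tags 'TI' (title), 'AU' (list of authors)
    and 'AB' (abstract); missing fields fall back to defaults.
    """
    title = med_rec.get("TI", "(no title)")
    authors = ", ".join(med_rec.get("AU", []))
    abstract = med_rec.get("AB", "")
    # Show only the beginning of long abstracts.
    if len(abstract) > 80:
        abstract = abstract[:80] + "..."
    return f"Title: {title}\nAuthors: {authors}\nAbstract: {abstract}"


# Made-up record, used only to show the output shape.
example = {"TI": "An example title", "AU": ["Doe J", "Roe R"], "AB": "Short abstract."}
print(summarize(example))
```

In the loop above, `print(summarize(med_rec))` could replace `print(med_rec)`.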

Downloading the sequence in FASTA format

Download the FASTA sequence of the first record:

handle=Entrez.efetch(db='nucleotide',id=rec.id,rettype='fasta')
seq=SeqIO.read(handle,'fasta')
with open('example.fasta','w') as f:
    SeqIO.write(seq,f,'fasta')

The result looks like this:

Figure: the downloaded FASTA
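
For reference, a FASTA file is plain text: a header line starting with `>` followed by the sequence split over one or more lines. The sketch below writes that layout by hand to show the shape of the file. It is a plain-Python illustration, not how `SeqIO.write` is implemented; `format_fasta` is a hypothetical helper, and the line width and sample data are assumptions.

```python
def format_fasta(identifier, description, sequence, width=60):
    """Format one sequence as FASTA text.

    The result is a '>' header line followed by the sequence broken
    into lines of at most `width` characters (width is an assumption).
    """
    header = f">{identifier} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


# Made-up identifier and sequence, for illustration only.
text = format_fasta("demo1", "example sequence", "ATGC" * 20, width=40)
print(text, end="")
```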

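Transcribing and translating the FASTA file

1. Find the CDS location
The sequence in the FASTA file downloaded above may contain both coding regions (such as exons) and non-coding regions, and only the coding region is needed here. We therefore need to find the location of the CDS.

rec.features  # inspect the feature locations

All the features span the same 231 bases, so this record has no non-coding region.

Figure: the feature list, showing no non-coding regions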

for feature in rec.features:
    if feature.type=='CDS':
        print(feature.location)

Figure: CDS location
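
In this record the CDS covers the whole sequence, so the downloaded FASTA can be used directly. For records where it does not, the CDS could be cut out of the full sequence with the feature's `extract` method. A minimal sketch, assuming `record` is a SeqRecord-like object such as `rec` above; `cds_sequences` is a hypothetical helper name.

```python
def cds_sequences(record):
    """Return the sequence of every CDS feature in a GenBank record.

    record is expected to look like a Biopython SeqRecord: it has .seq
    and .features, and each feature has .type and .extract().
    """
    return [
        feature.extract(record.seq)
        for feature in record.features
        if feature.type == "CDS"
    ]


# With the record fetched earlier, this would be used as:
#     for cds in cds_sequences(rec):
#         print(len(cds), cds[:30])
```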

2. Read the file

fasta=SeqIO.parse('example.fasta','fasta')
for fa in fasta:
    seq=fa.seq
    print(fa.description)
    print(seq.alphabet)  # works only in Biopython < 1.78; Bio.Alphabet was removed in 1.78

Figure: reading the file

3. Transcribe and translate

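from Bio import Seq
from Bio.Alphabet import IUPAC  # Biopython < 1.78 only; on newer versions use Seq.Seq(str(seq)) without an alphabet
seq=Seq.Seq(str(seq),IUPAC.unambiguous_dna)  # rebuild the sequence with an explicit DNA alphabet
seq.transcribe()  # transcribe DNA to RNA
seq.translate()  # translate to protein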

Figure: transcription and translation
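
To make the two operations concrete, here is a plain-Python sketch of what transcription and translation compute on a coding-strand DNA string. It is an illustration, not Biopython's implementation: it assumes the standard genetic code, upper- or lower-case A/C/G/T only, and it silently ignores a trailing partial codon. The test sequence is a short made-up example.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order at each
# codon position ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: every T becomes U."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate DNA codon by codon; a trailing partial codon is ignored.

    Raises KeyError for characters other than A, C, G, T.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(transcribe(dna))
print(translate(dna))
```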

Article link: https://www.haomeiwen.com/subject/zskjbqtx.html
