美文网首页生信在线工具
mVISTA:在线程序展示叶绿体基因组相似性小实例

mVISTA:在线程序展示叶绿体基因组相似性小实例

作者: 小明的数据分析笔记本 | 来源:发表于2019-03-17 17:14 被阅读52次

    叶绿体基因组类的文章通常会有一幅图来展示叶绿体基因组的相似性(Sequence identity plot),出图的工具是mVISTA:mVISTA分为本地版和在线版两种。本文简要介绍使用在线版mVISTA获得Sequence identity plot的步骤

    本文使用到的序列数据为5个苹果属植物的叶绿体基因组序列

    Species accession numbers
    Malus florentina NC_035625
    Malus micromalus NC_036368
    Malus prunifolia NC_031163
    Malus trilobata NC_035671
    Malus tschonoskii NC_035672
    第一步:下载序列

    下载每个叶绿体基因组的fasta格式;下载作为参考基因组的genbank格式文件。(序列不是很多可以手动下载;如果序列条数非常多自己写个一个简单的python脚本批量下载叶绿体基因组序列)

    import os
    import sys
    import argparse
    parser = argparse.ArgumentParser(description='This script was used to download gb or fasta file of cp genome from NCBI nucleotides database')
    parser.add_argument('-f','--format',help='Please input file format you want to download',required=True)
    parser.add_argument('-a','--accession',help='file name contain accession number of cp genome you want to download',required=True)
    args = parser.parse_args()
    
    
    fr = open(args.accession,'r')
    acc = []
    for line in fr:
        acc.append(line.strip())
    print("The number of sequence will be downloaded is: ",len(acc))
    
    from Bio import Entrez
    from Bio import SeqIO
    Entrez.email = "" #在这里添加邮箱
    
    fold_name = "downladed_seq_"+args.format
    os.mkdir(fold_name)
    os.chdir(fold_name)
    
    for line in acc:
        print(line + " will be downloaded!")
        fw = open(line+"."+args.format,'w')
        cp = Entrez.efetch(db='nucleotide',id=[line],rettype=args.format)
        seq = SeqIO.read(cp,args.format)
        SeqIO.write(seq,fw,args.format)
        fw.close()
        print(line + " is downloaded!")
    
    print(len(os.listdir()),"sequence have been downloaded")
    

    使用方式

    #下载fasta格式
    python download_gb_or_fa_from_NCBI_cp_genome_database_1.py -f fasta -a accession_numbers.txt
    # 下载genbank格式
    python download_gb_or_fa_from_NCBI_cp_genome_database_1.py -f gb -a accession_numbers.txt
    # 序列号放到文本文件中每行一个
    
    第二步:上传序列至mVISTA

    http://genome.lbl.gov/vista/mvista/submit.shtml

    image.png
    本例中 total number of seuqences is 5 image.png
    • 填写邮箱地址,运行完结果发送至邮箱;上传下载好的fasta格式序列
    • 选择比对程序 论文中通常使用第三个
      image.png
    • 上传注释文件
      mVISTA要求的注释文件格式为 http://genome.lbl.gov/vista/mvista/instructions.shtml
      image.png
      可以通过处理genbank格式文件得到
    import argparse
    
    from Bio import SeqIO
    
    parser = argparse.ArgumentParser(description = 'This script was used to get mVISTA annotation file used to cp genome comparative analysis')
    parser.add_argument('-i','--genbank',help="Please input genBank format file", required = True)
    args = parser.parse_args()
    
    fw = open(args.genbank + "_mVISTA_annotation","w")
    
    for rec in SeqIO.parse(args.genbank,"gb"):
        for feature in rec.features:
            if feature.type == "gene":
                for part in feature.location.parts:
                    if part.strand == -1:
                        start_location = str(part.start)
                        end_location = str(part.end)
                        gene_name = feature.qualifiers["gene"][0]
                        fw.write("<\t%s\t%s\t%s\n"%(start_location,end_location,gene_name))
                    else:
                        start_location = str(part.start)
                        end_location = str(part.end)
                        gene_name = feature.qualifiers["gene"][0]
                        fw.write("<\t%s\t%s\t%s\n"%(start_location,end_location,gene_name))
            elif feature.type == "CDS":
                for part in feature.location.parts:
                    if part.strand == -1:
                        start_location = str(part.start)
                        end_location = str(part.end)
                        fw.write("%s\t%s\t%s\n"%(start_location,end_location,"exon"))
                    else:
                        start_location = str(part.start)
                        end_location = str(part.end)
                        fw.write("%s\t%s\t%s\n"%(start_location,end_location,"exon"))
            elif feature.type == "tRNA" or feature.type == "rRNA":
                for part in feature.location.parts:
                    start_location = str(feature.location.start)
                    end_location = str(feature.location.end)
                    fw.write("%s\t%s\t%s\n"%(start_location,end_location,"utr"))
            else:
                print("%s %s"%(feature.type,"is not needed"))
                
    print("The process is done!")
    
    fw.close()
    
    # 使用方法
    python get_mVISTA_annotation_file_from_genbank_1.py -i NC_031163.gb         
    
    image.png

    这里只上传参考序列的注释文件和全部上传注释文件是否有区别自己还没有搞清楚,这里暂时只选择上传参考序列的注释文件。其他参数选择默认,具体参数是什么功能自己还需要仔细查看帮助文档。


    image.png

    结果以邮件的形式发送邮箱


    image.png
    选择输出的长度 点击此处下载结果
    image.png
    输出结果为pdf文件自己调节细节

    欢迎大家关注我的公众号 小明的数据分析笔记本

    公众号二维码.jpg

    相关文章

      网友评论

        本文标题:mVISTA:在线程序展示叶绿体基因组相似性小实例

        本文链接:https://www.haomeiwen.com/subject/qevymqtx.html