mVISTA:在线程序展示叶绿体基因组相似性小实例

作者: 小明的数据分析笔记本 | 来源:发表于2019-03-17 17:14 被阅读52次

mVISTA:在线程序展示叶绿体基因组相似性小实例
在线作图|小基因组——叶绿体基因组圈图
mVISTA 多序列比对叶绿体基因组
利用OGDRAW绘制叶绿体圈图＋Geneious长度的统计
python脚本提取叶绿体基因组的大小单拷贝区、反向重复区
NCBI|线粒体基因组数据上传
MISA(在线)注释叶绿体基因组SSR
ggplot2画饼图、柱状图和折线图
circular barplot(环状柱形图)
提取质体基因组中指定基因区域的几种思路

叶绿体基因组类的文章通常会有一幅图来展示叶绿体基因组的相似性（Sequence identity plot），出图的工具是mVISTA：mVISTA分为本地版和在线版两种。本文简要介绍使用在线版mVISTA获得Sequence identity plot的步骤。

本文使用到的序列数据为5个苹果属植物的叶绿体基因组序列

Species	accession numbers
Malus florentina	NC_035625
Malus micromalus	NC_036368
Malus prunifolia	NC_031163
Malus trilobata	NC_035671
Malus tschonoskii	NC_035672

第一步：下载序列

下载每个叶绿体基因组的fasta格式；下载作为参考基因组的genbank格式文件。（序列不是很多可以手动下载；如果序列条数非常多自己写个一个简单的python脚本批量下载叶绿体基因组序列）

import os
import sys
import argparse
parser = argparse.ArgumentParser(description='This script was used to download gb or fasta file of cp genome from NCBI nucleotides database')
parser.add_argument('-f','--format',help='Please input file format you want to download',required=True)
parser.add_argument('-a','--accession',help='file name contain accession number of cp genome you want to download',required=True)
args = parser.parse_args()


fr = open(args.accession,'r')
acc = []
for line in fr:
    acc.append(line.strip())
print("The number of sequence will be downloaded is: ",len(acc))

from Bio import Entrez
from Bio import SeqIO
Entrez.email = "" #在这里添加邮箱

fold_name = "downladed_seq_"+args.format
os.mkdir(fold_name)
os.chdir(fold_name)

for line in acc:
    print(line + " will be downloaded!")
    fw = open(line+"."+args.format,'w')
    cp = Entrez.efetch(db='nucleotide',id=[line],rettype=args.format)
    seq = SeqIO.read(cp,args.format)
    SeqIO.write(seq,fw,args.format)
    fw.close()
    print(line + " is downloaded!")

print(len(os.listdir()),"sequence have been downloaded")

使用方式

#下载fasta格式
python download_gb_or_fa_from_NCBI_cp_genome_database_1.py -f fasta -a accession_numbers.txt
# 下载genbank格式
python download_gb_or_fa_from_NCBI_cp_genome_database_1.py -f gb -a accession_numbers.txt
# 序列号放到文本文件中每行一个

第二步：上传序列至mVISTA

http://genome.lbl.gov/vista/mvista/submit.shtml

image.png
本例中 total number of seuqences is 5

image.png

填写邮箱地址，运行完结果发送至邮箱；上传下载好的fasta格式序列
选择比对程序 论文中通常使用第三个
image.png
上传注释文件
mVISTA要求的注释文件格式为 http://genome.lbl.gov/vista/mvista/instructions.shtml
image.png
可以通过处理genbank格式文件得到

import argparse

from Bio import SeqIO

parser = argparse.ArgumentParser(description = 'This script was used to get mVISTA annotation file used to cp genome comparative analysis')
parser.add_argument('-i','--genbank',help="Please input genBank format file", required = True)
args = parser.parse_args()

fw = open(args.genbank + "_mVISTA_annotation","w")

for rec in SeqIO.parse(args.genbank,"gb"):
    for feature in rec.features:
        if feature.type == "gene":
            for part in feature.location.parts:
                if part.strand == -1:
                    start_location = str(part.start)
                    end_location = str(part.end)
                    gene_name = feature.qualifiers["gene"][0]
                    fw.write("<\t%s\t%s\t%s\n"%(start_location,end_location,gene_name))
                else:
                    start_location = str(part.start)
                    end_location = str(part.end)
                    gene_name = feature.qualifiers["gene"][0]
                    fw.write("<\t%s\t%s\t%s\n"%(start_location,end_location,gene_name))
        elif feature.type == "CDS":
            for part in feature.location.parts:
                if part.strand == -1:
                    start_location = str(part.start)
                    end_location = str(part.end)
                    fw.write("%s\t%s\t%s\n"%(start_location,end_location,"exon"))
                else:
                    start_location = str(part.start)
                    end_location = str(part.end)
                    fw.write("%s\t%s\t%s\n"%(start_location,end_location,"exon"))
        elif feature.type == "tRNA" or feature.type == "rRNA":
            for part in feature.location.parts:
                start_location = str(feature.location.start)
                end_location = str(feature.location.end)
                fw.write("%s\t%s\t%s\n"%(start_location,end_location,"utr"))
        else:
            print("%s %s"%(feature.type,"is not needed"))
            
print("The process is done!")

fw.close()

# 使用方法
python get_mVISTA_annotation_file_from_genbank_1.py -i NC_031163.gb