一、数据准备:
使用SURVIVOR将得到的vcf文件合并,
SURVIVOR merge filter_vcf_files.txt 1000 0 1 1 0 0 sample.vcf
####最后一个参数是保留的最小长度,这里用到SNP数据,所以用0
将变异个数按照1MB窗口进行滑窗统计变异个数
# variation_number_in_bin.py
import sys
input_file = sys.argv[1]
step_len = 1000000
result = {}
chrom_stats = {
"Chr01":44717177,
"Chr02":37244950,
"Chr03":39727707,
"Chr04":37171045,
"Chr05":31106453,
"Chr06":32139443,
"Chr07":31006940,
"Chr08":30241818,
"Chr09":24981782,
"Chr10":25400072,
"Chr11":33669327,
"Chr12":26652165
}
for chrom in chrom_stats.keys():
step= 0
step_start = 0
result[chrom]={}
result[chrom][step]=0
for line in open(input_file):
if line.startswith(chrom):
pos = line.split()[1]
if step_start < int(pos) < (step_start + step_len):
result[chrom][step] += 1
if int(pos) >= (step_start + step_len):
step += 1
step_start += step_len
result[chrom][step] = 0
result[chrom][step] += 1
for steps,nums in result[chrom].items():
print(chrom, steps, nums, sep="\t")
circos需要准备的数据:
>**基因组信息数据** rice1.txt
#第一列:染色体,表明这是一条染色体
#第二列:占位符,定义所属关系
#第三列:ID,跟身份证号码一样,唯一不能重复的标识(这一列很重要,后面其他作图数据都是要和ID对应)
#第四列: 图中显示的文本
#第五列:起始。以0开始
#第六列:终止。第五列和第六列定义了染色体的实际大小
#第七列:颜色 (这一列我写的是Chr1,Chr2,Chr3... 其实在这里他们代表的是颜色,因为在配置文件colors.ucsc.conf
# 中将不同的色号赋值给了Chr1,Chr2,Chr3... 所以就可用Chr1,Chr2,Chr3... 来代替颜色,当然你可以直接
# 写色号,或是颜色,都可以)
####
chr - hs01 Chr1 0 43270923 Chr1
chr - hs02 Chr2 0 35937250 Chr2
chr - hs03 Chr3 0 36413819 Chr3
chr - hs04 Chr4 0 35502694 Chr4
chr - hs05 Chr5 0 29958434 Chr5
chr - hs06 Chr6 0 31248787 Chr6
chr - hs07 Chr7 0 29697621 Chr7
chr - hs08 Chr8 0 28443022 Chr8
chr - hs09 Chr9 0 23012720 Chr9
chr - hs10 Chr10 0 23207287 Chr10
chr - hs11 Chr11 0 29021106 Chr11
chr - hs12 Chr12 0 27531856 Chr12
###配置文件 circos.conf
karyotype = rice1.txt
chromosomes_units =100000
<plots>
<plot>
type = line
file = num/SNP
color = blue
#color=div-5-spectral
r1 = 0.90r
r0 = 0.81r
#type = line
thickness = 2
max_gap = 1u
</plot>
<plot>
type = heatmap
#type = line
thickness = 2
max_gap = 1u
#color = redv
file = num/DEL
color = reds-9-seq
#color=div-5-spectral
r1 = 0.80r
r0 = 0.71r
</plot>
<plot>
type = heatmap
#type = line
thickness = 2
max_gap = 1u
#color = redv
file = num/INS
color = reds-9-seq
#color=div-5-spectral
r1 = 0.70r
r0 = 0.61r
</plot>
#<plot>
#type = heatmap
#file = num/INS
#color = reds-9-seq
#color=div-5-spectral
#type = line
#thickness = 2
#max_gap = 1u
#color = redv
#r1 = 0.60r
#r0 = 0.51r
#</plot>
#<plot>
#type = heatmap
#file = num/DEL
#color = reds-9-seq
#color=div-5-spectral
#type = line
#thickness = 2
#max_gap = 1u
#color = redv
#r1 = 0.50r
#r0 = 0.41r
#</plot>
#<plot>
#type = scatter
#fill_color = black # 填充色
#stroke_color = black
#glyph = circle
#glyph_size = 5 # 元素大小
#file = INV.txt
#r1 = 0.80r
#r0 = 0.71r
#</plot>
#<plot>
#type = histogram
#file = INV.txt
#fill_color = blue # 填充色
#r1 = 0.89r
#r0 = 0.81r
#</plot>
</plots>
#<links>
#<link>
#file = links.txt
#radius = 0.61r
#color = blue_a4
#ribbon = yes
#</link>
#</links>
# Includes content from ideogram.conf (included file path is relative
# to the file that included it). Conventionally, I separate the
# contents of the <ideogram> block from circos.conf and include
# it via ideogram.conf.
<<include ideogram.conf>>
# Similarly, I put the <ticks> block in ticks.conf
#<<include ticks.conf>>
<image>
<<include etc/image.conf>>
</image>
<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
circos需要的文件格式
####ideogram.conf
<ideogram>
<spacing>
default = 0.001r
<pairwise hs01;hs12>
# 以下设置两条染色体之间的间隙为圆的20度角
spacing = 40r
#radius=0.1r
</pairwise>
</spacing>
#<spacing>
# 也可设置指定的两条染色体之间的间隙
#</spacing>
# Ideogram position, fill and outline
radius = 0.90r
thickness = 20p
fill = yes
stroke_color = dgrey
stroke_thickness = 2p
# Minimum definition for ideogram labels.
show_label = yes
# see etc/fonts.conf for list of font names
label_font = default
label_radius = dims(image,radius)-60p
label_size = 30
label_parallel = yes
</ideogram>
最后在终端直接运行circos,
可以得到一张png和一张svg格式的图片,最后还可以用PS或者AI进行适当修改,
不说了,开组会了
网友评论