VCF文件的变异分布图,除了使用Circos图之外,统计特定材料间的变异分布情况时可能会用到染色体分布图,为了节省时间,实现统计及绘图的自动化,现提供统计数量用的Python脚本和绘图使用的R脚本,仅供参考。如有其它更快捷的方法,或觉得脚本太拙劣,欢迎留言批评交流。
VCF格式。详细格式介绍,请自行搜索了解,这里不赘述了。
PYTHON脚本,使用脚本整理100Kb滑窗中变异位点数量:
# variation_number_in_bin.py
import sys
input_file = sys.argv[1]
step_len = 100000
result = {}
chrom_stats = {
"Chr1":43270923,
"Chr2":35937250,
"Chr3":36413819,
"Chr4":35502694,
"Chr5":29958434,
"Chr6":31248787,
"Chr7":29697621,
"Chr8":28443022,
"Chr9":23012720,
"Chr10":23207287,
"Chr11":29021106,
"Chr12":27531856
}
for chrom in chrom_stats.keys():
step= 0
step_start = 0
result[chrom]={}
result[chrom][step]=0
for line in open(input_file):
if line.startswith(chrom):
pos = line.split()[1]
if step_start < int(pos) < (step_start + step_len):
result[chrom][step] += 1
if int(pos) >= (step_start + step_len):
step += 1
step_start += step_len
result[chrom][step] = 0
result[chrom][step] += 1
for steps,nums in result[chrom].items():
print(chrom, steps, nums, sep="\t")
脚本整理后的文件
下面画图,使用R脚本如下:
library(ggplot2)
library(Cairo)
var_density <- read.table("snp.distribution.stats", sep = "\t", header = T)
names(var_density) <- c("chrom", "region", "variation_count")
var_density$chrom <- factor(var_density$chrom, levels=c(
"Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "Chr6",
"Chr7", "Chr8", "Chr9", "Chr10", "Chr11", "Chr12"))
CairoPNG(filename = "variation_distribution.png", width = 12000, height = 7500, res = 600)
distribution <- ggplot(var_density, aes(x=region, y=variation_count))
distribution + geom_line(colour = "darkorange4", size = 0.35) +
guides(fill=FALSE) +
theme(axis.line = element_line(size=0.5),
panel.background = element_rect(fill="white",size=rel(20)),
panel.grid.minor = element_line(colour = "gray"),
axis.text = element_text(family="Times New Roman",size=16,face="bold"),
axis.title = element_text(family="Times New Roman",size=18,face="bold"),
strip.text = element_text(family="Times New Roman",size=12,face="bold")) +
facet_wrap(~chrom, ncol=2)
dev.off()
个人觉得这个图还凑合。
variation_distribution.png
网友评论