美文网首页R全基因组/外显子组测序分析R语言
VCF变异统计及绘制染色体分布图

VCF变异统计及绘制染色体分布图

作者: 生物数据分析笔记 | 来源:发表于2018-05-04 17:12 被阅读134次

    VCF文件的变异分布图,除了使用Circos图之外,统计特定材料间的变异分布情况时可能会用到染色体分布图,为了节省时间,实现统计及绘图的自动化,现提供统计数量用的Python脚本和绘图使用的R脚本,仅供参考。如有其它更快捷的方法,或觉得脚本太拙劣,欢迎留言批评交流。


    VCF格式。详细格式介绍,请自行搜索了解,这里不赘述了。

    PYTHON脚本,使用脚本整理100Kb滑窗中变异位点数量:

    # variation_number_in_bin.py
    
    import sys
    
    input_file = sys.argv[1]
    step_len = 100000
    result = {}
    
    chrom_stats = {
        "Chr1":43270923,
        "Chr2":35937250,
        "Chr3":36413819,
        "Chr4":35502694,
        "Chr5":29958434,
        "Chr6":31248787,
        "Chr7":29697621,
        "Chr8":28443022,
        "Chr9":23012720,
        "Chr10":23207287,
        "Chr11":29021106,
        "Chr12":27531856
        }
    
    for chrom in chrom_stats.keys():
        step= 0
        step_start = 0
        result[chrom]={}
        result[chrom][step]=0
        for line in open(input_file):
            if line.startswith(chrom):
                pos = line.split()[1]
                if step_start < int(pos) < (step_start + step_len):
                    result[chrom][step] += 1
                if int(pos) >= (step_start + step_len):
                    step += 1
                    step_start += step_len
                    result[chrom][step] = 0
                    result[chrom][step] += 1
    
        for steps,nums in result[chrom].items():
            print(chrom, steps, nums, sep="\t")
    
    脚本整理后的文件

    下面画图,使用R脚本如下:

    library(ggplot2)
    library(Cairo)
    
    var_density <- read.table("snp.distribution.stats", sep = "\t", header = T)
    
    names(var_density) <- c("chrom", "region", "variation_count")
    var_density$chrom <- factor(var_density$chrom, levels=c(
      "Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "Chr6",
      "Chr7", "Chr8", "Chr9", "Chr10", "Chr11", "Chr12"))
    
    CairoPNG(filename = "variation_distribution.png", width = 12000, height = 7500, res = 600)
    
    distribution <- ggplot(var_density, aes(x=region, y=variation_count))
    
    distribution + geom_line(colour = "darkorange4", size = 0.35) +
      guides(fill=FALSE) +
      theme(axis.line = element_line(size=0.5),
            panel.background = element_rect(fill="white",size=rel(20)),
            panel.grid.minor = element_line(colour = "gray"),
            axis.text = element_text(family="Times New Roman",size=16,face="bold"),
            axis.title = element_text(family="Times New Roman",size=18,face="bold"),
            strip.text = element_text(family="Times New Roman",size=12,face="bold")) +
      facet_wrap(~chrom, ncol=2)
    
    dev.off()
    
    个人觉得这个图还凑合。 variation_distribution.png

    相关文章

      网友评论

      本文标题:VCF变异统计及绘制染色体分布图

      本文链接:https://www.haomeiwen.com/subject/tmzdrftx.html