美文网首页R科研信息学scRNAseq
用pheatmap画单细胞read 分布的track图

用pheatmap画单细胞read 分布的track图

作者: caokai001 | 来源:发表于2020-06-14 16:54 被阅读0次

    参考:

    生信类捕获信号peak图的美化进阶之路


    单细胞ChIP-seq 和单细胞ATAC-seq 常常出现这个图,展示单细胞信号和bulk ChIP-seq 相似性。之前一直不知道这图如何画,参考上面链接,大佬写的博文才知道了.

    • 选取感兴趣区间,seq 函数创建区间
    • 用bedtools 计算每个细胞里面 每一个bin的count 数目
    • 将不同细胞的count 结果合并一起
    • 按照细胞reads count 进行排序
    • 使用pheatmap 进行画图展示
    image.png

    整理的脚本如下:

    image.png
    • 数据:存放单细胞bed 文件的目录
    • 脚本:数据整理的shell脚本 + 画图的R脚本
    • 运行:两个脚本进行画图
    ## Run1 : 
    bash run_signal_cell_tract.sh /home/kcao/work_dir/Cut_Tag/Fig6/test/sc_H3K27me3_rawdata chr7 132600000 133100000 1500
    
    # 存放单细胞bed 文件的目录: /home/kcao/work_dir/Cut_Tag/Fig6/test/sc_H3K27me3_rawdata
    # 感兴趣的区间染色体编号:chr7
    # 感兴趣的区间start:132600000
    # 感兴趣的区间end :   133100000 
    # 区间的binsize大小:1500
    
    ## Run2 :
    Rscript run_sc_heatmap.R "cell_sort_count_matrix" "cell_sort_list.txt" "heatmap.pdf"
    
    # 统计的cell_bin count matrix  :  "cell_sort_count_matrix" 
    # 每个细胞read 数目,排序后的文件   : "cell_sort_list.txt" 
    # 输出的热图文件名  : "heatmap.pdf"
    

    运行后产生的文件

    image.png

    和IGV bulk ChIP-seq track 拼接 效果

    image.png

    代码如下:

    • run_signal_cell_tract.sh
    #!/usr/bin/env bash
    
    set -ue
    
    if [ $# -lt 5 ]
    then
    echo "Usage : bash run_signal_cell_tract.sh <path_to_signal_cell_bed_directory> <region_chr> <region_start> <region_end> <binsize>
        
        Example: bash run_signal_cell_tract.sh /home/kcao/work_dir/Cut_Tag/Fig6/test/sc_H3K27me3_rawdata chr7 132600000 133100000 1500
        
        Description of input fields:
        Filed1: The directory where the single cell Chip-seq bed file is stored
        Filed2: The interval of interest, the chromosome number
        Filed3: The interval of interest, start
        Filed4:The interval of interest, end
        Filed5: Divide the interval into bin statistics and set specific binsize. 
            Select the corresponding region according to the width of the selected region.
            Setting the appropriate window size is very important for the visual effect of the drawing. 
            For the 200KB area, you can choose 600-800bp window.
        
        
        ref:https://liuyang0681.github.io/2019/04/12/paper0412/
        "
        
    exit 1
    fi
    
    
    sc_beddir=$1
    region_chr=$2
    region_start=$3
    region_end=$4
    binsize=$5
    
    
    echo -e "\t\t" ;date
    echo -e "run:bash run_signal_cell_tract.sh $1 $2 $3 $4 $5"
    echo "----------------------------------------------------"
    
    #######################
    # 1.Generate regions of interest:
    cd `dirname $sc_beddir`
    
    seq -w $region_start $binsize $region_end |awk -v region_chr=$2 '{print region_chr"\t"$1"\t"$1+499"\t""bin_"NR"\t""500"}' > region_bin.bed
    
    # 2. 对细胞进行排序,reads 数目越多,track 越高
    ## 2.1 统计每个细胞reads数目,并排序
    
    if [ ! -d $sc_beddir'_count' ];then
      mkdir $sc_beddir'_count'
      echo "creat "$sc_beddir'_count'
    
    
    cd $sc_beddir
    wc  -l *.bed |sed '$d' | sort -k 1,1n  | sed '1!G;h;$!d'  > ../cell_sort_list.txt
    
    ## 2.2 计算每个cell 的bin区域count数目
    for id in *.bed ;
    do 
    echo "$id" >$sc_beddir'_count'/${id/bed/count.bed};
    bedtools intersect -a ../region_bin.bed -b $id  -c -bed |awk '{print $4"\t"$5"\t"$6}' |cut -f 3  >>$sc_beddir'_count'/${id/bed/count.bed} ;echo "---$id is ok---";
    done
    
    
    else
      echo  $sc_beddir'_count' "已经存在"
    fi
    
    # 3.Merge multiple files
    cd $sc_beddir'_count'
    paste -d "\t" *.bed > ../cell_sort_count_matrix
    
    echo -e "\t\t" ;date
    echo -e "*** bin_cell_matrix finished ***"
    echo "----------------------------------------------------"
    
    
    • run_sc_heatmap.R
    ### Step0:解析参数
    args <- commandArgs(trailingOnly = TRUE)
    print(args)
    
    bin_cell_matrix_file=args[1]
    cell_sort_file=args[2]
    output_file=args[3]
    print(bin_cell_matrix_file)
    
    ### Step1:加载包
    if (!requireNamespace("pheatmap", quietly = TRUE))
      install.packages("pheatmap")
    library(pheatmap)
    
    
    
    ### Syep2 : 读入文件
    bin_cell_matrix <- read.table(bin_cell_matrix_file,header = T)
    rownames(bin_cell_matrix)<- paste0("bin_",1:nrow(bin_cell_matrix) )
    head(bin_cell_matrix[1:5,1:5])
    
    cell_sort <- read.table(cell_sort_file,header = F,col.names = c("count","cell_name"))
    head(cell_sort)
    
    ## Step3.sort matrix by sample reads count
    cell_bin_matrix <- t( bin_cell_matrix[  ,cell_sort[,2]] )
    
    table(rowSums(cell_bin_matrix))
    
    
    ## Step4: pheatmap画热图
    cell_bin_matrix[cell_bin_matrix>2]=2
    sum(cell_bin_matrix[2,])
    
    bk <-seq(0,3,by=1)
    pdf(file=output_file)
    pheatmap(cell_bin_matrix,cluster_cols = FALSE,cluster_rows = FALSE,
             scale = "none",
             show_rownames=F,
             show_colnames = F,
             legend = FALSE,
             color = colorRampPalette(colors = c("white","blue"))(length(bk)),
             legend_breaks=seq(0,3,1),
             breaks=bk)
    dev.off()
    
    

    相关文章

      网友评论

        本文标题:用pheatmap画单细胞read 分布的track图

        本文链接:https://www.haomeiwen.com/subject/cjaxxktx.html