hicpro脚本

作者: 苏牧传媒 | 来源:发表于2019-01-07 22:06 被阅读3次

    脚本位置:/home/shen/biosoft/HiC-Pro_2.11.1/bin/utils

    1.digest_genome.py

    ## Digest the mm9 genome by HindIII

    HICPRO_PATH/bin/utils/digest_genome.py -r A^AGCTT -o mm9_hindiii.bed mm9.fasta

    ## The same ...

    HICPRO_PATH/bin/utils/digest_genome.py -r hindiii -o mm9_hindiii.bed mm9.fasta

    ## Double digestion, HindIII + DpnII

    HICPRO_PATH/bin/utils/digest_genome.py -r hindiii dpnii -o mm9_hindiii_dpnii.bed mm9.fasta

    "mboi": ["^GATC"],

    "dpnii": ["^GATC"],

    "bglii": ["A^GATCT"],

    "hindiii": ["A^AGCTT"]}

    2.hicpro2fithic.py

    python /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2fithic.py -i raw/100000/TU-2_100000.matrix -b raw/100000/TU-2_100000_abs.bed -s iced/100000/TU-2_100000_iced.matrix.biases

    生成三个文件:

    228K    fithic.biases.gz

    144K    fithic.fragmentMappability.gz

    2.7M    fithic.interactionCounts.gz

    接着:

    ref:https://github.com/ay-lab/fithic#-x---chromosome_region

    pip install fithic

    fithic -h 

    报错:“No module named functools_lru_cache” 解决:在普通py环境下:pip install arrow==0.12.0

    运行:

    fithic -f fithic.fragmentMappability.gz -i fithic.interactionCounts.gz -t fithic.biases.gz -o fithic -l TU-2 -v -x intraOnly -r 100000

    -l lable

    -v 可视化

    -x intraOnly interOnly All

    -r, --resolution

    正在运行 运行结果 head两个文件 生成的图片

    他的文章:https://genome.cshlp.org/content/24/6/999

    3.hicpro2higlass.sh

    bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2higlass.sh -h

    chrsize=/media/shen/*/sun/refdata/UCSC_mm10/mm10.chrom.sizes

    bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2higlass.sh -i raw/100000/TU-2_100000.matrix -r 100000 -c $chrsize -n 

    运行中

    生成两个文件:

    接着: 可以用HiGlass浏览器来看。

    ref:http://gehlenborglab.org/research/projects/higlass/

    4.hicpro2juicebox.sh

    bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh -h

    bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh -i TU-2/TU-2.allValidPairs -g /home/shen/biosoft/HiC-Pro_2.11.1/annotation/chrom_mm10.sizes -j /media/shen/*/sun/biosoft/juicer_tools.jar -r /home/shen/biosoft/HiC-Pro_2.11.1/annotation/mm10.mbo1.bed

    bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh \

    -i TU-2/TU-2.allValidPairs \ validpairs

    -g /home/shen/biosoft/HiC-Pro_2.11.1/annotation/chrom_mm10.sizes \ 基因组size信息

    -j /media/shen/*/sun/biosoft/juicer_tools.jar \ juicer包

    -r /home/shen/biosoft/HiC-Pro_2.11.1/annotation/mm10.mbo1.bed # 限制酶切文件

    运行中

    生成:hic文件

    juicerbox:

    java -jar /media/shen/*/sun/biosoft/Juicebox_1.9.8.jar

    5.sparseToDense.py

    python /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/sparseToDense.py -h

    python /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/sparseToDense.py iced/100000/TU-2_100000_iced.matrix -b raw/100000/TU-2_100000_abs.bed -d -o TU-2.matrix

    输出matrix 文件比较大

    接着:

    Create the DI file:

    chrsize=/home/shen/biosoft/HiC-Pro_2.11.1/annotation/chrom_mm10.sizes

    /home/shen/biosoft/domaincall_software/perl_scripts/DI_from_matrix.pl TU-2.matrix 100000 200000 $chrsize > TU-2.matrix.di

    修改DI文件:(mouse)

    cat TU-2.matrix.di | grep -v "M" | sed -e 's/^X/20/g' | sed -e 's/^Y/21/g' > TU-2.matrix.fine.di 

    cp /home/shen/biosoft/domaincall_software/HMM_calls.m ./

    编辑:HMM_calls.m文件:

    Note:

    [1]In line 9 of the script HMM_calls.m,pleasehardcodeyour inputDIfilename.

    [2]In line 77 of the script HMM_calls.m,pleasehardcodeyour outputfilename.

    nice matlab < HMM_calls.m > dumpfile

    6.R包:HiTC

    library("HiTC")

    x=importC("../rawdata_40000_iced.matrix", xgi.bed="../rawdata_40000_ord.bed")

    R=extractRegion(x$chr1chr1, chr="chr1", from=1, to=249250621) #chesize

    di<-directionalityIndex(R)

    ord=read.delim(paste("rawdata_40000_ord.chr1.cut.bed",sep="."), header=F)

    RM=as.matrix(intdata(R))

    RF=cbind(ord[,c(1:3)], RM)

    ord$DI=di

    ord$V1=paste("1")

    ord$DI[is.na(ord$DI)]<-0

    ord$DI[is.nan(ord$DI)]<-0

    FIN.DI=ord[,c(1,2,3,4)]

    => RF is the matrix from HiTC, FIN.DI is the DI file from HiTC

    相关文章

      网友评论

        本文标题:hicpro脚本

        本文链接:https://www.haomeiwen.com/subject/dluprqtx.html