HiC-Pro in Practice #3D genome #epigenetics

Author: 土豆_leah | Source: published 2018-07-31 17:49

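    First, thanks to jimmy for his very detailed tutorial, "HiC数据分析实战之HiC-Pro" (Hands-on Hi-C data analysis with HiC-Pro).
    This is the second post in my 3D genomics study notes. It mainly records the problems I ran into while installing HiC-Pro, plus part of a hands-on run.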


    Installation

    1. First install the dependencies listed in the requirements. They can be installed with conda; pay attention to the versions.
    • The bowtie2 mapper
    • Python (>2.7, python-3 is not supported) with pysam (>=0.8.3), bx-python (>=0.5.0), numpy (>=1.8.2), and scipy (>=0.15.1) libraries
    • R with the RColorBrewer and ggplot2 (>2.2.1) packages
    • g++ compiler
    • samtools (>1.1)
    • Unix sort (which supports the -V option) is required! Mac OS users should install the GNU core utilities.
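
As a quick sanity check before installing, the command-line dependencies in the list above can be probed from Python. This is a minimal sketch and not part of HiC-Pro: the executable names (`bowtie2`, `samtools`, `R`, `g++`, `sort`) are the usual ones and are an assumption about your environment, and the helper names are hypothetical. It only checks presence on `PATH`; it does not check versions or the Python and R libraries.

```python
import shutil
import subprocess

# Usual executable names for the listed tools (an assumption about your setup).
REQUIRED_TOOLS = ["bowtie2", "samtools", "R", "g++", "sort"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def sort_supports_version_option(sort_cmd="sort"):
    """Return True if `sort -V` runs successfully (GNU sort accepts -V)."""
    try:
        result = subprocess.run(
            [sort_cmd, "-V"],
            input="chr10\nchr2\n",
            capture_output=True,
            text=True,
        )
    except OSError:
        # The executable itself could not be started.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("sort -V supported:", sort_supports_version_option())
```

The `which` parameter is injectable only so the function can be exercised without touching the real `PATH`.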
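    2. Installing HiC-Pro
      Commands for the software that is not available through conda
    $ pip install https://bitbucket.org/mirnylab/mirnylib/get/tip.tar.gz
    $ pip install https://bitbucket.org/mirnylab/hiclib/get/tip.tar.gz
    # Install HiC-Pro
    $ mkdir -p ~/biosoft/hicpro
    $ cd ~/biosoft/hicpro
    $ git clone https://github.com/nservant/HiC-Pro.git
    $ cd HiC-Pro
    # Edit the configuration file at this point (see the settings listed under SYSTEM CONFIGURATION)
    $ cat config-install.txt
    $ mkdir -p ~/biosoft/hicpro/bin
    $ make configure
    $ make install
    ### The final install step may fail with "Directory does not exit!". This is probably because the program assumes by default that a bin folder already exists under the home directory; creating the bin folder fixes it. Finally, if running HiC-Pro -h by its absolute path prints the help text, the installation succeeded.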
    

    SYSTEM CONFIGURATION

    PREFIX Path to installation folder
    BOWTIE2_PATH Full path to the bowtie2 installation directory
    SAMTOOLS_PATH Full path to the samtools installation directory
    R_PATH Full path to the R installation directory
    PYTHON_PATH Full path to the python installation directory (>2.7 - python3 not supported)
    CLUSTER_SYS Scheduler to use for cluster submission. Must be TORQUE, SGE, SLURM or LSF
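
Before running `make configure`, it can help to check the edited file against the settings listed above. The sketch below assumes `config-install.txt` uses `KEY = VALUE` lines with `#` comments; the function names and the example paths are hypothetical, and the only rules checked are the ones stated in the list (the six keys, and the four allowed schedulers).

```python
# Keys and scheduler names taken from the SYSTEM CONFIGURATION list.
EXPECTED_KEYS = ["PREFIX", "BOWTIE2_PATH", "SAMTOOLS_PATH",
                 "R_PATH", "PYTHON_PATH", "CLUSTER_SYS"]
ALLOWED_SCHEDULERS = {"TORQUE", "SGE", "SLURM", "LSF"}


def parse_key_values(text):
    """Parse 'KEY = VALUE' lines, ignoring '#' comments and other lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_install_config(settings):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = ["missing key: " + key for key in EXPECTED_KEYS
                if key not in settings]
    scheduler = settings.get("CLUSTER_SYS", "")
    if scheduler and scheduler not in ALLOWED_SCHEDULERS:
        problems.append("unsupported CLUSTER_SYS: " + scheduler)
    return problems


if __name__ == "__main__":
    # Hypothetical example content; replace the paths with your own.
    example = """
    PREFIX = /home/user/biosoft/hicpro/bin
    BOWTIE2_PATH = /home/user/miniconda2/bin
    SAMTOOLS_PATH = /home/user/miniconda2/bin
    R_PATH = /home/user/miniconda2/bin
    PYTHON_PATH = /home/user/miniconda2/bin
    CLUSTER_SYS = SGE
    """
    print(check_install_config(parse_key_values(example)) or "looks consistent")
```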

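    Running

    1. First obtain the BED file of restriction fragments and the table of chromosome sizes. This requires the restriction enzyme cut site and the reference genome, both of which depend on where the test data came from; the BED file is generated with digest_genome.py.
    # -o names the output BED file (Refgenome_hindiii.bed is a placeholder name); the reference FASTA is the positional argument
    $ /PATH/HiC-Pro-master/bin/utils/digest_genome.py -r hindiii -o Refgenome_hindiii.bed Refgenome.fasta
    # BED file format (-1)
    chr1   0       16007   HIC_chr1_1    0   +
    # chromosomes' size (-1)
    chr1    249250621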
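
To make the two file formats above concrete, here is a minimal sketch that derives both from a FASTA file. It is not the digest_genome.py implementation: it assumes a single enzyme, HindIII cutting as A^AGCTT (recognition site `AAGCTT`, cut one base into the site), searches the forward strand only, and uses hypothetical function names. The fragment naming follows the `HIC_chr1_1` pattern shown in the BED example.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def chrom_sizes(lines):
    """Return [(name, length)] for every sequence in the FASTA."""
    return [(name, len(seq)) for name, seq in read_fasta(lines)]


def cut_positions(seq, site="AAGCTT", cut_offset=1):
    """Return 0-based cut coordinates for every occurrence of the site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def fragments_bed(name, seq, site="AAGCTT", cut_offset=1):
    """Return BED rows (chrom, start, end, fragment name, score, strand)."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    rows = []
    for start, end in zip(bounds, bounds[1:]):
        if end > start:  # skip empty fragments, e.g. a cut at position 0
            rows.append((name, start, end,
                         "HIC_%s_%d" % (name, len(rows) + 1), 0, "+"))
    return rows


if __name__ == "__main__":
    # Tiny made-up sequence, only to show the output layout.
    demo = [">chr1 demo", "GGAAGCTTCC", "AAGCTTGG"]
    for chrom, size in chrom_sizes(demo):
        print("%s\t%d" % (chrom, size))
    for chrom, seq in read_fasta(demo):
        for row in fragments_bed(chrom, seq):
            print("\t".join(str(field) for field in row))
```

For a real genome, digest_genome.py remains the tool to use; the sketch is only meant to show what each column means.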
    
    HiC-Pro --help
    usage : HiC-Pro -i INPUT -o OUTPUT -c CONFIG [-s ANALYSIS_STEP] [-p] [-h] [-v]
    Use option -h|--help for more information
    
    HiC-Pro 2.10.0
    ---------------
    OPTIONS
    
     -i|--input INPUT : input data folder; Must contains a folder per sample with input files
     -o|--output OUTPUT : output folder
     -c|--conf CONFIG : configuration file for Hi-C processing
     [-p|--parallel] : if specified run HiC-Pro on a cluster
     [-s|--step ANALYSIS_STEP] : run only a subset of the HiC-Pro workflow; if not specified the complete workflow is run
        mapping: perform reads alignment
        proc_hic: perform Hi-C filtering
        quality_checks: run Hi-C quality control plots
        build_contact_maps: build raw inter/intrachromosomal contact maps
        ice_norm: run ICE normalization on contact maps
     [-h|--help]: help
     [-v|--version]: version
    
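    2. Following the documentation, copy the configuration file 'config-hicpro.txt' to your current directory and edit it. The test data used here come from Tung B. K. Le et al., Science 2013: https://www.ncbi.nlm.nih.gov/sra/?term=srr824846. The raw data files need no special arrangement, but because of how the program reads its input, each sample's data must be placed in its own folder.

    Put all input files in a rawdata folder. The input files have to be organized with one folder per sample.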

    $ mkdir -p ~/data/project/hic/fq/s1/
    $ cd ~/data/project/hic/fq/s1/
    # FASTQ files placed in this folder (size, date, name)
    858M Jul  3 16:21 SRR824846_Q20L10_1.fastq.gz
    857M Jul  3 16:22 SRR824846_Q20L10_2.fastq.gz
    
    # Layout with multiple input files
    + PATH_TO_MY_DATA
      + sample1
        ++ file1_R1.fastq.gz
        ++ file1_R2.fastq.gz
        ++ ...
      + sample2
        ++ file1_R1.fastq.gz
        ++ file1_R2.fastq.gz
      *...
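
The one-folder-per-sample layout above can be checked before launching a run. The following sketch is an assumption-laden helper, not HiC-Pro code: it treats each subfolder of the data root as a sample, considers only `.fastq` and `.fastq.gz` files, and pairs mates by swapping a first-mate tag for a second-mate tag in the file name (`_R1`/`_R2` by default, matching the layout shown). The function names are hypothetical.

```python
from pathlib import Path

FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def check_pairs(file_names, pair1_ext="_R1", pair2_ext="_R2"):
    """Split first-mate FASTQ files into (paired, unpaired) lists."""
    fastqs = sorted(n for n in file_names if n.endswith(FASTQ_SUFFIXES))
    paired, unpaired = [], []
    for name in fastqs:
        # Replace the LAST occurrence of the first-mate tag to build the mate name.
        head, sep, tail = name.rpartition(pair1_ext)
        if not sep:
            continue  # no first-mate tag in this file name
        mate = head + pair2_ext + tail
        if mate in fastqs:
            paired.append((name, mate))
        else:
            unpaired.append(name)
    return paired, unpaired


def scan_data_root(data_root, pair1_ext="_R1", pair2_ext="_R2"):
    """Return {sample folder name: (paired, unpaired)} for each subfolder."""
    root = Path(data_root)
    return {
        sample.name: check_pairs([f.name for f in sample.iterdir()],
                                 pair1_ext, pair2_ext)
        for sample in sorted(root.iterdir())
        if sample.is_dir()
    }
```

For the files in this post, `check_pairs(names, "_1", "_2")` would be the call to use, since the mates are tagged `_1` and `_2` rather than `_R1` and `_R2`.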
    
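    3. Run the commands as follows. jimmy uses a number of tricks in his post that are worth studying in detail.
    # Main settings to change in the configuration file
    BOWTIE2_IDX_PATH = # path to the directory holding the bowtie2 index; remember to use an absolute path
    REFERENCE_GENOME = # name (prefix) of the index built with bowtie2
    GENOME_SIZE = # file listing the size of each sequence in the reference genome
    GENOME_FRAGMENT = # path to the BED file of restriction fragments
    LIGATION_SITE = # ligation site
    # For a single sequencing dataset:
    PAIR1_EXT = SRR824846_Q20L10_1
    PAIR2_EXT = SRR824846_Q20L10_2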
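
The LIGATION_SITE value follows from the enzyme's cut pattern. As a sketch under stated assumptions (a palindromic recognition site leaving a 5' overhang, ends filled in and then blunt-ligated, as in the classic Hi-C protocol), the junction sequence is the site up to the end of the filled-in overhang followed by the site from the cut onward. The function name is hypothetical and the rule does not cover enzymes that leave 3' overhangs.

```python
def ligation_site(cut_pattern):
    """Derive the ligation junction from a pattern such as 'A^AGCTT'.

    Assumes a palindromic site with a 5' overhang that is filled in
    before blunt-end ligation.
    """
    cut = cut_pattern.index("^")          # raises ValueError if '^' is absent
    site = cut_pattern.replace("^", "")
    left_end = site[:len(site) - cut]     # upstream fragment after fill-in
    right_start = site[cut:]              # downstream fragment after fill-in
    return left_end + right_start


if __name__ == "__main__":
    for pattern in ("A^AGCTT", "^GATC", "A^GATCT"):
        print(pattern, "->", ligation_site(pattern))
```

For HindIII (`A^AGCTT`) this gives `AAGCTAGCTT`, which is the value to put in LIGATION_SITE when the library was made with that enzyme.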
    
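    # -p (cluster mode) makes HiC-Pro write the HiCPro_step1_.sh and HiCPro_step2_.sh scripts into the output folder
    $ MY_INSTALL_PATH/bin/HiC-Pro -i FULL_PATH_TO_DATA_FOLDER -o FULL_PATH_TO_OUTPUTS -c MY_LOCAL_CONFIG_FILE -p
    # out is the output folder used here
    $ cd out
    $ qsub HiCPro_step1_.sh
    $ qsub HiCPro_step2_.sh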
    

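    One problem to record: the qsub command reports an error (output follows), so for now I run the shell scripts with the sh command instead.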

    $ qsub HiCPro_step1_.sh -p 20 
    
    Unable to initialize environment because of error: cell directory "/opt/gridengine/default" doesn't exist
    Exiting.
    

    Once the results are in, I will study them further.

    Related links:
    HiC-Pro_github
    HiC-Pro: An optimized and flexible pipeline for Hi-C processing
