Chromap

Author: 重拾生活信心 | Published: 2023-11-22 12:58

    Chromap: fast preprocessing and alignment of chromatin profiles

    Chromap is an ultrafast method for aligning and preprocessing high-throughput chromatin profiles. Typical use cases include:
    (1) trimming sequencing adapters, mapping bulk ATAC-seq or ChIP-seq genomic reads to the human genome and removing duplicates;
    (2) trimming sequencing adapters, mapping single cell ATAC-seq genomic reads to the human genome, correcting barcodes, removing duplicates and performing Tn5 shift;
    (3) split alignment of Hi-C reads against a reference genome.
    In all three cases, Chromap is reported to be 10-20 times faster than conventional alignment-based workflows while remaining accurate.

    Install

    conda install -c bioconda -c conda-forge chromap
    

    Usage

    chromap -h
    Fast alignment and preprocessing of chromatin profiles
    Usage:
      chromap [OPTION...]
    
      -v, --version  Print version
      -h, --help     Print help
    
    
    =========================================================
     Indexing options:
      -i, --build-index          Build index
          --min-frag-length INT  Min fragment length for choosing k and w automatically [30]
      -k, --kmer INT             Kmer length [17]
      -w, --window INT           Window size [7]
    
    
    =========================================================
     Mapping options:
          --preset STR              Preset parameters for mapping reads (always applied before other options) []
                                    atac: mapping ATAC-seq/scATAC-seq reads
                                    chip: mapping ChIP-seq reads
                                    hic: mapping Hi-C reads
          --split-alignment         Allow split alignments
      -e, --error-threshold INT     Max # errors allowed to map a read [8]
      -s, --min-num-seeds INT       Min # seeds to try to map a read [2]
      -f, --max-seed-frequencies INT[,INT]
                                    Max seed frequencies for a seed to be selected [500,1000]
      -l, --max-insert-size INT     Max insert size, only for paired-end read mapping [1000]
      -q, --MAPQ-threshold INT      Min MAPQ in range [0, 60] for mappings to be output [30]
          --min-read-length INT     Min read length [30]
          --trim-adapters           Try to trim adapters on 3'
          --remove-pcr-duplicates   Remove PCR duplicates
          --remove-pcr-duplicates-at-bulk-level
                                    Remove PCR duplicates at bulk level for single cell data
          --remove-pcr-duplicates-at-cell-level
                                    Remove PCR duplicates at cell level for single cell data
          --Tn5-shift               Perform Tn5 shift
          --low-mem                 Use low memory mode
          --bc-error-threshold INT  Max Hamming distance allowed to correct a barcode [1]
          --bc-probability-threshold FLT
                                    Min probability to correct a barcode [0.9]
      -t, --num-threads INT         # threads for mapping [1]
    
    
    =========================================================
     Input options:
      -r, --ref FILE                Reference file
      -x, --index FILE              Index file
      -1, --read1 FILE              Single-end read files or paired-end read files 1
      -2, --read2 FILE              Paired-end read files 2
      -b, --barcode FILE            Cell barcode files
          --barcode-whitelist FILE  Cell barcode whitelist file
          --read-format STR         Format for read files and barcode files  ["r1:0:-1,bc:0:-1" as 10x Genomics single-end
                                    format]
    
    
    =========================================================
     Output options:
      -o, --output FILE             Output file
          --output-mappings-not-in-whitelist
                                    Output mappings with barcode not in the whitelist
          --chr-order FILE          Custom chromosome order file. If not specified, the order of reference sequences will
                                    be used
          --BED                     Output mappings in BED/BEDPE format
          --TagAlign                Output mappings in TagAlign/PairedTagAlign format
          --SAM                     Output mappings in SAM format
          --pairs                   Output mappings in pairs format (defined by 4DN for HiC data)
          --pairs-natural-chr-order FILE
                                    Custom chromosome order file for pairs flipping. If not specified, the custom
                                    chromosome order will be used
          --barcode-translate FILE  Convert barcode to the specified sequences during output
          --summary FILE            Summarize the mapping statistics at bulk or barcode level
    
    • As with other aligners, build the index first
    chromap -i -r ref.fa -o index
    
      # ChIP-seq reads
    chromap --preset chip -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed     
      # ATAC-seq reads
    chromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed     
      # scATAC-seq reads
    chromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed \
      -b barcode.fq.gz --barcode-whitelist whitelist.txt
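
The examples above cover ChIP-seq and ATAC-seq; the third use case, Hi-C, can be sketched the same way using the `hic` preset and the `--pairs` output option from the help text. This is a minimal sketch, not a verified pipeline: the file names and the thread count are placeholders, and `--pairs` is passed explicitly so the output format does not depend on what the preset enables. The script prints the command and only runs it when chromap and all inputs are actually present.

```shell
#!/bin/sh
set -eu

# Placeholder file names (assumptions); replace with your own data.
index=index
ref=ref.fa
r1=read1.fq.gz
r2=read2.fq.gz
out=aln.pairs
threads=8   # -t, --num-threads (the help text lists 1 as the default)

# Build the argument list in the positional parameters so it can be
# printed for review before it is executed.
set -- chromap --preset hic --pairs \
    -x "$index" -r "$ref" -1 "$r1" -2 "$r2" \
    -t "$threads" -o "$out"
cmd_line="$*"
echo "$cmd_line"

# Run only when chromap is installed and every input file exists.
if command -v chromap >/dev/null 2>&1 \
    && [ -f "$index" ] && [ -f "$ref" ] && [ -f "$r1" ] && [ -f "$r2" ]; then
    "$@"
else
    echo "dry run only: chromap or input files not found" >&2
fi
```

Printing the command first makes it easy to check the options before committing to a long mapping job.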
    
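    ATAC processing
    • With the atac preset, the basic workflow is: trim adapters at the 3' end, align, remove duplicates at the cell level, apply the ATAC Tn5 shift, and correct barcodes against the supplied barcode whitelist.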
    • --read-format specifies where the barcode is located within the FASTQ reads.
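
As an illustration of `--read-format`, the sketch below assumes a hypothetical library in which the first 16 bases of read 1 are the cell barcode and the remainder is genomic sequence. It also assumes, based on my reading of the Chromap documentation, that each comma-separated field has the form `name:start:end` with 0-based inclusive coordinates and that `-1` means "to the end of the read"; please confirm this against the manual for your version. File names are placeholders, and the script only runs chromap if it and the inputs are present.

```shell
#!/bin/sh
set -eu

# Assumed layout (hypothetical): barcode = first 16 bases of read 1.
bc_len=16
bc_end=$((bc_len - 1))    # inclusive end coordinate of the barcode

# Assumed field syntax: name:start:end, 0-based inclusive, -1 = end of read.
read_format="bc:0:${bc_end},r1:${bc_len}:-1"

# Read 1 is passed twice: once as genomic read 1 and once as the barcode file.
set -- chromap --preset atac -x index -r ref.fa \
    -1 read1.fq.gz -2 read2.fq.gz -b read1.fq.gz \
    --barcode-whitelist whitelist.txt \
    --read-format "$read_format" -o aln.bed
cmd_line="$*"
echo "$cmd_line"

# Run only when chromap is installed and the inputs exist.
if command -v chromap >/dev/null 2>&1 \
    && [ -f index ] && [ -f ref.fa ] \
    && [ -f read1.fq.gz ] && [ -f read2.fq.gz ] && [ -f whitelist.txt ]; then
    "$@"
else
    echo "dry run only: chromap or input files not found" >&2
fi
```

With the default value shown in the help text, `r1:0:-1,bc:0:-1`, the whole of read 1 and the whole of the barcode file are used, which is why no `--read-format` is needed when barcodes come in a separate FASTQ.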

          Article link: https://www.haomeiwen.com/subject/vtaiwdtx.html