Chromap 染色质图谱的快速预处理和比对
-
可用于 bulk ATAC-seq ,ChIP-seq or scATAC-seq. (修剪接头、比对、barcode矫正、去重、Tn5 shift……)
-
haowenz/chromap: Fast alignment and preprocessing of chromatin profiles (github.com)
Chromap is an ultrafast method for aligning and preprocessing high throughput chromatin profiles. Typical use cases include:
(1) trimming sequencing adapters, mapping bulk ATAC-seq or ChIP-seq genomic reads to the human genome and removing duplicates;
(2) trimming sequencing adapters, mapping single cell ATAC-seq genomic reads to the human genome, correcting barcodes, removing duplicates and performing Tn5 shift;
(3) split alignment of Hi-C reads against a reference genome.
In all these three cases, Chromap is 10-20 times faster while being accurate.
Install
conda install -c bioconda -c conda-forge chromap
Usage
chromap -h
Fast alignment and preprocessing of chromatin profiles
Usage:
chromap [OPTION...]
-v, --version Print version
-h, --help Print help
=========================================================
Indexing options:
-i, --build-index Build index
--min-frag-length INT Min fragment length for choosing k and w automatically [30]
-k, --kmer INT Kmer length [17]
-w, --window INT Window size [7]
=========================================================
Mapping options:
--preset STR Preset parameters for mapping reads (always applied before other options) []
atac: mapping ATAC-seq/scATAC-seq reads
chip: mapping ChIP-seq reads
hic: mapping Hi-C reads
--split-alignment Allow split alignments
-e, --error-threshold INT Max # errors allowed to map a read [8]
-s, --min-num-seeds INT Min # seeds to try to map a read [2]
-f, --max-seed-frequencies INT[,INT]
Max seed frequencies for a seed to be selected [500,1000]
-l, --max-insert-size INT Max insert size, only for paired-end read mapping [1000]
-q, --MAPQ-threshold INT Min MAPQ in range [0, 60] for mappings to be output [30]
--min-read-length INT Min read length [30]
--trim-adapters Try to trim adapters on 3
--remove-pcr-duplicates Remove PCR duplicates
--remove-pcr-duplicates-at-bulk-level
Remove PCR duplicates at bulk level for single cell data
--remove-pcr-duplicates-at-cell-level
Remove PCR duplicates at cell level for single cell data
--Tn5-shift Perform Tn5 shift
--low-mem Use low memory mode
--bc-error-threshold INT Max Hamming distance allowed to correct a barcode [1]
--bc-probability-threshold FLT
Min probability to correct a barcode [0.9]
-t, --num-threads INT # threads for mapping [1]
=========================================================
Input options:
-r, --ref FILE Reference file
-x, --index FILE Index file
-1, --read1 FILE Single-end read files or paired-end read files 1
-2, --read2 FILE Paired-end read files 2
-b, --barcode FILE Cell barcode files
--barcode-whitelist FILE Cell barcode whitelist file
--read-format STR Format for read files and barcode files ["r1:0:-1,bc:0:-1" as 10x Genomics single-end
format]
=========================================================
Output options:
-o, --output FILE Output file
--output-mappings-not-in-whitelist
Output mappings with barcode not in the whitelist
--chr-order FILE Custom chromosome order file. If not specified, the order of reference sequences will
be used
--BED Output mappings in BED/BEDPE format
--TagAlign Output mappings in TagAlign/PairedTagAlign format
--SAM Output mappings in SAM format
--pairs Output mappings in pairs format (defined by 4DN for HiC data)
--pairs-natural-chr-order FILE
Custom chromosome order file for pairs flipping. If not specified, the custom
chromosome order will be used
--barcode-translate FILE Convert barcode to the specified sequences during output
--summary FILE Summarize the mapping statistics at bulk or barcode level
- 和其他比对软件一样,先建index
chromap -i -r ref.fa -o index
# ChIP-seq reads
chromap --preset chip -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed
# ATAC-seq reads
chromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed
# scATAC-seq reads
chromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed\
-b barcode.fq.gz --barcode-whitelist whitelist.txt
atac process
- preset 模式到atac,基本的处理过程是trim3'端的接头,比对,细胞水平去重、做ATAC的 peak shift,然后根据提供的barcode 白名单进行barcode矫正。
-
--read-format
指定barcode 在fastq的位置。
网友评论