- View the help text
bmtagger.sh -h
usage: bmtagger [-hV] [-q 0|1] [-C config] -1 input.fa [-2 matepairs.fa] -b genome.wbm -d genome-seqdb -x srindex [-o blacklist] [-T tmpdir] [-X]
usage: bmtagger [-hV] [-q 0|1] [-C config] -1 input.fa [-2 matepairs.fa] --ref=reference [-o blacklist] [-T tmpdir] [-X]
usage: bmtagger [-hV] [-q 0|1] [-C config] -A accession [--ref=reference] [-b genome.wbm] [-d genome-seqdb] [-x srindex] [-T tmpdir]
use --ref=name to point to .wbm, seqdb and srprism index if they have the same path and basename
use --extract or -X to generate fasta or fastq files which will NOT contain tagged sequences (-o required)
use --debug to leave temporary data on exit
use --old-srprism to use options for older version of srprism (interferes with config file)
Using following programs:
/software/miniconda2/envs/bin/bmfilter
/software/miniconda2/envs/bin/srprism
/software/miniconda2/envs/bin/blastn
/software/miniconda2/envs/bin/extract_fullseq
Removing host sequences from the raw data
- Remove human host sequences
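# Shared path prefix of the human reference files (bitmask, seqdb, srprism index)
ref='/database/ref/human/human'
# -q 1: input is FASTQ; -X: write the reads that were NOT tagged as host
bmtagger.sh -q 1 -1 "2_cleandata/${sample}_clean_R1.fq" \
-2 "2_cleandata/${sample}_clean_R2.fq" \
-b "$ref.bitmask" -d "$ref" -x "$ref" \
-o "2_cleandata/${sample}" \
-X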
This finally produces the result files {sample}_1.fastq and {sample}_2.fastq.
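- This step can easily run out of memory. To avoid that, consider splitting the reads into N parts (for example N=10), running the alignment on each part separately, and then merging the N results at the end.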
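The splitting step itself is not shown in this post. A minimal sketch of one way to do it, assuming GNU coreutils `split` is available; the function name `split_fastq` and its arguments are made up for illustration. It cuts on 4-line FASTQ record boundaries and names the chunks with a numbered prefix (0001, 0002, ...). Because R1 and R2 contain the same number of reads, running it on each file with the same N should keep the mates in matching chunks.

```shell
# Hypothetical helper: split one FASTQ file into at most n chunks named
# <outdir>/0001.<name>, <outdir>/0002.<name>, ... (requires GNU split).
split_fastq() {
    src="$1"; n="$2"; outdir="$3"
    name=$(basename "$src")
    # A FASTQ record is 4 lines, so chunk sizes must be multiples of 4.
    reads=$(( $(wc -l < "$src") / 4 ))
    [ "$reads" -gt 0 ] || return 1
    per_chunk=$(( (reads + n - 1) / n ))   # ceiling division
    split -l $(( per_chunk * 4 )) --numeric-suffixes=1 -a 4 \
        --additional-suffix=".$name" "$src" "$outdir/"
}
```

For example, `split_fastq "2_cleandata/${sample}_clean_R1.fq" 10 2_cleandata` followed by the same call for the R2 file would give chunks named like 0001.{sample}_clean_R1.fq. When the read count does not divide evenly, the last chunk is smaller, and fewer than N chunks may be written.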
Host removal on the quality-controlled clean data
- The data are split into 10 parts (0001-0010.{sample}_clean_R[12].fq)
ref='/database/ref/human/human'
# List the chunk prefixes (0001 ... 0010), then run up to 5 bmtagger jobs in parallel
ls 2_cleandata/*.${sample}_clean_R1.fq | cut -d"/" -f2 | cut -d"." -f1 | \
xargs -P5 -I{} sh -c "~/R3.6/bin/bmtagger.sh -q 1 \
-1 2_cleandata/{}.${sample}_clean_R1.fq \
-2 2_cleandata/{}.${sample}_clean_R2.fq \
-b $ref.bitmask -d $ref -x $ref \
-o 2_cleandata/{}.${sample} \
-X"
This produces 0001-0010.{sample}_[12].fastq
- Merge the 10 parts back into one file per mate for downstream analysis
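# Merge R1 and R2 separately so the two mates do not end up in one file
for i in 1 2; do
cat 2_cleandata/[0-9][0-9][0-9][0-9].${sample}_${i}.fastq | \
pigz -p 12 -c \
> ${sample}_clean_R${i}.fq.gz
done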
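After merging, it can be worth confirming that the two mate files still hold the same number of reads. The sketch below is not part of the original workflow; the helper names `count_reads` and `check_pairs` are hypothetical, and it assumes plain 4-line FASTQ records and that the paired outputs are expected to stay in sync.

```shell
# Hypothetical helper: print the number of reads in a gzipped FASTQ file.
count_reads() {
    # Each FASTQ record spans 4 lines.
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Hypothetical helper: succeed only if both mate files hold the same number of reads.
check_pairs() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}
```

A call such as `check_pairs "${sample}_clean_R1.fq.gz" "${sample}_clean_R2.fq.gz" || echo "read counts differ" >&2` would flag a mismatch before the files go into downstream analysis.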