美文网首页BS-seq生信分析工具包生信
nf-core 一个把各个组学分析流程化发了 NBT 的牛逼东西

nf-core 一个把各个组学分析流程化发了 NBT 的牛逼东西

作者: 热衷组培的二货潜 | 来源:发表于2020-03-04 14:55 被阅读0次

    链接:
    The nf-core framework for community-curated bioinformatics pipelines

    包括哪些流程?

    链接:https://nf-co.re/pipelines

    • nascent :Nascent Transcription Processing Pipeline
    • proteomicslfq:UNDER CONSTRUCTION - Proteomics label-free quantification (LFQ) analysis pipeline
    • RNA-seq:RNA sequencing analysis pipeline using STAR, HISAT2 and Salmon with gene counts and quality control
    • ampliseq: 16S rRNA amplicon sequencing analysis workflow using QIIME2
    • CAGE-seq: CAGE-sequencing analysis pipeline with trimming, alignment and counting of CAGE tags.
    • nanoseq: UNDER CONSTRUCTION: Nanopore demultiplexing, QC and alignment pipeline
    • sRNA-seq: A small-RNA sequencing analysis pipeline
    • sarek: Analysis pipeline to detect germline or somatic variants from WGS / targeted sequencing
    • bacass: Simple bacterial assembly and annotation pipeline
    • ChIP-seq: ChIP-seq peak-calling, QC and differential analysis pipeline.
    • ATAC-seq: ATAC-seq peak-calling, QC and differential analysis pipeline
    • MNase-seq: UNDER CONSTRUCTION: MNase-seq analysis pipeline using BWA and DANPOS2.
    • SmartSeq2: A pipeline for processing single cell RNA-seq data generated with the SmartSeq2 protocol.
    • scrnaseq: A single-cell RNAseq pipeline for 10X genomics data
    • RNAfusion: RNA-seq analysis pipeline for detection gene-fusions
    • BS-seq: Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
    • lncpipe: UNDER DEVELOPMENT--- Analysis of long non-coding RNAs from RNA-seq datasets
    • Hi-C: Analysis of Chromosome Conformation Capture data (Hi-C)
    • GUIDE-Seq: UNDER CONSTRUCTION - CRISPR-Cas off-target identification using GUIDE-Seq
    • slamseq: UNDER CONSTRUCTION: Slam-seq processing and analysis pipeline
    • hlatyping: Precision HLA typing from next-generation sequencing data
    • exoseq : Please consider using/contributing to https://github.com/nf-core/sarek
      .....
      以上仅列出我想列出的流程,还有很多。下面我拆分一个流程(BS-seq),看看主要步骤:

      从上图可以看到主要分为两种方法。一种从 Bismark,一种用 bwa-meth

    分析参考手册:

    https://github.com/nf-core/methylseq/blob/master/docs/usage.md

    # 输入文件
    --reads 'path/to/data/sample_*_{1,2}.fastq' # 双端
    --single_end --reads '*.fastq' #单端
    
    
    
    # 参考基因组
    ## 内置的
    人类:--genome GRCh37 和 --genome GRCh38
    小鼠:--genome GRCm38
    
    ## 自己提供参考基因组序列
    ### Single multifasta for genome
    --fasta /path/to/genome.fa
    
    ### Bismark index directory
    --bismark_index /path/to/ref/BismarkIndex/
    
    ### bwa-meth index filename base
    ### where for example the index files are called:
    ### /path/to/ref/genome.fa.bwameth.c2t.bwt
    --bwa_meth_index /path/to/ref/genome.fa
    
    ### Genome Fasta index file
    --fasta_index /path/to/genome.fa.fai 
    
    ## 保存建立的索引信息
    --save_reference
    
    
    
    # 接头过滤设置
    Parameter   5' R1 Trim  5' R2 Trim  3' R1 Trim  3' R2 Trim
    --pbat                 6        9       6       9
    --single_cell         6         6   6   6
    --epignome        8             8   8   8
    --accel           10    15  10  10
    --zymo          10  15  10  10
    --cegx          6   6   2   2
    
    --rrbs # 征对于 RRBS 建库
    --pbat # PBAT 建立,互补链,与参数 --directional (bismark 默认的方式)相反
    --non_directional # 输出四种可能的链,注意 --single_cell 和 --zymo 模式自动设置了此参数
    --skip_trimming # 如果输入文件为 clean data 可以此参数跳过这一步
    
    --comprehensive # 输出 CHG/CHH 信息
    
    --relax_mismatches and --num_mismatches # Bismark 默认 --num_mismatches 为 0.2,--relax_mismatches 则为 0.6;
    # 0.6 will allow a penalty of bp * -0.6 - for 100bp reads, this is -60. Mismatches cost -6, gap opening -5 and gap extension -2. So, -60 would allow 10 mismatches or ~ 8 x 1-2bp indels.
    
    --unmapped # 保存为比对上的 reads
    --save_trimmed # 保存过滤掉的 reads
    
    --min_depth # MethylDackel 输出甲基化信息掉的最小覆盖度
    --meth_cutoff # Bimark 中 bismark_methylation_extractor 提取甲基化信息的最小覆盖度
    
    --ignore_flags:MethylDackel 处理 BAM 文件时候忽略 SAM 文件 flag 值
    --methyl_kit:MethylDackel 输出结果可直接输入 R 包 methylKit 进行后续分析
    
    --known_splices:Bismark 运行 only works with `--aligner bismark_hisat`
    
    --slamseq:Specify to run Bismark with the `--slam` flag to run bismark in SLAM-seq mode (only works with `--aligner bismark_hisat`)
    
    --local_alignment:Specify to run Bismark with the --local flag to allow soft-clipping of reads. This should only be used with care in certain single-cell applications or PBAT libraries, which may produce chimeric read pairs
    
    --max_memory:最大内存,例如 --max_memory '8.GB'
    --max_cpus:--max_cpus 1
    
    
    

    主要分析步骤提取

    参数设置地方:

    parameters.settings.json

    00、基因组建立 index

    # 第一种:Bismark
    ## $aligner 有两种:'--hisat2' : '--bowtie2',注意这两的 index 不能互用
    aligner = params.aligner == 'bismark_hisat' ? '--hisat2' : '--bowtie2' # 比对软件
    slam = params.slamseq ? '--slam' : '' # 测序类型
    
    mkdir BismarkIndex
    cp $fasta BismarkIndex/
    bismark_genome_preparation $aligner $slam BismarkIndex
    
    # 第二种:bwameth
    bwameth.py index $fasta
    

    插播:Bismark 比对参数
    https://github.com/FelixKrueger/Bismark
    Bismark alignment modes

    STEP 1 - FastQC

    fastqc --quiet --threads $task.cpus $reads
    

    STEP 2 - Trim Galore!

    说明书:https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md

    5' Trimming can be accomplished with Trim Galore using:
    
    --clip_r1 <NUMBER> (Read 1) or
    
    --clip_r2 <NUMBER> (Read 2)
    
    3' Trimming can be accomplished with Trim Galore using:
    
    --three_prime_clip_r1 <NUMBER> (Read 1) or
    
    --three_prime_clip_r2 <NUMBER> (Read 2).
    
    c_r1 = params.clip_r1 > 0 ? "--clip_r1 ${params.clip_r1}" : ''
    c_r2 = params.clip_r2 > 0 ? "--clip_r2 ${params.clip_r2}" : ''
    tpc_r1 = params.three_prime_clip_r1 > 0 ? "--three_prime_clip_r1 ${params.three_prime_clip_r1}" : ''
    tpc_r2 = params.three_prime_clip_r2 > 0 ? "--three_prime_clip_r2 ${params.three_prime_clip_r2}" : ''
    rrbs = params.rrbs ? "--rrbs" : '' # 有几种模式:`--rrbs`/`--non_directional`
    # $reads 输入的 fq 文件
    # 单端
    trim_galore --fastqc --gzip $rrbs $c_r1 $tpc_r1 $reads
    
    # 双端
    trim_galore --paired --fastqc --gzip $rrbs $c_r1 $c_r2 $tpc_r1 $tpc_r2 $reads
    

    使用 Bismark 流程

    STEP 3.1 - align with Bismark

    aligner = params.aligner == "bismark_hisat" ? "--hisat2" : "--bowtie2"
    pbat = params.pbat ? "--pbat" : ''
    non_directional = params.single_cell || params.zymo || params.non_directional ? "--non_directional" : ''
    unmapped = params.unmapped ? "--unmapped" : ''
    mismatches = params.relax_mismatches ? "--score_min L,0,-${params.num_mismatches}" : ''
    soft_clipping = params.local_alignment ? "--local" : ''
    splicesites = params.aligner == "bismark_hisat" && knownsplices.name != 'null' ? "--known-splicesite-infile <(hisat2_extract_splice_sites.py ${knownsplices})" : ''
    multicore = "--multicore $ccore" # 这一步有根据的你的服务器计算能力计算,详情看原文, $ccore 表示线程数
    
    bismark $input \\
                $aligner \\
                --bam $pbat $non_directional $unmapped $mismatches $multicore \\
                --genome $index \\
                $reads \\
                $soft_clipping \\
                $splicesites
    

    STEP 4 - Bismark deduplicate

    fq_type = params.single_end ? '-s' : '-p' # 单端 `-s`; 双端 `-p`
    
    deduplicate_bismark $fq_type --bam $bam
    

    STEP 5 - Bismark methylation extraction

    插播:https://github.com/FelixKrueger/Bismark/tree/master/Docs

    comprehensive = params.comprehensive ? '--comprehensive --merge_non_CpG' : ''
    meth_cutoff = params.meth_cutoff ? "--cutoff ${params.meth_cutoff}" : ''
    # 测试 CPU
    ccore = ((task.cpus as int) / 10) as int
    multicore = "--multicore $ccore"
    # 测试内存
    mbuffer = (task.memory as nextflow.util.MemoryUnit) - 2.GB  
    buffer = "--buffer_size ${mbuffer.toGiga()}G"   # if( mbuffer.compareTo(4.GB) == 1 )
    
    ## 如果是植物请加上 --CX 不然不会输出 CHG\CHH 甲基化信息
    # 单端
    bismark_methylation_extractor $comprehensive $meth_cutoff \\
                    $multicore $buffer \\
                    --bedGraph \\
                    --counts \\
                    --gzip \\
                    -s \\
                    --report \\
                    $bam
    
    
    # 双端
    bismark_methylation_extractor $comprehensive $meth_cutoff \\
                    $multicore $buffer \\
                    --ignore_r2 2 \\
                    --ignore_3prime_r2 2 \\
                    --bedGraph \\
                    --counts \\
                    --gzip \\
                    -p \\
                    --no_overlap \\
                    --report \\
                    $bam
    

    STEP 6 - Bismark Sample Report

    # set val(name), file(align_log), file(dedup_log), file(splitting_report), file(mbias) from ch_bismark_logs_for_bismark_report
    
    bismark2report \\
                --alignment_report $align_log \\
                --dedup_report $dedup_log \\
                --splitting_report $splitting_report \\
                --mbias_report $mbias
    

    STEP 7 - Bismark Summary Report

    bismark2summary # 直接运行此就行
    

    使用 bwa-meth 流程

    STEP 3.1 - align with bwa-meth

    fasta = bwa_meth_indices[0].toString() - '.bwameth' - '.c2t' - '.amb' - '.ann' - '.bwt' - '.pac' - '.sa'
    prefix = reads[0].toString() - ~/(_R1)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/
    
    bwameth.py \\
                --threads ${task.cpus} \\
                --reference $fasta \\
                $reads | samtools view -bS - > ${prefix}.bam
    

    STEP 4.- samtools flagstat on samples

    samtools sort $bam \\
                -@ ${task.cpus} $sort_mem \\
                -o ${bam.baseName}.sorted.bam
    
    samtools index ${bam.baseName}.sorted.bam
    samtools flagstat ${bam.baseName}.sorted.bam > ${bam.baseName}_flagstat_report.txt
    samtools stats ${bam.baseName}.sorted.bam > ${bam.baseName}_stats_report.txt
    

    STEP 5 - Mark duplicates with picard

    picard -Xmx${avail_mem}g MarkDuplicates \\
                    INPUT=$bam \\
                    OUTPUT=${bam.baseName}.markDups.bam \\
                    METRICS_FILE=${bam.baseName}.markDups_metrics.txt \\
                    REMOVE_DUPLICATES=false \\
                    ASSUME_SORTED=true \\
                    PROGRAM_RECORD_ID='null' \\
                    VALIDATION_STRINGENCY=LENIENT
    
    samtools index ${bam.baseName}.markDups.bam
    

    STEP 6 - extract methylation with MethylDackel

    all_contexts = params.comprehensive ? '--CHG --CHH' : ''
    min_depth = params.min_depth > 0 ? "--minDepth ${params.min_depth}" : ''
    ignore_flags = params.ignore_flags ? "--ignoreFlags" : ''
    methyl_kit = params.methyl_kit ? "--methylKit" : ''
    
    MethylDackel extract $all_contexts $ignore_flags $methyl_kit $min_depth $fasta $bam
    MethylDackel mbias $all_contexts $ignore_flags $fasta $bam ${bam.baseName} --txt > ${bam.baseName}_methyldackel.txt
    

    STEP 8 - Qualimap

    samtools sort $bam \\
            -@ ${task.cpus} $sort_mem \\
            -o ${bam.baseName}.sorted.bam
    
    qualimap bamqc $gcref \\
            -bam ${bam.baseName}.sorted.bam \\
            -outdir ${bam.baseName}_qualimap \\
            --collect-overlap-pairs \\
            --java-mem-size=${task.memory.toGiga()}G \\
            -nt ${task.cpus}
    

    STEP 9 - preseq

    samtools sort $bam \\
            -@ ${task.cpus} $sort_mem \\
            -o ${bam.baseName}.sorted.bam
    
    preseq lc_extrap -v -B ${bam.baseName}.sorted.bam -o ${bam.baseName}.ccurve.txt
    

    STEP 10 - MultiQC

    rtitle = custom_runName ? "--title \"$custom_runName\"" : ''
    rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : ''
    
    multiqc -f $rtitle $rfilename --config $multiqc_config . \\
            -m custom_content -m picard -m qualimap -m bismark -m samtools -m preseq -m cutadapt -m fastqc
    

    STEP 11 - Output Description HTML

    相关文章

      网友评论

        本文标题:nf-core 一个把各个组学分析流程化发了 NBT 的牛逼东西

        本文链接:https://www.haomeiwen.com/subject/oigmlhtx.html