美文网首页
用bismark比对RRBS数据(一)流程

用bismark比对RRBS数据(一)流程

作者: 蓬举举 | 来源:发表于2020-11-13 13:31 被阅读0次

    准备工作:

    下载bismark及其依赖包perl、bowtie2或者histat2、samtools

    下载参考基因组序列fasta格式

    下载目标比对序列 fasta或者fastq格式

    并且将程序路径写入环境变量(vim .bashrc)

    将程序路径写入环境变量 这样linux在调用perl和bowtie2的时候就会从环境变量设定的路径中去寻找

    (I) Running bismark_genome_preparation

    USAGE:bismark_genome_preparation [options] <path_to_genome_folder>

    A typical genome indexing could look like this:/bismark/bismark_genome_preparation --path_to_aligner /usr/bin/bowtie2/ --verbose /data/genomes/homo_sapiens/GRCh37/

    path_to_aligner 指定用于align的程序的路径

    bismark_genome_preparation --verbose /data/home/lt/Bismark-0.22.3/reference/

    该路径下应该有一个FASTA文件

    如果在环境变量中已经指定过,这里就不用指定。

    Bismark will create two individual folders within this directory, one for a C->T converted genome and the other one for the G->A converted genome. After creating C->T and G->A versions of the genome they will be indexed in parallel using the indexer bowtie2-build (or hisat2-build).

    这一步完成之后会生成两个文件夹。

    (II) Bismark Alignment Step

    在此步骤之前最好对将要比对的数据进行质控。

    USAGE:bismark [options] --genome <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}

    A typical single-end analysis could look like this:

    bismark --genome /data/genomes/homo_sapiens/GRCh38/ sample.fastq.gz

    bismark --genome /data/home/lt/Bismark-0.22.3/reference/ /data/home/lt/fastq/liver/liver_1.fastq

    bismark支持的目标比对文件格式

    sequence format either FastQ or FastA

    single-end or paired-end reads

    input files can be uncompressed orgzip-compressed (ending in.gz)

    variable read length support

    directional or non-directional BS-Seq libraries

    默认文库是方向性的 文件是未压缩的 输出默认是bam格式的文件

    注意后面一定要包含对应的文件名而不是只指定文件夹(这里我错了还不自知嘤嘤嘤)

    This will produce two output files:1.test_dataset_bismark_bt2.bam(contains all alignments plus methylation call strings)2.test_dataset_bismark_SE_report.txt(contains alignment and methylation summary)

    这也会产生两个文件。

    可以加一个 -o 输出目录


    (III) Running deduplicate_bismark

    deduplicate_bismark --bam [options] <filenames>

    This command will deduplicate the Bismark alignment BAM file and remove all reads but one which align to the the very same position and in the same orientation. This step is recommended for whole-genome bisulfite samples, but should not be used for reduced representation libraries such as RRBS, amplicon or target enrichment libraries.

    所以这一步我就跳过啦。

    (IV) Running bismark_methylation_extractor

    USAGE:

    bismark_methylation_extractor [options] <filenames>

    A typical command to extract context-dependent (CpG/CHG/CHH) methylation could look like this:

    bismark_methylation_extractor --gzip --bedGraph test_dataset_bismark_bt2.bam

    This will produce three methytlation output files:

        CpG_context_test_dataset_bismark_bt2.txt.gz

        CHG_context_test_dataset_bismark_bt2.txt.gz

        CHH_context_test_dataset_bismark_bt2.txt.gz

    as well as a bedGraph and a Bismark coverage file.

    bismark_methylation_extractor --gzip --bedGraph liver_1_bismark_bt2.bam 

    可以加一个 -o 输出目录

    The bismark_methylation_extractor comes with a few options, such as ignoring the first number of positions in the methylation call string, e.g. to remove a restriction enzyme site (if RRBS is performed with non-directional BS-Seq libraries it might be required to remove reconstituted MspI sites at the beginning of each read as they will introduce a bias into the first methylation call).

    对于非方向性的BS-seq文库,必须移除内切酶位点的碱基,因为这会引入偏差(bias)。

     bismark_methylation_extractor会自动识别 alignment mode,因此双端和单端的文件不需要特别设置

    (V) Runningbismark2report

    USAGE:bismark2report [options]

    (VI) Runningbismark2summary

    USAGE:bismark2summary [options]

    (VII) Bismark Nucleotide Coverage report (bam2nuc)

    那么这些文件都长什么样呢?如何用到下一步的分析里面?

    且听下回分解。

    相关文章

      网友评论

          本文标题:用bismark比对RRBS数据(一)流程

          本文链接:https://www.haomeiwen.com/subject/nmcvbktx.html