10x Genomics RNA-seq data analysis in practice

Author: 11的雾 | Published 2018-08-28 10:33

    10x data types:

    [Figure: 10x data types]

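    Each sample is sequenced into 3 FASTQ files, distinguished by I1, R1 and R2. In the 10x Single Cell 3' v2 layout, I1 is the 8 bp sample index read, R1 holds the 16 bp cell barcode followed by the 10 bp UMI, and R2 is the cDNA read.
    Download and install cellranger.
    Download the required reference.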
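Because every sample must come with all three reads, it can help to check the FASTQ directory before starting a 38-hour run. The sketch below assumes bcl2fastq-style file names (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`, which is what the files in this post use); `check_triplets` is a hypothetical helper, not part of cellranger.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any sample whose I1 or R2 file is missing next to its R1 file.
# Assumes bcl2fastq-style names: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
check_triplets() {
    local dir=$1 r1 base rt status=0
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue            # glob matched nothing
        base=${r1%_R1_001.fastq.gz}         # path prefix shared by I1/R1/R2
        for rt in I1 R2; do
            if [ ! -e "${base}_${rt}_001.fastq.gz" ]; then
                echo "missing: $(basename "$base")_${rt}_001.fastq.gz"
                status=1
            fi
        done
    done
    return "$status"
}
```

Usage: `check_triplets /cygene/data/20180810_10x/10x/ && echo "all samples complete"`. It returns non-zero and lists the missing files if any sample is incomplete.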

    (1) Running cellranger count

    /home/dushiyi/software/biosoftware/cellranger-2.2.0/cellranger count \
    --id=ID24 \
    --fastqs=/cygene/data/20180810_10x/10x/   \
    --sample=WBJPE18018647-1_HMVMKCCXY_L6_WBJPE18018647_20180729_P,WBJPE18018647-2_HMVMKCCXY_L6_WBJPE18018647_20180729_P,WBJPE18018647-3_HMVMKCCXY_L6_WBJPE18018647_20180729_P,WBJPE18018647-4_HMVMKCCXY_L6_WBJPE18018647_20180729_P,WBJPE18018648-1_HMVMKCCXY_L7_WBJPE18018648_20180729_P,WBJPE18018648-2_HMVMKCCXY_L7_WBJPE18018648_20180729_P,WBJPE18018648-3_HMVMKCCXY_L7_WBJPE18018648_20180729_P,WBJPE18018648-4_HMVMKCCXY_L7_WBJPE18018648_20180729_P  \
    --transcriptome=/home/dushiyi/database/refdata-cellranger-GRCh38-1.2.0
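The `--sample` value is the comma-separated list of sample-name prefixes that appear before `_S<n>_L<lane>_...` in the FASTQ names. Typing eight long names by hand is error-prone, so a small sketch can derive the list instead. It assumes bcl2fastq naming and that the directory holds only the samples you want to count; `sample_list` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the comma-separated --sample value from the R1 file names.
# Assumes bcl2fastq naming <sample>_S<n>_L<lane>_R1_001.fastq.gz; --sample takes <sample>.
sample_list() {
    local f
    for f in "$1"/*_R1_001.fastq.gz; do
        [ -e "$f" ] || continue    # glob matched nothing
        basename "$f"
    done | sed -E 's/_S[0-9]+_L[0-9]{3}_R1_001\.fastq\.gz$//' | sort -u | paste -sd, -
}
```

For example, `--sample="$(sample_list /cygene/data/20180810_10x/10x/)"`. A sample sequenced on several lanes appears only once, because `sort -u` collapses the duplicates.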
    

    8 samples, about 132 GB of data in total; the run took 38 hours with 20 threads and 128 GB of memory.
    The results are written to the outs directory:

    Outputs:
    - Run summary HTML:                      /cygene/data/20180810_10x/work/L006/outs/web_summary.html
    - Run summary CSV:                       /cygene/data/20180810_10x/work/L006/outs/metrics_summary.csv
    - BAM:                                   /cygene/data/20180810_10x/work/L006/outs/possorted_genome_bam.bam
    - BAM index:                             /cygene/data/20180810_10x/work/L006/outs/possorted_genome_bam.bam.bai
    - Filtered gene-barcode matrices MEX:    /cygene/data/20180810_10x/work/L006/outs/filtered_gene_bc_matrices
    - Filtered gene-barcode matrices HDF5:   /cygene/data/20180810_10x/work/L006/outs/filtered_gene_bc_matrices_h5.h5
    - Unfiltered gene-barcode matrices MEX:  /cygene/data/20180810_10x/work/L006/outs/raw_gene_bc_matrices
    - Unfiltered gene-barcode matrices HDF5: /cygene/data/20180810_10x/work/L006/outs/raw_gene_bc_matrices_h5.h5
    - Secondary analysis output CSV:         /cygene/data/20180810_10x/work/L006/outs/analysis
    - Per-molecule read information:         /cygene/data/20180810_10x/work/L006/outs/molecule_info.h5
    - Loupe Cell Browser file:               /cygene/data/20180810_10x/work/L006/outs/cloupe.cloupe
    
    2018-08-29 03:45:03 [perform] Serializing pipestance performance data.
    Waiting 6 seconds for UI to do final refresh.
    Pipestance completed successfully!
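Cell Ranger 2.x writes the filtered MEX matrix as one folder per genome inside `filtered_gene_bc_matrices` (presumably `GRCh38/` for this reference; that folder name is an assumption), containing `matrix.mtx`, `genes.tsv` and `barcodes.tsv`. A quick way to see how many cells passed filtering is to read the Matrix Market header. A minimal sketch, with `mtx_dims` as a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print the dimensions stored in a Matrix Market (MEX) header.
# For Cell Ranger 2.x matrices, rows are genes and columns are cell barcodes.
mtx_dims() {
    # Lines starting with '%' are the banner and comments; the first other line is
    # "<rows> <columns> <non-zero entries>".
    awk '!/^%/ { printf "genes=%s cells=%s nonzero=%s\n", $1, $2, $3; exit }' "$1"
}
```

Usage: `mtx_dims /cygene/data/20180810_10x/work/L006/outs/filtered_gene_bc_matrices/GRCh38/matrix.mtx`. The `cells` value should equal the line count of `barcodes.tsv` in the same folder.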
    

    (2) Running the 10x data with dropEst

    1. Create the working directories and the configuration file

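    # Run from /cygene/work/02.dropEst: pipeline.sh later reads /cygene/work/02.dropEst/01_dropTag by absolute path.
    mkdir -p 01_dropTag 02_alignment 03_dropEst
    # Arguments, in order: dropEst build directory, XML config file, STAR index directory, GTF file.
    sh pipeline.sh \
    /home/dushiyi/software/biosoftware/dropEst/build \
    /cygene/work/02.dropEst/10x.test.xml \
    /cygene/work/02.dropEst/star \
    /home/dushiyi/database/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf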
    

    The XML configuration file is as follows:

    <config>
        <!-- droptag -->
        <TagsSearch>
            <protocol>10x</protocol>
            <BarcodesSearch>
                <barcode1_length>8</barcode1_length>
                <barcode2_length>16</barcode2_length>
                <umi_length>10</umi_length>
                <r1_rc_length>0</r1_rc_length>
            </BarcodesSearch>
    
            <Processing>
                <min_align_length>10</min_align_length>
                <reads_per_out_file>10000000</reads_per_out_file>
                <poly_a_tail>AAAAAAAA</poly_a_tail>
            </Processing>
        </TagsSearch>
    
        <!-- dropest -->
        <Estimation>
            <Merge>
                <barcodes_file>/home/dushiyi/software/biosoftware/dropEst/data/barcodes/10x_aug_2016_split</barcodes_file>
                <barcodes_type>const</barcodes_type>
                <min_merge_fraction>0.2</min_merge_fraction>
                <max_cb_merge_edit_distance>2</max_cb_merge_edit_distance>
                <max_umi_merge_edit_distance>1</max_umi_merge_edit_distance>
                <min_genes_after_merge>100</min_genes_after_merge>
                <min_genes_before_merge>20</min_genes_before_merge>
            </Merge>
    
            <PreciseMerge>
                <max_merge_prob>1e-5</max_merge_prob>
                <max_real_merge_prob>1e-7</max_real_merge_prob>
            </PreciseMerge>
        </Estimation>
    </config>
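The `barcode2_length` (16) and `umi_length` (10) values appear to correspond to the 10x v2 R1 layout: a 16 bp cell barcode followed by a 10 bp UMI. The sketch below only illustrates that layout on a single read sequence; droptag does this parsing itself, and `split_r1` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Split a 10x v2 R1 sequence into cell barcode and UMI.
# Lengths mirror barcode2_length=16 and umi_length=10 from the config.
CB_LEN=16
UMI_LEN=10
split_r1() {
    local seq=$1
    if [ "${#seq}" -lt $((CB_LEN + UMI_LEN)) ]; then
        echo "read too short: ${#seq} bp" >&2
        return 1
    fi
    # Bash substring expansion: ${var:offset:length}
    echo "CB=${seq:0:CB_LEN} UMI=${seq:CB_LEN:UMI_LEN}"
}
```

Bases after position 26 (if the R1 read is longer) are ignored here, since they belong to neither field.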
    

    The pipeline.sh script used here:

    $ cat pipeline.sh
    if [ "$#" -ne 4 ]; then
        echo "usage: $0 dropest_directory config_file star_index_folder gtf_with_genes"
        echo "example: $0 ~/dropEst/build ~/dropEst/configs/indrop_v3.xml ~/star/mm10/index/ ~/star/mm10/genes.gtf"
        exit 1
    fi
    
    dropest_dir=$1
    config_file=$2
    star_index=$3
    gtf_file=$4
    # Step 1: tag the reads of each sample with droptag.
    # sample1-4 are libraries WBJPE18018647-1..4 (lane 6); sample5-8 are WBJPE18018648-1..4 (lane 7).
    cd 01_dropTag
    data=/cygene/work/02.dropEst/data
    n=0
    for lib in WBJPE18018647 WBJPE18018648; do
        if [ "$lib" = WBJPE18018647 ]; then lane=6; else lane=7; fi
        for k in 1 2 3 4; do
            n=$((n + 1))
            prefix=$data/${lib}-${k}_HMVMKCCXY_L${lane}_${lib}_20180729_P_S${n}_L00${lane}
            "$dropest_dir/droptag" -c "$config_file" -r 0 -p 20 -S -s -n "sample$n" -l "sample$n" \
                "${prefix}_I1_001.fastq.gz" "${prefix}_R1_001.fastq.gz" "${prefix}_R2_001.fastq.gz"
        done
    done
    
    cd ../02_alignment
    STAR --runThreadN 20 --genomeDir $star_index --readFilesCommand zcat --outSAMtype BAM Unsorted --readFilesIn /cygene/work/02.dropEst/01_dropTag/sample1.fastq.gz.tagged.fastq.gz,/cygene/work/02.dropEst/01_dropTag/sample2.fastq.gz.tagged.fastq.gz,/cygene/work/02.dropEst/01_dropTag/sample3.fastq.gz.tagged.fastq.gz,/cygene/work/02.dropEst/01_dropTag/sample4.fastq.gz.tagged.fastq.gz,/cygene/work/02.dropEst/01_dropTag/sample5.fastq.gz.tagged.fastq.gz,/cygene/work/02.dropEst/01_dropTag/sample6.fastq.gz.tagged.fastq.gz,/cygene/work/02.dropEst/01_dropTag/sample7.fastq.gz.tagged.fastq.gz,/cygene/work/02.dropEst/01_dropTag/sample8.fastq.gz.tagged.fastq.gz
    
    cd ../03_dropEst
    # $dropest_dir/dropest -w -M -u -G 20 -g $gtf_file -c $config_file ../02_alignment/Aligned.out.bam
    $dropest_dir/dropest -w -m -r "/cygene/work/02.dropEst/01_dropTag/sample8.params.gz /cygene/work/02.dropEst/01_dropTag/sample7.params.gz /cygene/work/02.dropEst/01_dropTag/sample6.params.gz /cygene/work/02.dropEst/01_dropTag/sample5.params.gz /cygene/work/02.dropEst/01_dropTag/sample4.params.gz /cygene/work/02.dropEst/01_dropTag/sample3.params.gz /cygene/work/02.dropEst/01_dropTag/sample2.params.gz /cygene/work/02.dropEst/01_dropTag/sample1.params.gz"  -g $gtf_file -c $config_file ../02_alignment/Aligned.out.bam
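pipeline.sh writes out the eight tagged FASTQ paths for STAR's `--readFilesIn` (comma-separated, which is how STAR takes several files for the same mate) and the eight `params.gz` files for dropest `-r` as one space-separated string. Generating both lists avoids typos when the sample count changes. A sketch, where `join_by` and the variable names are hypothetical and the file-name patterns and orders are copied from the script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: join arguments with a single-character separator.
join_by() {
    local IFS=$1   # "$*" joins its words with the first character of IFS
    shift
    printf '%s\n' "$*"
}

tag_dir=/cygene/work/02.dropEst/01_dropTag
tagged=()
params=()
for n in 1 2 3 4 5 6 7 8; do
    tagged+=("$tag_dir/sample$n.fastq.gz.tagged.fastq.gz")   # STAR order in the script: sample1..sample8
done
for n in 8 7 6 5 4 3 2 1; do
    params+=("$tag_dir/sample$n.params.gz")                  # -r order in the script: sample8..sample1
done

star_inputs=$(join_by , "${tagged[@]}")       # value for STAR --readFilesIn
dropest_params=$(join_by ' ' "${params[@]}")  # value for dropest -r (pass as one quoted string)
```

The lists would then be used as `STAR ... --readFilesIn "$star_inputs"` and `"$dropest_dir/dropest" -w -m -r "$dropest_params" -g "$gtf_file" -c "$config_file" ../02_alignment/Aligned.out.bam`.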
    

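    The three steps were run one at a time: first droptag on each sample, then the tagged output of all 8 samples was combined for the alignment in step two, and finally dropest was run as step three.
    Step three (dropest) failed because it needed more than 128 GB of memory, which is all the RAM my server has, so I moved this step to the Tianhe supercomputer.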
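The post does not say how the memory overflow was measured. One way to record a step's peak memory before moving it to a larger machine is the verbose mode of GNU time (the `/usr/bin/time` binary, not the shell keyword `time`), which prints a line `Maximum resident set size (kbytes): N`. A minimal sketch that reads that value from the log; `peak_rss_gib` and the log file name are hypothetical:

```shell
#!/usr/bin/env bash
# Extract peak memory (GiB) from a GNU time verbose log, for example one written by
#   /usr/bin/time -v -o dropest.time.log "$dropest_dir/dropest" ...
# GNU time -v reports "Maximum resident set size (kbytes): N".
peak_rss_gib() {
    awk -F': *' '/Maximum resident set size \(kbytes\)/ { printf "%.1f\n", $2 / 1048576 }' "$1"
}
```

Running a smaller subset of samples this way gives a rough idea of how memory grows with input size before committing a full run.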

    (3) Running the 10x data with zUMIs

    (4) Comparison of the results that different software produces from the same data
