Transcriptome Analysis in Practice, Appendix: Quality Control of the Trinity Assembly

Author: Yeyuntian | Published 2019-02-07 09:18

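    In Part 2 we used Trinity to assemble the transcriptome data. I had 6 samples with about 6 G of data in total, and with my settings the assembly took a little over 30 hours to finish.

    Today's task is to run quality control and visualization on the assembly with RSeQC.

    RSeQC is mainly aimed at clinical RNA-seq data and at data that has a reference genome. For RNA-seq data without a reference genome, many of its tools cannot be used.

    First, we use Bowtie2 to build an index from the FASTA file that Trinity produced.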

    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ ll | sort -nk 7
    total 86667684
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 left.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 right.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:08 both.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:19 .jellyfish_count.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:24 .jellyfish_dump.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:26 .jellyfish_histo.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm_renamed.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 inchworm.K25.L25.DS.fa.finished
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 partitioned_reads.files.list.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 recursive_trinity.cmds.ok
    -rw-rw-r-- 1 yeyt yeyt           9 Sep 14 01:08 both.fa.read_count
    -rw-rw-r-- 1 yeyt yeyt          10 Sep 14 01:59 inchworm.kmer_count
    -rw-rw-r-- 1 yeyt yeyt        2757 Sep 14 10:31 pipeliner.3855.cmds
    -rw-rw-r-- 1 yeyt yeyt       22843 Sep 14 01:26 jellyfish.kmers.fa.histo
    -rw-rw-r-- 1 yeyt yeyt    13366802 Sep 14 10:39 partitioned_reads.files.list
    -rw-rw-r-- 1 yeyt yeyt    47753878 Sep 14 10:39 recursive_trinity.cmds
    -rw-rw-r-- 1 yeyt yeyt   493022300 Sep 14 03:27 inchworm.K25.L25.DS.fa
    -rw-rw-r-- 1 yeyt yeyt 11602365066 Sep 14 01:08 both.fa
    -rw-rw-r-- 1 yeyt yeyt 26501596675 Sep 14 01:24 jellyfish.kmers.fa
    -rw-rw-r-- 1 yeyt yeyt 49568938526 Sep 14 06:38 scaffolding_entries.sam
    drwxrwxr-x 2 yeyt yeyt        4096 Sep 14 10:33 chrysalis/
    drwxrwxr-x 3 yeyt yeyt        4096 Sep 14 01:03 insilico_read_normalization/
    drwxrwxr-x 4 yeyt yeyt        4096 Sep 14 10:39 read_partitions/
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 15 20:51 align_stats.txt
    -rw-rw-r-- 1 yeyt yeyt          62 Sep 15 20:51 bowtie2.bam
    -rw-rw-r-- 1 yeyt yeyt         651 Sep 15 07:03 Trinity.timing
    -rw-rw-r-- 1 yeyt yeyt    10213332 Sep 15 07:03 Trinity.fasta.gene_trans_map
    -rw-rw-r-- 1 yeyt yeyt    47753878 Sep 15 07:03 recursive_trinity.cmds.completed
    -rw-rw-r-- 1 yeyt yeyt   244740565 Sep 15 07:03 Trinity.fasta
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2-build Trinity.fasta Trinity.fasta
    Settings:
      Output files: "Trinity.fasta.*.bt2"
      Line rate: 6 (line is 64 bytes)
      Lines per side: 1 (side is 64 bytes)
      Offset rate: 4 (one in 16)
      FTable chars: 10
      Strings: unpacked
      Max bucket size: default
      Max bucket size, sqrt multiplier: default
      Max bucket size, len divisor: 4
      Difference-cover sample period: 1024
      Endianness: little
      Actual local endianness: little
      Sanity checking: disabled
      Assertions: disabled
      Random seed: 0
      Sizeofs: void*:8, int:4, long:8, size_t:8
    Input files DNA, FASTA:
      Trinity.fasta
    Building a SMALL index
    Reading reference sizes
      Time reading reference sizes: 00:00:03
    Calculating joined length
    Writing header
    Reserving space for joined string
    Joining reference sequences
    ...
    Exiting Ebwt::buildToDisk()
    Returning from initFromVector
    Wrote 103828770 bytes to primary EBWT file: Trinity.fasta.rev.1.bt2
    Wrote 55572488 bytes to secondary EBWT file: Trinity.fasta.rev.2.bt2
    Re-opening _in1 and _in2 as input streams
    Returning from Ebwt constructor
    Headers:
        len: 222289920
        bwtLen: 222289921
        sz: 55572480
        bwtSz: 55572481
        lineRate: 6
        offRate: 4
        offMask: 0xfffffff0
        ftabChars: 10
        eftabLen: 20
        eftabSz: 80
        ftabLen: 1048577
        ftabSz: 4194308
        offsLen: 13893121
        offsSz: 55572484
        lineSz: 64
        sideSz: 64
        sideBwtSz: 48
        sideBwtLen: 192
        numSides: 1157761
        numLines: 1157761
        ebwtTotLen: 74096704
        ebwtTotSz: 74096704
        color: 0
        reverse: 1
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ ll | sort -nk 7
    total 86822504
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 left.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 right.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:08 both.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:19 .jellyfish_count.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:24 .jellyfish_dump.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:26 .jellyfish_histo.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm_renamed.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 inchworm.K25.L25.DS.fa.finished
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 partitioned_reads.files.list.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 recursive_trinity.cmds.ok
    -rw-rw-r-- 1 yeyt yeyt           9 Sep 14 01:08 both.fa.read_count
    -rw-rw-r-- 1 yeyt yeyt          10 Sep 14 01:59 inchworm.kmer_count
    -rw-rw-r-- 1 yeyt yeyt        2757 Sep 14 10:31 pipeliner.3855.cmds
    -rw-rw-r-- 1 yeyt yeyt       22843 Sep 14 01:26 jellyfish.kmers.fa.histo
    -rw-rw-r-- 1 yeyt yeyt    13366802 Sep 14 10:39 partitioned_reads.files.list
    -rw-rw-r-- 1 yeyt yeyt    47753878 Sep 14 10:39 recursive_trinity.cmds
    -rw-rw-r-- 1 yeyt yeyt   493022300 Sep 14 03:27 inchworm.K25.L25.DS.fa
    -rw-rw-r-- 1 yeyt yeyt 11602365066 Sep 14 01:08 both.fa
    -rw-rw-r-- 1 yeyt yeyt 26501596675 Sep 14 01:24 jellyfish.kmers.fa
    -rw-rw-r-- 1 yeyt yeyt 49568938526 Sep 14 06:38 scaffolding_entries.sam
    drwxrwxr-x 2 yeyt yeyt        4096 Sep 14 10:33 chrysalis/
    drwxrwxr-x 3 yeyt yeyt        4096 Sep 14 01:03 insilico_read_normalization/
    drwxrwxr-x 4 yeyt yeyt        4096 Sep 14 10:39 read_partitions/
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 15 20:51 align_stats.txt
    -rw-rw-r-- 1 yeyt yeyt          62 Sep 15 20:51 bowtie2.bam
    -rw-rw-r-- 1 yeyt yeyt         651 Sep 15 07:03 Trinity.timing
    -rw-rw-r-- 1 yeyt yeyt    10213332 Sep 15 07:03 Trinity.fasta.gene_trans_map
    -rw-rw-r-- 1 yeyt yeyt    47753878 Sep 15 07:03 recursive_trinity.cmds.completed
    -rw-rw-r-- 1 yeyt yeyt   244740565 Sep 15 07:03 Trinity.fasta
    drwxrwxr-x 3 yeyt yeyt        4096 Sep 15 18:42 ../
    drwxrwxr-x 5 yeyt yeyt        4096 Sep 15 20:51 ./
    -rw-rw-r-- 1 yeyt yeyt     1984490 Sep 23 13:50 Trinity.fasta.3.bt2
    -rw-rw-r-- 1 yeyt yeyt    55572480 Sep 23 13:50 Trinity.fasta.4.bt2
    -rw-rw-r-- 1 yeyt yeyt    55572488 Sep 23 14:03 Trinity.fasta.2.bt2
    -rw-rw-r-- 1 yeyt yeyt    55572488 Sep 23 14:16 Trinity.fasta.rev.2.bt2
    -rw-rw-r-- 1 yeyt yeyt   103828770 Sep 23 14:03 Trinity.fasta.1.bt2
    -rw-rw-r-- 1 yeyt yeyt   103828770 Sep 23 14:16 Trinity.fasta.rev.1.bt2
    
    The six files ending in .bt2 at the bottom of the listing are the index files.
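
Before starting a long mapping run it can be worth confirming that the index is complete. The following is a minimal sketch, not part of the original workflow: the function name `bt2_index_complete` is made up for illustration, and it assumes a small index (the `.bt2` suffix seen in the log above; large indexes use `.bt2l` instead).

```shell
# Check that all six small-index files exist and are non-empty for a
# given index basename. bt2_index_complete is a hypothetical helper name.
bt2_index_complete() (
    prefix=$1
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -s "$prefix.$suffix" ]; then
            echo "missing or empty: $prefix.$suffix" >&2
            exit 1          # leaves only the subshell, so the function returns 1
        fi
    done
    echo "index complete: $prefix"
)
```

In the assembly directory it would be called as `bt2_index_complete Trinity.fasta`, since the index basename used above is the FASTA file name itself.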
    Next, we map the reads back to the assembly with Bowtie2 and write SAM files. The command and its log output for each of the six samples follow.
    # Mapping the paired-end reads of B251
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B251_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B251_2.P.fq.gz -S B251.sam
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_US:en",
            LC_ALL = (unset),
            LC_PAPER = "zh_CN.UTF-8",
            LC_ADDRESS = "zh_CN.UTF-8",
            LC_MONETARY = "zh_CN.UTF-8",
            LC_NUMERIC = "zh_CN.UTF-8",
            LC_TELEPHONE = "zh_CN.UTF-8",
            LC_IDENTIFICATION = "zh_CN.UTF-8",
            LC_MEASUREMENT = "zh_CN.UTF-8",
            LC_TIME = "zh_CN.UTF-8",
            LC_NAME = "zh_CN.UTF-8",
            LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    28213701 reads; of these:
      28213701 (100.00%) were paired; of these:
        3865337 (13.70%) aligned concordantly 0 times
        2140365 (7.59%) aligned concordantly exactly 1 time
        22207999 (78.71%) aligned concordantly >1 times
        ----
        3865337 pairs aligned concordantly 0 times; of these:
          134400 (3.48%) aligned discordantly 1 time
        ----
        3730937 pairs aligned 0 times concordantly or discordantly; of these:
          7461874 mates make up the pairs; of these:
            2553395 (34.22%) aligned 0 times
            273693 (3.67%) aligned exactly 1 time
            4634786 (62.11%) aligned >1 times
    95.47% overall alignment rate
    # Mapping the paired-end reads of B252
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B252_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/B252_2.P.fq.gz -S B252.sam
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_US:en",
            LC_ALL = (unset),
            LC_PAPER = "zh_CN.UTF-8",
            LC_ADDRESS = "zh_CN.UTF-8",
            LC_MONETARY = "zh_CN.UTF-8",
            LC_NUMERIC = "zh_CN.UTF-8",
            LC_TELEPHONE = "zh_CN.UTF-8",
            LC_IDENTIFICATION = "zh_CN.UTF-8",
            LC_MEASUREMENT = "zh_CN.UTF-8",
            LC_TIME = "zh_CN.UTF-8",
            LC_NAME = "zh_CN.UTF-8",
            LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    24423445 reads; of these:
      24423445 (100.00%) were paired; of these:
        2755943 (11.28%) aligned concordantly 0 times
        2003579 (8.20%) aligned concordantly exactly 1 time
        19663923 (80.51%) aligned concordantly >1 times
        ----
        2755943 pairs aligned concordantly 0 times; of these:
          82738 (3.00%) aligned discordantly 1 time
        ----
        2673205 pairs aligned 0 times concordantly or discordantly; of these:
          5346410 mates make up the pairs; of these:
            1943923 (36.36%) aligned 0 times
            258490 (4.83%) aligned exactly 1 time
            3143997 (58.81%) aligned >1 times
    96.02% overall alignment rate
    # Mapping the paired-end reads of R251
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/R251_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/R251_2.P.fq.gz -S R251.sam
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_US:en",
            LC_ALL = (unset),
            LC_PAPER = "zh_CN.UTF-8",
            LC_ADDRESS = "zh_CN.UTF-8",
            LC_MONETARY = "zh_CN.UTF-8",
            LC_NUMERIC = "zh_CN.UTF-8",
            LC_TELEPHONE = "zh_CN.UTF-8",
            LC_IDENTIFICATION = "zh_CN.UTF-8",
            LC_MEASUREMENT = "zh_CN.UTF-8",
            LC_TIME = "zh_CN.UTF-8",
            LC_NAME = "zh_CN.UTF-8",
            LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    24498964 reads; of these:
      24498964 (100.00%) were paired; of these:
        2605874 (10.64%) aligned concordantly 0 times
        2058157 (8.40%) aligned concordantly exactly 1 time
        19834933 (80.96%) aligned concordantly >1 times
        ----
        2605874 pairs aligned concordantly 0 times; of these:
          68645 (2.63%) aligned discordantly 1 time
        ----
        2537229 pairs aligned 0 times concordantly or discordantly; of these:
          5074458 mates make up the pairs; of these:
            1920173 (37.84%) aligned 0 times
            259673 (5.12%) aligned exactly 1 time
            2894612 (57.04%) aligned >1 times
    96.08% overall alignment rate
    # Mapping the paired-end reads of R252
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/R252_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/R252_2.P.fq.gz -S R252.sam
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_US:en",
            LC_ALL = (unset),
            LC_PAPER = "zh_CN.UTF-8",
            LC_ADDRESS = "zh_CN.UTF-8",
            LC_MONETARY = "zh_CN.UTF-8",
            LC_NUMERIC = "zh_CN.UTF-8",
            LC_TELEPHONE = "zh_CN.UTF-8",
            LC_IDENTIFICATION = "zh_CN.UTF-8",
            LC_MEASUREMENT = "zh_CN.UTF-8",
            LC_TIME = "zh_CN.UTF-8",
            LC_NAME = "zh_CN.UTF-8",
            LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    23929511 reads; of these:
      23929511 (100.00%) were paired; of these:
        3455581 (14.44%) aligned concordantly 0 times
        1770888 (7.40%) aligned concordantly exactly 1 time
        18703042 (78.16%) aligned concordantly >1 times
        ----
        3455581 pairs aligned concordantly 0 times; of these:
          132348 (3.83%) aligned discordantly 1 time
        ----
        3323233 pairs aligned 0 times concordantly or discordantly; of these:
          6646466 mates make up the pairs; of these:
            2061887 (31.02%) aligned 0 times
            216206 (3.25%) aligned exactly 1 time
            4368373 (65.72%) aligned >1 times
    95.69% overall alignment rate
    # Mapping the paired-end reads of W251
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/W251_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/W251_2.P.fq.gz -S W251.sam
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_US:en",
            LC_ALL = (unset),
            LC_PAPER = "zh_CN.UTF-8",
            LC_ADDRESS = "zh_CN.UTF-8",
            LC_MONETARY = "zh_CN.UTF-8",
            LC_NUMERIC = "zh_CN.UTF-8",
            LC_TELEPHONE = "zh_CN.UTF-8",
            LC_IDENTIFICATION = "zh_CN.UTF-8",
            LC_MEASUREMENT = "zh_CN.UTF-8",
            LC_TIME = "zh_CN.UTF-8",
            LC_NAME = "zh_CN.UTF-8",
            LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    25553075 reads; of these:
      25553075 (100.00%) were paired; of these:
        3705332 (14.50%) aligned concordantly 0 times
        2003416 (7.84%) aligned concordantly exactly 1 time
        19844327 (77.66%) aligned concordantly >1 times
        ----
        3705332 pairs aligned concordantly 0 times; of these:
          163553 (4.41%) aligned discordantly 1 time
        ----
        3541779 pairs aligned 0 times concordantly or discordantly; of these:
          7083558 mates make up the pairs; of these:
            2021254 (28.53%) aligned 0 times
            226959 (3.20%) aligned exactly 1 time
            4835345 (68.26%) aligned >1 times
    96.04% overall alignment rate
    # Mapping the paired-end reads of W252
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bowtie2 -x Trinity.fasta -1 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/W252_1.P.fq.gz -2 /home/yeyt/biodata/NH160034/NH160034/cleandata/assembly/W252_2.P.fq.gz -S W252.sam
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_US:en",
            LC_ALL = (unset),
            LC_PAPER = "zh_CN.UTF-8",
            LC_ADDRESS = "zh_CN.UTF-8",
            LC_MONETARY = "zh_CN.UTF-8",
            LC_NUMERIC = "zh_CN.UTF-8",
            LC_TELEPHONE = "zh_CN.UTF-8",
            LC_IDENTIFICATION = "zh_CN.UTF-8",
            LC_MEASUREMENT = "zh_CN.UTF-8",
            LC_TIME = "zh_CN.UTF-8",
            LC_NAME = "zh_CN.UTF-8",
            LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    24577100 reads; of these:
      24577100 (100.00%) were paired; of these:
        3173490 (12.91%) aligned concordantly 0 times
        1898984 (7.73%) aligned concordantly exactly 1 time
        19504626 (79.36%) aligned concordantly >1 times
        ----
        3173490 pairs aligned concordantly 0 times; of these:
          112017 (3.53%) aligned discordantly 1 time
        ----
        3061473 pairs aligned 0 times concordantly or discordantly; of these:
          6122946 mates make up the pairs; of these:
            2060673 (33.65%) aligned 0 times
            226885 (3.71%) aligned exactly 1 time
            3835388 (62.64%) aligned >1 times
    95.81% overall alignment rate
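
The "overall alignment rate" on the last line of each log can be recomputed from the counts above it: every concordant or discordant pair contributes two aligned reads, and the remaining mates that aligned on their own contribute one each. The sketch below assumes a paired-end summary in the layout shown above, saved to a file (Bowtie2 prints it on stderr); the function name `bt2_overall_rate` is hypothetical.

```shell
# Recompute the overall alignment rate from a paired-end Bowtie2 summary
# read on stdin. bt2_overall_rate is a made-up name for this sketch.
bt2_overall_rate() {
    awk '
        / reads; of these:/                   { pairs = $1 }   # total read pairs
        /aligned concordantly exactly 1 time/ { conc += $1 }
        /aligned concordantly >1 times/       { conc += $1 }
        /aligned discordantly 1 time/         { disc = $1 }
        / aligned exactly 1 time/             { mates += $1 }  # single mates
        / aligned >1 times/                   { mates += $1 }
        END {
            if (pairs == 0) exit 1
            # aligned reads / total reads, as a percentage
            printf "%.2f\n", 100 * (2 * (conc + disc) + mates) / (2 * pairs)
        }'
}
```

For B251 this is (2 × (2140365 + 22207999 + 134400) + 273693 + 4634786) / (2 × 28213701), which gives the 95.47% that Bowtie2 reported. A call would look like `bt2_overall_rate < B251.bowtie2.log`, where the log file name is an assumption (for example from adding `2> B251.bowtie2.log` to the mapping command).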
    
    Mapping takes quite a while, so in the meantime we generate a simple quality report for the assembly.
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ $TRINITY_HOME/util/TrinityStats.pl Trinity.fasta > Trinitystats.log 
    # The output is written to Trinitystats.log
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ cat Trinitystats.log 
    
    
    ################################
    ## Counts of transcripts, etc.
    ################################
    Total trinity 'genes':  110851
    Total trinity transcripts:  220498
    Percent GC: 42.98
    
    ########################################
    Stats based on ALL transcript contigs:
    ########################################
    
       Contig N10: 4369
       Contig N20: 3291
       Contig N30: 2640
       Contig N40: 2183
       Contig N50: 1802
    
       Median contig length: 542
       Average contig: 1008.13
       Total assembled bases: 222289920
    
    
    #####################################################
    ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
    #####################################################
    
       Contig N10: 3997
       Contig N20: 2867
       Contig N30: 2195
       Contig N40: 1663
       Contig N50: 1157
    
       Median contig length: 364
       Average contig: 686.86
       Total assembled bases: 76139520
    
    
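    A few words on the results above.

    The report starts with a summary: how many 'genes' and how many transcripts were assembled (110851 and 220498 here),
    followed by the overall GC content (42.98%).
    It then gives two sets of contig statistics:
    one computed over all transcript contigs,
    and one computed over only the longest isoform of each 'gene'.
    N50 is the contig length such that contigs of that length or longer contain at least half of all assembled bases. N10 to N40 are defined the same way for 10% to 40% of the bases.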
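
To make the N50 definition concrete, here is a minimal sketch that computes it directly from a FASTA file. It is not how TrinityStats.pl is implemented; the helper names `fasta_lengths` and `n50` are made up for this example.

```shell
# Print the length of every sequence in a FASTA file, one per line.
fasta_lengths() {
    awk '
        /^>/ { if (seen) print n; seen = 1; n = 0; next }  # new record starts
             { n += length($0) }                           # sequence line
        END  { if (seen) print n }' "$1"
}

# Read contig lengths on stdin and print the N50: walk the lengths from
# longest to shortest and stop once the running sum reaches half the total.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (running >= half) { print len[i]; exit }
            }
        }'
}
```

For the lengths 2, 3, ..., 10 the total is 54, and 10 + 9 + 8 = 27 reaches half of it, so the N50 is 8. On the assembly, `fasta_lengths Trinity.fasta | n50` should give a value comparable to the "Contig N50" line for all transcript contigs.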

    Next we convert the SAM files to BAM and sort them, which gives us the input files for the later analysis.

    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ ls *sam | grep '25' |xargs -I [] echo 'samtools view -bS [] | samtools sort -o [].sorted.bam ' > samtoolssort.sh
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ cat samtoolssort.sh 
    samtools view -bS B251.sam | samtools sort -o B251.sam.sorted.bam 
    samtools view -bS B252.sam | samtools sort -o B252.sam.sorted.bam 
    samtools view -bS R251.sam | samtools sort -o R251.sam.sorted.bam 
    samtools view -bS R252.sam | samtools sort -o R252.sam.sorted.bam 
    samtools view -bS W251.sam | samtools sort -o W251.sam.sorted.bam 
    samtools view -bS W252.sam | samtools sort -o W252.sam.sorted.bam
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ bash samtoolssort.sh
    [bam_sort_core] merging from 41 files...
    [bam_sort_core] merging from 36 files...
    [bam_sort_core] merging from 36 files...
    [bam_sort_core] merging from 35 files...
    [bam_sort_core] merging from 38 files...
    [bam_sort_core] merging from 36 files...
    yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly/trinity_out_dir$ ll | sort -nk 7
    total 234179528
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 left.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:07 right.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:08 both.fa.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:19 .jellyfish_count.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:24 .jellyfish_dump.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 01:26 .jellyfish_histo.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 .iworm_renamed.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 03:32 inchworm.K25.L25.DS.fa.finished
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 partitioned_reads.files.list.ok
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 14 10:39 recursive_trinity.cmds.ok
    -rw-rw-r-- 1 yeyt yeyt           9 Sep 14 01:08 both.fa.read_count
    -rw-rw-r-- 1 yeyt yeyt          10 Sep 14 01:59 inchworm.kmer_count
    -rw-rw-r-- 1 yeyt yeyt        2757 Sep 14 10:31 pipeliner.3855.cmds
    -rw-rw-r-- 1 yeyt yeyt       22843 Sep 14 01:26 jellyfish.kmers.fa.histo
    -rw-rw-r-- 1 yeyt yeyt    13366802 Sep 14 10:39 partitioned_reads.files.list
    -rw-rw-r-- 1 yeyt yeyt    47753878 Sep 14 10:39 recursive_trinity.cmds
    -rw-rw-r-- 1 yeyt yeyt   493022300 Sep 14 03:27 inchworm.K25.L25.DS.fa
    -rw-rw-r-- 1 yeyt yeyt 11602365066 Sep 14 01:08 both.fa
    -rw-rw-r-- 1 yeyt yeyt 26501596675 Sep 14 01:24 jellyfish.kmers.fa
    -rw-rw-r-- 1 yeyt yeyt 49568938526 Sep 14 06:38 scaffolding_entries.sam
    drwxrwxr-x 2 yeyt yeyt        4096 Sep 14 10:33 chrysalis/
    drwxrwxr-x 3 yeyt yeyt        4096 Sep 14 01:03 insilico_read_normalization/
    drwxrwxr-x 4 yeyt yeyt        4096 Sep 14 10:39 read_partitions/
    -rw-rw-r-- 1 yeyt yeyt           0 Sep 15 20:51 align_stats.txt
    -rw-rw-r-- 1 yeyt yeyt          62 Sep 15 20:51 bowtie2.bam
    -rw-rw-r-- 1 yeyt yeyt         651 Sep 15 07:03 Trinity.timing
    -rw-rw-r-- 1 yeyt yeyt    10213332 Sep 15 07:03 Trinity.fasta.gene_trans_map
    -rw-rw-r-- 1 yeyt yeyt    47753878 Sep 15 07:03 recursive_trinity.cmds.completed
    -rw-rw-r-- 1 yeyt yeyt   244740565 Sep 15 07:03 Trinity.fasta
    drwxrwxr-x 3 yeyt yeyt        4096 Sep 15 18:42 ../
    -rw-rw-r-- 1 yeyt yeyt         821 Sep 23 15:17 Trinitystats.log
    -rw-rw-r-- 1 yeyt yeyt     1984490 Sep 23 13:50 Trinity.fasta.3.bt2
    -rw-rw-r-- 1 yeyt yeyt    55572480 Sep 23 13:50 Trinity.fasta.4.bt2
    -rw-rw-r-- 1 yeyt yeyt    55572488 Sep 23 14:03 Trinity.fasta.2.bt2
    -rw-rw-r-- 1 yeyt yeyt    55572488 Sep 23 14:16 Trinity.fasta.rev.2.bt2
    -rw-rw-r-- 1 yeyt yeyt   103828770 Sep 23 14:03 Trinity.fasta.1.bt2
    -rw-rw-r-- 1 yeyt yeyt   103828770 Sep 23 14:16 Trinity.fasta.rev.1.bt2
    -rw-rw-r-- 1 yeyt yeyt         400 Sep 24 13:22 samtoolssort.sh
    -rw-rw-r-- 1 yeyt yeyt  3049959975 Sep 24 15:37 R252.sam.sorted.bam
    -rw-rw-r-- 1 yeyt yeyt  3181086895 Sep 24 16:44 W252.sam.sorted.bam
    -rw-rw-r-- 1 yeyt yeyt  3192193677 Sep 24 15:06 R251.sam.sorted.bam
    -rw-rw-r-- 1 yeyt yeyt  3206939510 Sep 24 14:33 B252.sam.sorted.bam
    -rw-rw-r-- 1 yeyt yeyt  3267705730 Sep 24 16:11 W251.sam.sorted.bam
    -rw-rw-r-- 1 yeyt yeyt  3655386513 Sep 24 14:01 B251.sam.sorted.bam
    -rw-rw-r-- 1 yeyt yeyt 20770276094 Sep 24 01:49 R252.sam
    -rw-rw-r-- 1 yeyt yeyt 21235142607 Sep 24 02:03 B252.sam
    -rw-rw-r-- 1 yeyt yeyt 21293400430 Sep 24 02:07 R251.sam
    -rw-rw-r-- 1 yeyt yeyt 21346715631 Sep 24 02:15 W252.sam
    -rw-rw-r-- 1 yeyt yeyt 22197735984 Sep 24 02:29 W251.sam
    -rw-rw-r-- 1 yeyt yeyt 24496840308 Sep 24 03:04 B251.sam
    
    This gives us six sorted BAM files.
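
The generated script above names its outputs like `B251.sam.sorted.bam`. A variant that strips the `.sam` extension and also indexes each BAM file might look like the sketch below. It only prints the commands (a dry run) and assumes a recent samtools, where `samtools sort` reads SAM input directly; `make_sort_cmds` is a hypothetical helper name.

```shell
# Print one sort command and one index command per SAM file (dry run).
# Nothing is executed here; the printed lines can be saved to a script.
make_sort_cmds() {
    for sam in "$@"; do
        base=${sam%.sam}                      # B251.sam -> B251
        echo "samtools sort -o $base.sorted.bam $sam"
        echo "samtools index $base.sorted.bam"
    done
}
```

For example, `make_sort_cmds *.sam > samtoolssort.sh` followed by `bash samtoolssort.sh` would reproduce the step above with shorter file names. The index files are an addition, on the assumption that some downstream tools expect an indexed BAM.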

    We then use the following RSeQC tools:

    bam_stat.py

    clipping_profile.py

    inner_distance.py

    read_duplication.py

    read_GC.py
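
As a rough sketch of how three of these scripts could be run over all six BAM files, the helper below prints the commands without executing them. It uses only the `-i` (input BAM) and `-o` (output prefix) options; `make_rseqc_cmds` is a made-up name. `clipping_profile.py` and `inner_distance.py` are left out because they need extra arguments (the sequencing layout in some RSeQC versions, and a BED gene model, respectively), which should be checked against the installed version.

```shell
# Print RSeQC commands for each BAM file given as an argument (dry run).
make_rseqc_cmds() {
    for bam in "$@"; do
        prefix=${bam%.bam}                    # output prefix, e.g. B251.sam.sorted
        echo "bam_stat.py -i $bam"
        echo "read_duplication.py -i $bam -o $prefix"
        echo "read_GC.py -i $bam -o $prefix"
    done
}
```

Running `make_rseqc_cmds *.sorted.bam` in the assembly directory lists the commands for review. Piping that output to `sh` would execute them.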
