RNA-seq data analysis of Staphylococcus aureus


Author: Y大宽 | Published 2018-10-04 23:48

    Note: the reads are already available as fastq.gz files, for one control and one mutant strain with three replicates each.

    1 Quality control with fastqc and interpreting the results

    # Run quality control on all the data; each input yields a zip archive and an html report
    fastqc -o .  *.fastq.gz
    
    Started analysis of D1_R1.fastq.gz
    Approx 5% complete for D1_R1.fastq.gz
    Approx 10% complete for D1_R1.fastq.gz
    Approx 15% complete for D1_R1.fastq.gz
    Approx 20% complete for D1_R1.fastq.gz
    Approx 25% complete for D1_R1.fastq.gz
    Approx 30% complete for D1_R1.fastq.gz
    Approx 35% complete for D1_R1.fastq.gz
    Approx 40% complete for D1_R1.fastq.gz
    Approx 45% complete for D1_R1.fastq.gz
    Approx 50% complete for D1_R1.fastq.gz
    Approx 55% complete for D1_R1.fastq.gz
    Approx 60% complete for D1_R1.fastq.gz
    Approx 65% complete for D1_R1.fastq.gz
    Approx 70% complete for D1_R1.fastq.gz
    Approx 75% complete for D1_R1.fastq.gz
    Approx 80% complete for D1_R1.fastq.gz
    Approx 85% complete for D1_R1.fastq.gz
    Approx 90% complete for D1_R1.fastq.gz
    Approx 95% complete for D1_R1.fastq.gz
    Analysis complete for D1_R1.fastq.gz
    

    Result


    fastqc result for D1_R1.png
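
    To read the result without opening every html report, the PASS/WARN/FAIL flags can be pulled out of the summary.txt file inside each zip. This is only a sketch: fastqc_flags is a helper name made up here, and it assumes the usual three tab-separated columns (status, module, file name) of FastQC's summary file.

```shell
# List every FastQC module that did not PASS.
# Input: one or more summary.txt files (or stdin), tab-separated as
#   STATUS <TAB> MODULE <TAB> FILE
# Output: FILE <TAB> STATUS <TAB> MODULE
fastqc_flags() {
  awk -F'\t' '$1 != "PASS" { print $3 "\t" $1 "\t" $2 }' "$@"
}

# Usage (zip and member names follow FastQC's <input>_fastqc naming):
#   unzip -p D1_R1_fastqc.zip D1_R1_fastqc/summary.txt | fastqc_flags
```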

    2 Download the Staphylococcus aureus genome and genome annotation files

    Download link for the Staphylococcus aureus genome
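
    If the files come from NCBI (an assumption, the actual link may differ), the download URL can be derived from the assembly name that appears in the index-building step. The sketch below assumes the standard layout of NCBI's genomes/all FTP tree; ncbi_assembly_url is a made-up helper name.

```shell
# Build the NCBI FTP URL for one file of an assembly.
# $1: assembly directory name, e.g. GCF_000013425.1_ASM1342v1
# $2: file suffix (default: genomic.fna.gz; genomic.gff.gz for the annotation)
ncbi_assembly_url() {
  local asm=$1 suffix=${2:-genomic.fna.gz} path
  # GCF_000013425.1_... -> GCF/000/013/425
  path=$(printf '%s\n' "$asm" |
    sed -E 's|^(GC[AF])_([0-9]{3})([0-9]{3})([0-9]{3})\..*$|\1/\2/\3/\4|')
  printf 'https://ftp.ncbi.nlm.nih.gov/genomes/all/%s/%s/%s_%s\n' \
    "$path" "$asm" "$asm" "$suffix"
}

# Usage (download, then unpack to the .fna name used for hisat2-build):
#   curl -O "$(ncbi_assembly_url GCF_000013425.1_ASM1342v1)"
#   curl -O "$(ncbi_assembly_url GCF_000013425.1_ASM1342v1 genomic.gff.gz)"
#   gunzip GCF_000013425.1_ASM1342v1_genomic.*.gz
```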

    3 Build an index with HISAT2 and align the reads

    3.1 Build the index

    hisat2-build GCF_000013425.1_ASM1342v1_genomic.fna sa_index
    

    3.2 Align the reads

     hisat2 -t -x sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R1.fastq.gz -2  /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R2.fastq.gz -S D1_R1.sam
     hisat2 -t -x sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/WT1_R1.fastq.gz -2  /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/WT1_R2.fastq.gz -S WT1_R1.sam
    
    Time loading forward index: 00:00:00
    Time loading reference: 00:00:00
    Multiseed full-index search: 00:09:58
    14200996 reads; of these:
      14200996 (100.00%) were paired; of these:
        14180176 (99.85%) aligned concordantly 0 times
        13590 (0.10%) aligned concordantly exactly 1 time
        7230 (0.05%) aligned concordantly >1 times
        ----
        14180176 pairs aligned concordantly 0 times; of these:
          135 (0.00%) aligned discordantly 1 time
        ----
        14180041 pairs aligned 0 times concordantly or discordantly; of these:
          28360082 mates make up the pairs; of these:
            28212046 (99.48%) aligned 0 times
            146051 (0.51%) aligned exactly 1 time
            1985 (0.01%) aligned >1 times
    0.67% overall alignment rate
    Time searching: 00:09:58
    Overall time: 00:09:58
    
    Time loading forward index: 00:00:00
    Time loading reference: 00:00:00
    Multiseed full-index search: 00:15:25
    15979802 reads; of these:
      15979802 (100.00%) were paired; of these:
        3149851 (19.71%) aligned concordantly 0 times
        12801703 (80.11%) aligned concordantly exactly 1 time
        28248 (0.18%) aligned concordantly >1 times
        ----
        3149851 pairs aligned concordantly 0 times; of these:
          87906 (2.79%) aligned discordantly 1 time
        ----
        3061945 pairs aligned 0 times concordantly or discordantly; of these:
          6123890 mates make up the pairs; of these:
            4423631 (72.24%) aligned 0 times
            1695827 (27.69%) aligned exactly 1 time
            4432 (0.07%) aligned >1 times
    86.16% overall alignment rate
    Time searching: 00:15:34
    Overall time: 00:15:34
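
    The overall alignment rate in these summaries can be recomputed from the counts: every pair that aligns concordantly or discordantly contributes two mates, and the mates that only align on their own are added on top. For the first run this gives (2 × (13590 + 7230 + 135) + 146051 + 1985) / (2 × 14200996) = 0.67%. A small awk sketch of the same arithmetic, assuming the paired-end summary format shown above (hisat2_overall_rate is a made-up helper name):

```shell
# Recompute the overall alignment rate (percent, two decimals) from a
# paired-end HISAT2 summary given as a file argument or on stdin.
hisat2_overall_rate() {
  awk '
    /reads; of these:/                    { pairs = $1 }
    /aligned concordantly exactly 1 time/ { conc1 = $1 }
    /aligned concordantly >1 times/       { concN = $1 }
    /aligned discordantly 1 time/         { disc  = $1 }
    /aligned exactly 1 time/              { mate1 = $1 }
    /aligned >1 times/                    { mateN = $1 }
    END {
      if (pairs == 0) exit 1              # no summary found
      aligned = 2 * (conc1 + concN + disc) + mate1 + mateN
      printf "%.2f\n", 100 * aligned / (2 * pairs)
    }
  ' "$@"
}
```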
    

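    A problem shows up here: the mutant's alignment rate is far too low, under 1%, which should not happen. I suspected sample contamination, so I picked 5 reads at random and ran BLAST on them; the hits indicate contamination with Staphylococcus haemolyticus.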
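
    Picking a few reads at random for BLAST can be done directly from the gzipped FASTQ. This is one possible way, not necessarily how the 5 reads above were chosen; sample_reads_fasta is a made-up helper, and shuf is assumed to be available (GNU coreutils).

```shell
# Print n randomly chosen reads from a gzipped FASTQ file as FASTA records.
# $1: path to a .fastq.gz file   $2: number of reads (default 5)
sample_reads_fasta() {
  local fastq_gz=$1 n=${2:-5}
  gzip -dc "$fastq_gz" |
    paste - - - - |      # join the 4 lines of each FASTQ record with tabs
    shuf -n "$n" |       # keep n records chosen at random
    awk -F'\t' '{ print ">" substr($1, 2); print $2 }'   # drop "@", keep sequence
}

# Usage (path is a placeholder for the raw data directory):
#   sample_reads_fasta D1_R1.fastq.gz 5 > D1_sample.fa
```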

    4 Download the Staphylococcus haemolyticus genome sequence

    Download link

    4.1 Build the index files

    hisat2-build GCF_000009865.1_ASM986v1_genomic.fna haemo_sa_index
    

    4.2 Align the mutant-group data to the Staphylococcus haemolyticus genome

     hisat2 -t -x haemo_sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R1.fastq.gz -2  /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R2.fastq.gz -S D1_R1.sam
    
    #D1_R1
    Time loading forward index: 00:00:00
    Time loading reference: 00:00:00
    Multiseed full-index search: 00:15:01
    14200996 reads; of these:
      14200996 (100.00%) were paired; of these:
        2971894 (20.93%) aligned concordantly 0 times
        11205145 (78.90%) aligned concordantly exactly 1 time
        23957 (0.17%) aligned concordantly >1 times
        ----
        2971894 pairs aligned concordantly 0 times; of these:
          67246 (2.26%) aligned discordantly 1 time
        ----
        2904648 pairs aligned 0 times concordantly or discordantly; of these:
          5809296 mates make up the pairs; of these:
            4179316 (71.94%) aligned 0 times
            1622752 (27.93%) aligned exactly 1 time
            7228 (0.12%) aligned >1 times
    85.29% overall alignment rate
    Time searching: 00:15:01
    Overall time: 00:15:01
    
    #D2_R2
    Time loading forward index: 00:00:00
    Time loading reference: 00:00:00
    Multiseed full-index search: 00:19:54
    18272739 reads; of these:
      18272739 (100.00%) were paired; of these:
        3984869 (21.81%) aligned concordantly 0 times
        14260046 (78.04%) aligned concordantly exactly 1 time
        27824 (0.15%) aligned concordantly >1 times
        ----
        3984869 pairs aligned concordantly 0 times; of these:
          83138 (2.09%) aligned discordantly 1 time
        ----
        3901731 pairs aligned 0 times concordantly or discordantly; of these:
          7803462 mates make up the pairs; of these:
            5671806 (72.68%) aligned 0 times
            2110960 (27.05%) aligned exactly 1 time
            20696 (0.27%) aligned >1 times
    84.48% overall alignment rate
    Time searching: 00:24:25
    Overall time: 00:24:25
    
    Time loading forward index: 00:00:00
    Time loading reference: 00:00:00
    Multiseed full-index search: 00:18:10
    17122975 reads; of these:
      17122975 (100.00%) were paired; of these:
        3511683 (20.51%) aligned concordantly 0 times
        13593051 (79.38%) aligned concordantly exactly 1 time
        18241 (0.11%) aligned concordantly >1 times
        ----
        3511683 pairs aligned concordantly 0 times; of these:
          74659 (2.13%) aligned discordantly 1 time
        ----
        3437024 pairs aligned 0 times concordantly or discordantly; of these:
          6874048 mates make up the pairs; of these:
            5027355 (73.14%) aligned 0 times
            1838519 (26.75%) aligned exactly 1 time
            8174 (0.12%) aligned >1 times
    85.32% overall alignment rate
    Time searching: 00:18:10
    Overall time: 00:18:10
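
    To compare the runs side by side, the last summary line can be collected from saved logs. HISAT2 prints its summary to stderr, so this sketch assumes each run was started with a redirect such as 2> D1.log; the log file names and rate_table are made up for illustration.

```shell
# Print "sample <TAB> overall alignment rate" for each HISAT2 log file given.
rate_table() {
  local f
  for f in "$@"; do
    printf '%s\t%s\n' "$(basename "$f" .log)" \
      "$(awk '/overall alignment rate/ { print $1 }' "$f")"
  done
}

# Usage (hypothetical log names, one per hisat2 run):
#   hisat2 -t -x haemo_sa_index -1 D1_R1.fastq.gz -2 D1_R2.fastq.gz -S D1_R1.sam 2> D1.log
#   rate_table D1.log D2.log D3.log
```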
    

    5 Basic genome statistics of Staphylococcus aureus and Staphylococcus haemolyticus compared

    Staphylococcus haemolyticus

    Total_Len:  2697861
    Total_Seq_Num : 4
    Total_N_Counts: 0
    Total_LowCase_Counts:   0
    Total_GC_content:   0.33
    Minimum Len:    2300
    Maximum Len:    2685015
    Mean Len:   674,465.25
    Median Len: 1,346,597.5
    N50:    2685015
    

    Staphylococcus aureus

    Total_Len:  2821361
    Total_Seq_Num : 1
    Total_N_Counts: 1
    Total_LowCase_Counts:   0
    Total_GC_content:   0.33
    Minimum Len:    2821361
    Maximum Len:    2821361
    Mean Len:   2,821,361
    Median Len: 2,821,361
    N50:    2821361
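
    Statistics like these can be reproduced with plain awk. The tool that produced the numbers above is not named, so this is an independent sketch (fasta_stats is a made-up name) covering total length, sequence count, GC content, shortest and longest sequence, and N50.

```shell
# Basic assembly statistics for a FASTA file ($1).
fasta_stats() {
  awk '
    # store the length of the record just finished
    function push_len() { if (cur > 0) { len[++n] = cur; total += cur; cur = 0 } }
    /^>/ { push_len(); next }
    {
      line = $0
      cur += length(line)
      gc  += gsub(/[GCgc]/, "", line)     # gsub returns the number of matches
    }
    END {
      push_len()
      if (n == 0) exit 1
      # insertion sort, longest first (n is small for a bacterial assembly)
      for (i = 2; i <= n; i++) {
        v = len[i]
        for (j = i - 1; j >= 1 && len[j] < v; j--) len[j + 1] = len[j]
        len[j + 1] = v
      }
      # N50: length of the sequence at which the running sum reaches half the total
      for (i = 1; i <= n; i++) { acc += len[i]; if (acc * 2 >= total) { n50 = len[i]; break } }
      printf "Total_Len\t%d\n", total
      printf "Total_Seq_Num\t%d\n", n
      printf "Total_GC_content\t%.2f\n", gc / total
      printf "Minimum_Len\t%d\n", len[n]
      printf "Maximum_Len\t%d\n", len[1]
      printf "N50\t%d\n", n50
    }
  ' "$1"
}

# Usage:
#   fasta_stats GCF_000013425.1_ASM1342v1_genomic.fna
```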
    

    6 Analysis of the genome alignment between Staphylococcus aureus and Staphylococcus haemolyticus

    # 710 hits found
    query id        subject id      identity%       alignment length        mismatches      gap opens       q. start        q. end  s. start        s. end  evalue  bit score
    NC_007795.1     NC_007168.1     89.55   14914   1410    111     2304062 2318871 833494  818625  0       18774
    NC_007795.1     NC_007168.1     94.84   7822    241     100     1899018 1906760 1110575 1102838 0       12056
    NC_007795.1     NC_007168.1     95.18   5392    171     58      448540  453908  2544291 2538966 0       8434
    NC_007795.1     NC_007168.1     95.95   5209    140     47      492993  498180  2544133 2538975 0       8384
    NC_007795.1     NC_007168.1     96.04   5156    139     40      492993  498130  879727  884835  0       8331
    NC_007795.1     NC_007168.1     95.95   5161    143     42      448709  453850  879722  884835  0       8312
    NC_007795.1     NC_007168.1     77.9    12405   2498    209     978268  990541  1940909 1928618 0       7492
    NC_007795.1     NC_007168.1     91.12   5045    394     45      529559  534584  2461838 2456829 0       6785
    NC_007795.1     NC_007168.1     86.38   5706    668     76      520058  525712  2471424 2465777 0       6131
    NC_007795.1     NC_007168.1     96.47   3683    90      21      2238680 2242354 885361  881711  0       6045
    NC_007795.1     NC_007168.1     78.25   9179    1738    197     1588304 1597364 1376275 1367237 0       5655
    NC_007795.1     NC_007168.1     96.42   3407    103     10      2122741 2126140 976287  972893  0       5598
    NC_007795.1     NC_007168.1     96.36   3411    105     11      2239223 2242626 976292  972894  0       5594
    NC_007795.1     NC_007168.1     92.35   3895    205     44      450529  454385  2498609 2494770 0       5456
    NC_007795.1     NC_007168.1     96.22   3336    86      22      494809  498137  1104945 1108247 0       5426
    NC_007795.1     NC_007168.1     96.33   3324    83      21      1901348 1904666 2539029 2542318 0       5426
    NC_007795.1     NC_007168.1     96.21   3324    88      19      1901348 1904666 884833  881543  0       5406
    NC_007795.1     NC_007168.1     96.18   3327    89      20      494808  498130  2498609 2495317 0       5406
    NC_007795.1     NC_007168.1     96.1    3336    89      23      450530  453857  1104945 1108247 0       5402
    NC_007795.1     NC_007168.1     95.49   3411    106     28      1901348 1904745 2495319 2498694 0       5402
    NC_007795.1     NC_007168.1     97.14   3184    76      9       2122692 2125867 2538975 2542151 0       5361
    NC_007795.1     NC_007168.1     97.06   3168    79      9       2239193 2242354 1108272 1105113 0       5323
    NC_007795.1     NC_007168.1     97.35   3133    72      6       2239226 2242354 2539026 2542151 0       5315
    NC_007795.1     NC_007168.1     97.17   3149    78      6       494986  498130  973146  976287  0       5312
    NC_007795.1     NC_007168.1     94.5    3472    144     20      450707  454157  973146  976591  0       5310
    NC_007795.1     NC_007168.1     97.32   3131    74      4       2122741 2125867 884835  881711  0       5308
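
    A quick way to summarise a tabular hit list like this one is to count the hits, add up the alignment lengths, and compute a length-weighted mean identity. The sketch below assumes the 12-column tabular BLAST layout shown above; blast_tab_summary is a made-up name. Several rows map different query intervals onto nearly the same subject interval (for example subject 2544291-2538966 and 2544133-2538975), so the summed length counts some regions more than once.

```shell
# Summarise 12-column tabular BLAST output (file argument or stdin).
# Comment lines (#) and anything that is not exactly 12 fields are skipped.
blast_tab_summary() {
  awk '
    !/^#/ && NF == 12 {
      hits++
      aligned += $4            # column 4: alignment length
      weighted += $3 * $4      # column 3: percent identity
    }
    END {
      if (hits == 0) exit 1
      printf "hits\t%d\n", hits
      printf "aligned_bp\t%d\n", aligned
      printf "weighted_identity\t%.2f\n", weighted / aligned
    }
  ' "$@"
}

# Usage (hypothetical file name for the saved hit table):
#   blast_tab_summary sa_vs_haemo.blast.tsv
```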
    

    7 Aside
