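Note: the data are already available in fastq.gz format: one control and one mutant strain, with three replicates per treatment.
1 Quality control with FastQC + interpreting the results
# Run quality control on all the data; each input yields a zip archive and an HTML report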
fastqc -o . *.fastq.gz
Started analysis of D1_R1.fastq.gz
Approx 5% complete for D1_R1.fastq.gz
Approx 10% complete for D1_R1.fastq.gz
Approx 15% complete for D1_R1.fastq.gz
Approx 20% complete for D1_R1.fastq.gz
Approx 25% complete for D1_R1.fastq.gz
Approx 30% complete for D1_R1.fastq.gz
Approx 35% complete for D1_R1.fastq.gz
Approx 40% complete for D1_R1.fastq.gz
Approx 45% complete for D1_R1.fastq.gz
Approx 50% complete for D1_R1.fastq.gz
Approx 55% complete for D1_R1.fastq.gz
Approx 60% complete for D1_R1.fastq.gz
Approx 65% complete for D1_R1.fastq.gz
Approx 70% complete for D1_R1.fastq.gz
Approx 75% complete for D1_R1.fastq.gz
Approx 80% complete for D1_R1.fastq.gz
Approx 85% complete for D1_R1.fastq.gz
Approx 90% complete for D1_R1.fastq.gz
Approx 95% complete for D1_R1.fastq.gz
Analysis complete for D1_R1.fastq.gz
Result
[Figure: FastQC result for D1_R1]
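To interpret the reports without opening every HTML file, one option is to scan the summary.txt inside each FastQC zip, which lists one PASS/WARN/FAIL status per module. The following is only a sketch: it assumes the default zip layout (<name>_fastqc/summary.txt), and the function name fastqc_flags is made up for illustration.

```shell
#!/usr/bin/env bash
# fastqc_flags: read a FastQC summary.txt on stdin (tab-separated:
# status, module, file name) and print every module that is not PASS.
fastqc_flags() {
    awk -F'\t' '$1 != "PASS" { print $1 "\t" $2 "\t" $3 }'
}

# Usage (assumes unzip is installed and the zips are in the current directory):
#   for z in *_fastqc.zip; do
#       unzip -p "$z" "${z%.zip}/summary.txt" | fastqc_flags
#   done
```

Modules flagged WARN or FAIL are the ones worth checking in the HTML report.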
2 Download the Staphylococcus aureus genome and genome annotation files
3 Build the index with HISAT2 and align the reads
3.1 Build the index
hisat2-build GCF_000013425.1_ASM1342v1_genomic.fna sa_index
3.2 Sequence alignment
hisat2 -t -x sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R1.fastq.gz -2 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R2.fastq.gz -S D1_R1.sam
hisat2 -t -x sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/WT1_R1.fastq.gz -2 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/WT1_R2.fastq.gz -S WT1_R1.sam
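With three replicates per group, the same command has to be run six times. A small sketch of how the commands could be generated in a loop: the helper name hisat2_cmd is hypothetical, the sample names D2, D3, WT2 and WT3 are assumed for the remaining replicates, and the output is named <sample>.sam rather than <sample>_R1.sam as above.

```shell
#!/usr/bin/env bash
# hisat2_cmd: print (not run) the HISAT2 command for one paired-end sample.
#   $1 = index prefix, $2 = directory holding the raw reads, $3 = sample name
hisat2_cmd() {
    local index=$1 rawdir=$2 sample=$3
    printf 'hisat2 -t -x %s -1 %s/%s_R1.fastq.gz -2 %s/%s_R2.fastq.gz -S %s.sam\n' \
        "$index" "$rawdir" "$sample" "$rawdir" "$sample" "$sample"
}

# Dry run: list the commands for all six libraries.
# D2, D3, WT2 and WT3 are assumed names for the remaining replicates.
for s in D1 D2 D3 WT1 WT2 WT3; do
    hisat2_cmd sa_index /mnt/f/lbhu/TPL201807308/Results/01_Rawdata "$s"
done
```

Printing the commands first makes it easy to check the paths; piping the output to bash would then run them one after another.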
#D1_R1
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:09:58
14200996 reads; of these:
14200996 (100.00%) were paired; of these:
14180176 (99.85%) aligned concordantly 0 times
13590 (0.10%) aligned concordantly exactly 1 time
7230 (0.05%) aligned concordantly >1 times
----
14180176 pairs aligned concordantly 0 times; of these:
135 (0.00%) aligned discordantly 1 time
----
14180041 pairs aligned 0 times concordantly or discordantly; of these:
28360082 mates make up the pairs; of these:
28212046 (99.48%) aligned 0 times
146051 (0.51%) aligned exactly 1 time
1985 (0.01%) aligned >1 times
0.67% overall alignment rate
Time searching: 00:09:58
Overall time: 00:09:58
#WT1_R1
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:15:25
15979802 reads; of these:
15979802 (100.00%) were paired; of these:
3149851 (19.71%) aligned concordantly 0 times
12801703 (80.11%) aligned concordantly exactly 1 time
28248 (0.18%) aligned concordantly >1 times
----
3149851 pairs aligned concordantly 0 times; of these:
87906 (2.79%) aligned discordantly 1 time
----
3061945 pairs aligned 0 times concordantly or discordantly; of these:
6123890 mates make up the pairs; of these:
4423631 (72.24%) aligned 0 times
1695827 (27.69%) aligned exactly 1 time
4432 (0.07%) aligned >1 times
86.16% overall alignment rate
Time searching: 00:15:34
Overall time: 00:15:34
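The number to watch in each summary is the final "overall alignment rate" line. A minimal sketch for pulling it out of a saved log and flagging low values: the function names are hypothetical, the cutoff of 50 is an arbitrary assumption, and it presumes the summary was saved to a file (HISAT2 writes it to stderr, so something like 2> D1.log would be needed on the command).

```shell
#!/usr/bin/env bash
# overall_rate: print the overall alignment rate (without the % sign)
# from a HISAT2 summary read on stdin.
overall_rate() {
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }'
}

# low_rate: succeed (exit 0) when the rate ($1) is below the cutoff ($2).
low_rate() {
    awk -v r="$1" -v c="$2" 'BEGIN { exit !(r + 0 < c + 0) }'
}

# Example with the two rates reported above; 50 is an arbitrary cutoff.
for rate in 0.67 86.16; do
    if low_rate "$rate" 50; then
        echo "$rate% : suspiciously low, check the sample"
    else
        echo "$rate% : ok"
    fi
done
```

With a saved log, the rate could be read with overall_rate < D1.log.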
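A problem shows up here: the mutant strain's alignment rate is far too low, under 1%, which is implausible for a genuine S. aureus sample. I suspected sample contamination, so I randomly picked 5 reads and ran BLAST on them; the hits suggest the sample is contaminated with Staphylococcus haemolyticus.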
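The random pick can be scripted. A minimal sketch, assuming GNU coreutils (shuf) and standard 4-line FASTQ records; the function name sample_reads_fasta is hypothetical, and the resulting FASTA would still be pasted into BLAST by hand.

```shell
#!/usr/bin/env bash
# sample_reads_fasta: pick N random reads from a gzipped FASTQ file and
# print them as FASTA, ready to paste into BLAST.
#   $1 = fastq.gz file, $2 = number of reads
sample_reads_fasta() {
    gzip -dc "$1" |
        paste - - - - |                 # one 4-line FASTQ record per row
        shuf -n "$2" |                  # random subset (GNU coreutils)
        awk -F'\t' '{ print ">" substr($1, 2); print $2 }'
}

# Usage: sample_reads_fasta D1_R1.fastq.gz 5 > D1_sample.fa
```

Five reads are enough for a first hint; a larger sample gives a better picture of how much of the library comes from the contaminant.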
4 Download the Staphylococcus haemolyticus genome sequence
4.1 Build the index files
hisat2-build GCF_000009865.1_ASM986v1_genomic.fna haemo_sa_index
4.2 Align the mutant-group data to the Staphylococcus haemolyticus genome
hisat2 -t -x haemo_sa_index -1 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R1.fastq.gz -2 /mnt/f/lbhu/TPL201807308/Results/01_Rawdata/D1_R2.fastq.gz -S D1_R1.sam
#D1_R1
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:15:01
14200996 reads; of these:
14200996 (100.00%) were paired; of these:
2971894 (20.93%) aligned concordantly 0 times
11205145 (78.90%) aligned concordantly exactly 1 time
23957 (0.17%) aligned concordantly >1 times
----
2971894 pairs aligned concordantly 0 times; of these:
67246 (2.26%) aligned discordantly 1 time
----
2904648 pairs aligned 0 times concordantly or discordantly; of these:
5809296 mates make up the pairs; of these:
4179316 (71.94%) aligned 0 times
1622752 (27.93%) aligned exactly 1 time
7228 (0.12%) aligned >1 times
85.29% overall alignment rate
Time searching: 00:15:01
Overall time: 00:15:01
#D2_R2
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:19:54
18272739 reads; of these:
18272739 (100.00%) were paired; of these:
3984869 (21.81%) aligned concordantly 0 times
14260046 (78.04%) aligned concordantly exactly 1 time
27824 (0.15%) aligned concordantly >1 times
----
3984869 pairs aligned concordantly 0 times; of these:
83138 (2.09%) aligned discordantly 1 time
----
3901731 pairs aligned 0 times concordantly or discordantly; of these:
7803462 mates make up the pairs; of these:
5671806 (72.68%) aligned 0 times
2110960 (27.05%) aligned exactly 1 time
20696 (0.27%) aligned >1 times
84.48% overall alignment rate
Time searching: 00:24:25
Overall time: 00:24:25
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:18:10
17122975 reads; of these:
17122975 (100.00%) were paired; of these:
3511683 (20.51%) aligned concordantly 0 times
13593051 (79.38%) aligned concordantly exactly 1 time
18241 (0.11%) aligned concordantly >1 times
----
3511683 pairs aligned concordantly 0 times; of these:
74659 (2.13%) aligned discordantly 1 time
----
3437024 pairs aligned 0 times concordantly or discordantly; of these:
6874048 mates make up the pairs; of these:
5027355 (73.14%) aligned 0 times
1838519 (26.75%) aligned exactly 1 time
8174 (0.12%) aligned >1 times
85.32% overall alignment rate
Time searching: 00:18:10
Overall time: 00:18:10
5 Basic genome statistics: Staphylococcus aureus vs. Staphylococcus haemolyticus
Staphylococcus haemolyticus
Total_Len: 2697861
Total_Seq_Num : 4
Total_N_Counts: 0
Total_LowCase_Counts: 0
Total_GC_content: 0.33
Minimum Len: 2300
Maximum Len: 2685015
Mean Len: 674,465.25
Median Len: 1,346,597.5
N50: 2685015
Staphylococcus aureus
Total_Len: 2821361
Total_Seq_Num : 1
Total_N_Counts: 1
Total_LowCase_Counts: 0
Total_GC_content: 0.33
Minimum Len: 2821361
Maximum Len: 2821361
Mean Len: 2,821,361
Median Len: 2,821,361
N50: 2821361
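The tool that produced these statistics is not named here. A few of the fields (sequence count, total length, GC fraction) can be reproduced with a short awk sketch; the function name fasta_stats is hypothetical, and the labels simply mimic the output above.

```shell
#!/usr/bin/env bash
# fasta_stats: print sequence count, total length and GC fraction
# for a FASTA stream read on stdin.
fasta_stats() {
    awk '
        /^>/ { n++; next }                       # header line: new sequence
        {
            seq = toupper($0)
            total += length(seq)
            gc += gsub(/[GC]/, "", seq)          # gsub returns the number of hits
        }
        END {
            printf "Total_Seq_Num: %d\n", n
            printf "Total_Len: %d\n", total
            printf "Total_GC_content: %.2f\n", (total ? gc / total : 0)
        }'
}

# Usage: fasta_stats < GCF_000013425.1_ASM1342v1_genomic.fna
```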
6 Analysis of the genome alignment between Staphylococcus aureus and Staphylococcus haemolyticus
# 710 hits found
query id subject id identity% alignment length mismatches gap opens q. start q. end s. start s. end evalue bit score
NC_007795.1 NC_007168.1 89.55 14914 1410 111 2304062 2318871 833494 818625 0 18774
NC_007795.1 NC_007168.1 94.84 7822 241 100 1899018 1906760 1110575 1102838 0 12056
NC_007795.1 NC_007168.1 95.18 5392 171 58 448540 453908 2544291 2538966 0 8434
NC_007795.1 NC_007168.1 95.95 5209 140 47 492993 498180 2544133 2538975 0 8384
NC_007795.1 NC_007168.1 96.04 5156 139 40 492993 498130 879727 884835 0 8331
NC_007795.1 NC_007168.1 95.95 5161 143 42 448709 453850 879722 884835 0 8312
NC_007795.1 NC_007168.1 77.9 12405 2498 209 978268 990541 1940909 1928618 0 7492
NC_007795.1 NC_007168.1 91.12 5045 394 45 529559 534584 2461838 2456829 0 6785
NC_007795.1 NC_007168.1 86.38 5706 668 76 520058 525712 2471424 2465777 0 6131
NC_007795.1 NC_007168.1 96.47 3683 90 21 2238680 2242354 885361 881711 0 6045
NC_007795.1 NC_007168.1 78.25 9179 1738 197 1588304 1597364 1376275 1367237 0 5655
NC_007795.1 NC_007168.1 96.42 3407 103 10 2122741 2126140 976287 972893 0 5598
NC_007795.1 NC_007168.1 96.36 3411 105 11 2239223 2242626 976292 972894 0 5594
NC_007795.1 NC_007168.1 92.35 3895 205 44 450529 454385 2498609 2494770 0 5456
NC_007795.1 NC_007168.1 96.22 3336 86 22 494809 498137 1104945 1108247 0 5426
NC_007795.1 NC_007168.1 96.33 3324 83 21 1901348 1904666 2539029 2542318 0 5426
NC_007795.1 NC_007168.1 96.21 3324 88 19 1901348 1904666 884833 881543 0 5406
NC_007795.1 NC_007168.1 96.18 3327 89 20 494808 498130 2498609 2495317 0 5406
NC_007795.1 NC_007168.1 96.1 3336 89 23 450530 453857 1104945 1108247 0 5402
NC_007795.1 NC_007168.1 95.49 3411 106 28 1901348 1904745 2495319 2498694 0 5402
NC_007795.1 NC_007168.1 97.14 3184 76 9 2122692 2125867 2538975 2542151 0 5361
NC_007795.1 NC_007168.1 97.06 3168 79 9 2239193 2242354 1108272 1105113 0 5323
NC_007795.1 NC_007168.1 97.35 3133 72 6 2239226 2242354 2539026 2542151 0 5315
NC_007795.1 NC_007168.1 97.17 3149 78 6 494986 498130 973146 976287 0 5312
NC_007795.1 NC_007168.1 94.5 3472 144 20 450707 454157 973146 976591 0 5310
NC_007795.1 NC_007168.1 97.32 3131 74 4 2122741 2125867 884835 881711 0 5308
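To get an overall impression from the 710 hits rather than reading them row by row, the table can be summarised with awk. A minimal sketch, assuming the 12 whitespace-separated columns shown above (column 3 = identity %, column 4 = alignment length); the function name blast_summary is hypothetical.

```shell
#!/usr/bin/env bash
# blast_summary: summarise BLAST tabular output (12 standard columns) on stdin.
# Skips comment and header lines and reports the number of hits, the summed
# alignment length, and the length-weighted mean identity.
#   $1 = minimum alignment length to keep (default 0)
blast_summary() {
    awk -v minlen="${1:-0}" '
        /^#/ || $4 !~ /^[0-9]+$/ { next }    # comments, headers, malformed rows
        $4 >= minlen {
            hits++
            len += $4
            ident += $3 * $4
        }
        END {
            printf "hits: %d\n", hits
            printf "aligned_length: %d\n", len
            if (len > 0) printf "weighted_identity: %.2f\n", ident / len
        }'
}

# Usage: blast_summary 1000 < blast_result.txt   # file name is a placeholder
```

Several rows in the table share the same query interval (for example 492993 to 498180 matches more than one subject position), so the summed length counts repeated regions more than once and should not be read as genome coverage.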