Meta genomics Covid review

作者: 11的雾 | 来源:发表于2022-02-20 23:23 被阅读0次

Meta genomics Covid review
Guideline 5.1.1 - Legal - Privac
【模板】IL-6单抗降低COVID-19病死率。这项meta分析
资源帖：Population genetics and geno
美国生物信息工程师工资
架构师拥抱敏捷(2019.11.20)
【转】第二代测序数据处理工具软件汇总
前言
2020-10-23COVID fighters tie the
R语言meta分析（5）累积Meta分析

Metagenomic analysis reveals oropharyngeal microbiota alterations in patients with COVID-19

哈工大省医

宏基因组分析揭示了新冠病人口腔微生物的改变。

31 covid-19，29 flu，28 normal 三组对照。

宏基因组数据先区分出metagenomics，和COVID-19，reads，然后mapping to covid-19 reference.

组装出whole genome 看看snp，哪些位点是否有变异？

virus discovery？

看鉴别出是哪种virus

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7441049/

鉴定出的COVID-19 样本：定的标准是什么？

讨论：

meta对疾病的恶化，和预后的影响。

[]metaquast 提供reference

[]prokka参数 --kingdom 中Bacteria 和Viruses的区别

########################################################

用contig跟coronavirus reference做blast，

找到所有的coronavirus的contig，并画图：

cd /USB/metagenomics/analysis/my_project/blastn_coronavirus_result

python gather_blastn_contig.py /USB/metagenomics/analysis/my_project/blastn_coronavirus_result /USB/metagenomics/analysis/my_project/megahit_result >all_sample.coronavirus.contigs.fa

下一步用mega或什么对齐软件画个树看看。

保留这个结果，并且验证跟kraken2的结果的一致性：

#######################################################

braken build database

/USB/tools/Bracken-2.6.2/bracken-build -d /USB/meta_genomics_pipeline/database/kraken2_database -t 32 -l 250

#######################################################

troubleshooting for kraken2 results

Question：kraken2的结果中有48% unclassified，并且没有找到virus相关的任何比例。

Loading database information... done.

275881 sequences (55.16 Mbp) processed in 10.696s (1547.6 Kseq/m, 309.44 Mbp/m).

142595 sequences classified (51.69%)

133286 sequences unclassified (48.31%)

解决：在kraken2构建完成的database中seqid2taxid.map文件里面没有找到virus的序列，故此推断在构建时没有把virus包含进去，因为重新使用kraken2-build 构建数据库。Note：重新构建数据库前必须要删掉原来的seqid2taxid.map，taxo.k2d，hash.k2d，opts.k2d四个文件，

第二次：after rebuild the database，重新跑kraken2:

$ /USB/meta_genomics_pipeline/bin/kraken2 --db /USB/meta_genomics_pipeline/database/kraken2_database --threads 50 --report /USB/metagenomics/analysis/my_project/kraken2_result/houmaohu.kraken2.report --use-names --output /USB/metagenomics/analysis/my_project/kraken2_result/houmaohu.kraken2.output /USB/metagenomics/analysis/my_project/kneaddata_result/houmaohu.kneaddata.fastq

Loading database information... done.

275881 sequences (55.16 Mbp) processed in 3.922s (4221.0 Kseq/m, 843.98 Mbp/m).

174591 sequences classified (63.28%)

101290 sequences unclassified (36.72%)

finally we found Covid-19 in kraken2 result. yeah. but still 36.72% of sequences are unclassified. we doubt that they are human related sequencing .

从第二次的结果中能够找到virus结果了，但是仍然有36%是unclassified，验证是否是human genome，发现不是。

验证方法如下：

使用bowtie2 将用于kraken2的fastq跟hg19 reference比对，发现比对率为0.说明该fastq已经不含有human genome了。那么怀疑是其它fungi，plasmid，等数据库，为了验证，

需要下载其他数据库，并重新使用kraken2-build 构建index。

第三次更新数据库之后

$ /USB/meta_genomics_pipeline/bin/kraken2 --db /USB/meta_genomics_pipeline/database/kraken2_database --threads 32 --report /USB/metagenomics/analysis/my_project/kraken2_result/houmaohu.kraken2.report --use-names --output /USB/metagenomics/analysis/my_project/kraken2_result/houmaohu.kraken2.output /USB/metagenomics/analysis/my_project/kneaddata_result/houmaohu.kneaddata.fastq 2>/USB/metagenomics/analysis/my_project/kraken2_result/houmaohu.kraken2.log

Loading database information... done.

275881 sequences (55.16 Mbp) processed in 9.124s (1814.2 Kseq/m, 362.74 Mbp/m).

174783 sequences classified (63.35%)

101098 sequences unclassified (36.65%)

still 36.65% unclassified

那么这些序列是什么呢？输出来看看吧，重新用kraken2 输出unclassified reads。在ncbi上

发现还是virus和一些bacteria，那下载nt库试试看吧。

kraken2-build --download-library nt --db ./

######################################################################

kraken2 report explaination: here

######################################################################

'a' : all taxonomic levels

'k' : kingdoms

'p' : phyla only

'c' : classes only

'o' : orders only

'f' : families only

'g' : genera only

's' : species only

######################################################################

记录cornonavirus的相关信息：

单独的cov-reference放在：/USB/meta_genomics_pipeline/database/SARS-CoV-2-reference

NCBI名称：

>NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome

长度：29903bp

######################################################################

添加contig注释功能：目的：鉴定组装出来的contig都是什么物种。目标：能够找到target物种。用于后续的分析。

思路：基于kraken2使用的bacteria，virus，fungi，archaea，等6个数据库的fasta文件，重新构建blastn 所需index，然后使用blastn 将sample组装好的contigalign到该数据库上。使用in-house脚本统计并整理每个contig的物种信息，并汇总。

1）构建数据库：

cd /USB/meta_genomics_pipeline/database/blastn_dbs

cat ../kraken2_database/library/*/library.fna > my_kraken_cat6dbs_20210726.fna

makeblastdb -parse_seqids -dbtype nucl -in my_kraken_cat6dbs_20210726.fna

2）mapping 添加到main pipeline中，做一个测试：

blastn -db /USB/meta_genomics_pipeline/database/blastn_dbs/my_kraken_cat6dbs_20210726.fna -query /USB/metagenomics/analysis/my_project/megahit_result/houmaohu/houmaohu.megahit.final.contigs.fa -out ./sample01.corona.out -outfmt 7 -num_threads 30 -subject_besthit

3）写脚本统计每条contig都是什么物种（besthit）

输入文件1：contig文件，

输入文件2：blast out 文件，cat 所有blast out文件到一个文件中，

输入文件3：blast 时用到的reference文件中就有id，叫做seqid2taxid.6abfpuv.map

输出文件1：contig fasta

思路二：

无需cat 6个database，直接分别比对，这样节省时间，提高运算速度，只是一个组装文件会得到6个比对结果，再用脚本整理到一起

1）那么对6个database分别构建blastn index

2）分别比对6次

3）统计结果

ls /USB/meta_genomics_pipeline/database/kraken2_database/library/*/library.fna |while read line;do name=`ls $line|awk -F '/' '{print$(NF-1)}'`;echo "makeblastdb -parse_seqids -dbtype nucl -in $line -out ./$name ";done |sh

有个数据库出错了，检查blastn的报错，并重新构建blast index，

######################################################################

遗留问题：

[]下一步用mega或什么对齐软件画个树看看。

[x]rebuild kraken2 database (running)

[x]rerun kraken2 to check ratio after 2 was done

[x]rerun blastn when finished build reference index.

[x]kraken2 结果画图，不画了，krona的图就够了。

[x]处理结果统计脚本，完成，将fasta文件用于galaxy展示。

[x]检查blast index 的错误，重跑blast

######################################################################

使用krona画物种分类图：

注意：kraken2.output中第三列为taxa_id,so do not using --use-names when using kraken2.

le ./houmaohu.kraken2.output |cut -f 2,3 >houmaohu.kraken.krona

ktImportTaxonomy houmaohu.kraken.krona -o houmaohu.taxonomy.krona.html

[ WARNING ] Too many query IDs to store in chart; storing supplemental files in 'taxonomy.krona.html.files'.

会报一个warning，但不影响html上的图

done

######################################################################

验证kneaddata之后的reads是否还有hg19.genome.

$ bowtie2 -p 20 -x /USB/meta_genomics_pipeline/database/kneaddata_database/hg37dec_v0.1 -U ../kneaddata_result/houmaohu.kneaddata.fastq -S houmaohu_r1_bowtie.fw.sam --nofw --local

275881 reads; of these:

275881 (100.00%) were unpaired; of these:

275870 (100.00%) aligned 0 times

1 (0.00%) aligned exactly 1 time

10 (0.00%) aligned >1 times

0.00% overall alignment rate

(prokka_env) ceid 15:08:09 /USB/metagenomics/analysis/my_project/test_kneaddata_2hg19

$ bowtie2 -p 20 -x /USB/meta_genomics_pipeline/database/kneaddata_database/hg37dec_v0.1 -U /USB/metagenomics/analysis/my_project/fastqc_result/houmaohu.raw.fastq.gz -S houmaohu_raw_bowtie.fw.sam --nofw --local

488767 reads; of these:

488767 (100.00%) were unpaired; of these:

474782 (97.14%) aligned 0 times

3171 (0.65%) aligned exactly 1 time

10814 (2.21%) aligned >1 times

2.86% overall alignment rate

1）原始fastq中就只有2.8%的human reads，

2）kneaddata后没有human reads，证明kneaddata会去除humanreads

3）那么问题来了，kraken2记过中37% no hit是什么？是什么？是什么？

############################################################################

看VIP工具文章，写思路，出结果。

############################################################################

data from paper:

https://www.nature.com/articles/srep23774/tables/2

补充数据分析，新增PE数据，适配流程。

seqkit fx2tab -lg UniVec_Core_library.fixed.fna |cut -f 1,4,5|head -1

python /USB/meta_genomics_pipeline/bin/blastout2contig_annotation.py -c /USB/metagenomics_analysis/analysis/my_project/megahit_result/houmaohu/houmaohu.megahit.final.contigs.fa -b /USB/metagenomics_analysis/analysis/my_project/blastn_result/houmaohu/houmaohu.cat.all.blast.out -m /USB/meta_genomics_pipeline/database/blastn_dbs/seqid2taxid.6abfpuv.map -o /USB/metagenomics_analysis/analysis/my_project/blastn_result/houmaohu.annotated.final.contigs.fa

contig_name annotation ref_name ref_gc contig_len contig_gc Genome_coverage

k141_259 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome 29903 37.97 7433 39.63 24.857%

个性化分析：

提取所有样本中的target 物种的组装结果序列，并检查N50，N90等，看看全长多少，能不能画进化树之类的，或者后续snp分析。

################################################

method for phylogeny：

conda create -n phylo blast mafft raxml iqtree bedtools

conda activate phylo

mafft --auto all.coronavirus.with.ref.fa > all.coronavirus.with.ref.out

raxmlHPC -s all.coronavirus.with.ref.out -m GTRGAMMA -n coronavirus_tree -p 12345

# or

iqtree -s all.coronavirus.with.ref.out

cat all.coronavirus.with.ref.out.treefile

paste treefile to https://itol.embl.de/upload.cgi

图不直观，应该用gene,

用contig 注释出gene，在用gene作图。

补充：

1）Genome Coverage (%) ，contig长度与reference长度的比例，在已有的脚本上修改。

2）Total reads （过滤kneaddata后的fastq的reads，）

3）Number of reads (%) 统计比对上每个contig后的reads，咋统计？

评估每种virus的结果。

参数修正：

megahit --min-len-contig 1000

blastn -evalue 10

reference:

https://genomics.sschmeier.com/ngs-orthology/index.html

Meta genomics Covid review
Metagenomic analysis reveals oropharyngeal microbiota alt...
Guideline 5.1.1 - Legal - Privac
We found in our review that your app related to the COVID...
【模板】IL-6单抗降低COVID-19病死率。这项meta分析
关键词：COVID-19；IL-6单抗；meta分析 IL-6抑制剂治疗重症COVID-19的研究非常多，不同研究...
资源帖：Population genetics and geno
Population genetics and genomics in Rhttps://grunwaldlab....
美国生物信息工程师工资
Genomics Salary Survey and WorkforceInsights Report 2018:...
架构师拥抱敏捷(2019.11.20)
Sprint review : sprint 评审代码review 项目完成进度review review一定是...
【转】第二代测序数据处理工具软件汇总
Integrated solutions CLCbio Genomics Workbench – de novo ...
前言
Fundamentals of Bioinformatics and Genomics ?TO-DO?? 生物信息...
2020-10-23COVID fighters tie the
COVID fighters tie the knot COVID战士喜结连理 China Daily | Upd...
R语言meta分析（5）累积Meta分析
R语言meta分析(1)meta包 R语言meta分析(2)单个率的Meta分析 R语言meta分析(3)亚组分析...