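Analyzing metagenomes is often a long, multi-step process. Pierre MARTIN wrote the [metagwgs](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs) workflow, which runs on a compute cluster to analyze metagenomic data (Illumina HiSeq3000 or NovaSeq, paired-end, 2×150 bp). The workflow covers all the necessary steps of a metagenomic analysis, including assembly, taxonomic annotation, and functional annotation of predicted genes.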
The input raw data are .fastq or .fastq.gz files. The workflow comprises 8 steps:

01_clean_qc (can be skipped):
- trims adapter sequences and deletes low-quality reads (Cutadapt, Sickle)
- removes reads from the host genome to avoid host contamination (BWA + Samtools + Bedtools)
- runs quality control (FastQC)
- taxonomically classifies the cleaned reads (Kaiju MEM + kronaTools + Generate_barplot_kaiju.py + merge_kaiju_results.py)
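A common way to remove host reads is to map all reads against the host genome with BWA and keep only the pairs in which neither mate aligned. The Python sketch below shows that selection logic on plain SAM lines. It assumes this general approach rather than reproducing the exact Samtools/Bedtools commands metagwgs runs, and the helper name `unmapped_pairs` is hypothetical.

```python
# SAM FLAG bits (SAM specification)
READ_UNMAPPED = 0x4   # this read did not align
MATE_UNMAPPED = 0x8   # its mate did not align
SECONDARY = 0x100     # secondary alignment record


def unmapped_pairs(sam_lines):
    """Return names of read pairs where neither mate mapped to the host.

    Hypothetical helper (not part of metagwgs): it selects roughly what
    `samtools view -f 12 -F 256` would keep, applied to SAM text lines.
    """
    kept = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & SECONDARY:
            continue  # ignore secondary alignments
        if (flag & READ_UNMAPPED) and (flag & MATE_UNMAPPED):
            kept.add(qname)  # both mates unmapped: keep as non-host
    return kept
```

A pair in which only one mate aligned to the host is discarded here, which is the conservative choice for removing contamination.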

02_assembly

03_filtering (can be skipped):
- filters out contigs with a low CPM (counts per million) value (Filter_contig_per_cpm.py + metaQUAST)
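CPM normalizes a contig's mapped-read count by the sample's total mapped reads: CPM = reads on contig / total mapped reads × 10^6. The minimal sketch below shows the idea behind this filter. The input dictionary, the function names and the cutoff of 10 are hypothetical; they are not Filter_contig_per_cpm.py's actual interface or default.

```python
def cpm(counts):
    """Convert per-contig mapped-read counts to counts per million."""
    total = sum(counts.values())
    if total == 0:
        return {contig: 0.0 for contig in counts}
    return {contig: n / total * 1_000_000 for contig, n in counts.items()}


def filter_contigs_by_cpm(counts, min_cpm):
    """Keep contigs whose CPM is at least `min_cpm` (hypothetical cutoff)."""
    return {contig for contig, value in cpm(counts).items() if value >= min_cpm}


if __name__ == "__main__":
    # Hypothetical read counts per contig in one sample (1,000,000 in total)
    counts = {"contig_1": 999_000, "contig_2": 999, "contig_3": 1}
    print(sorted(filter_contigs_by_cpm(counts, min_cpm=10)))
    # ['contig_1', 'contig_2']
```

Normalizing by library size keeps the cutoff comparable between samples sequenced at different depths.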

04_structural_annot:
- performs structural annotation of genes (Prokka + Rename_contigs_and_genes.py)
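The source names Rename_contigs_and_genes.py without describing it. One plausible purpose for such a step is to give every contig a sample-prefixed, sequential name, which keeps identifiers unique when samples are pooled later (for example in the global gene clustering). The sketch below renames FASTA headers that way. The `<sample>_Contig_<n>` scheme and the function name are hypothetical and do not reflect the script's actual output.

```python
def rename_fasta_headers(fasta_lines, sample, prefix="Contig"):
    """Rewrite FASTA headers as '<sample>_<prefix>_<n>'; return (lines, old->new map).

    Hypothetical naming scheme for illustration only.
    """
    renamed, mapping, n = [], {}, 0
    for line in fasta_lines:
        if line.startswith(">"):
            n += 1
            old_id = line[1:].split()[0]  # ID = first whitespace-delimited token
            new_id = f"{sample}_{prefix}_{n}"
            mapping[old_id] = new_id
            renamed.append(f">{new_id}")
        else:
            renamed.append(line.rstrip("\n"))  # sequence lines are unchanged
    return renamed, mapping
```

Keeping the old-to-new mapping lets later tables be traced back to the assembler's original contig names.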

05_alignment

06_func_annot:
- clusters genes both per sample and globally (cd-hit-est + cd_hit_produce_table_clstr.py)
- quantifies the reads that align to the genes (featureCounts + Quantification_clusters.py)
- functionally annotates the genes and quantifies reads per function (eggNOG-mapper + best_bitscore_diamond.py + merge_abundance_and_functional_annotations.py + quantification_by_functional_annotation.py)
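The source only names best_bitscore_diamond.py and quantification_by_functional_annotation.py. The sketch below shows the two underlying ideas: keep each gene's highest-bitscore hits, then sum read counts over genes that share a function. It assumes DIAMOND's default tabular output (12 BLAST-style columns, bitscore last) and a simple gene-to-functions mapping. Keeping tied hits and the `unannotated` label are assumptions, and both function names are hypothetical.

```python
from collections import defaultdict

BITSCORE_COL = 11  # 12th column of DIAMOND/BLAST default tabular output (--outfmt 6)


def best_hits(diamond_lines):
    """Keep, for each query gene, every hit tied for the highest bitscore."""
    best = {}  # query -> (best score, list of hit rows)
    for line in diamond_lines:
        fields = line.rstrip("\n").split("\t")
        query, score = fields[0], float(fields[BITSCORE_COL])
        if query not in best or score > best[query][0]:
            best[query] = (score, [fields])
        elif score == best[query][0]:
            best[query][1].append(fields)
    return {query: hits for query, (_, hits) in best.items()}


def quantify_by_function(gene_counts, gene_functions):
    """Sum read counts of genes sharing a functional annotation.

    A gene with several functions adds its full count to each one;
    genes without annotation go to a hypothetical 'unannotated' bin.
    """
    totals = defaultdict(int)
    for gene, count in gene_counts.items():
        for function in gene_functions.get(gene, ["unannotated"]):
            totals[function] += count
    return dict(totals)
```

In this sketch, multi-function genes are counted once per function, so per-function totals can exceed the total number of reads. The real script may weight or split such counts differently.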

07_taxo_affi:
- taxonomically affiliates the genes (Samtools + aln2taxaffi.py)
- taxonomically affiliates the contigs (Samtools + aln2taxaffi.py)
- counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level (Samtools + merge_contig_quantif_perlineage.py + quantification_by_contig_lineage.py)
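Counting reads and contigs per taxonomic level amounts to a group-by over lineages. The sketch assumes each contig already has a ';'-separated lineage (most general rank first) and a mapped-read count. The rank list, the `unknown` label and `counts_per_level` are hypothetical and are not taken from merge_contig_quantif_perlineage.py or quantification_by_contig_lineage.py.

```python
from collections import Counter

# Hypothetical rank order; real lineage strings depend on the taxonomy database used.
LEVELS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def counts_per_level(contig_lineage, contig_reads):
    """Count contigs and reads for each taxon at each taxonomic level.

    contig_lineage: contig -> ';'-separated lineage, most general rank first.
    contig_reads:   contig -> number of reads mapped to that contig.
    Ranks missing from a lineage are counted under 'unknown'.
    """
    result = {level: {"contigs": Counter(), "reads": Counter()} for level in LEVELS}
    for contig, lineage in contig_lineage.items():
        ranks = lineage.split(";")
        for depth, level in enumerate(LEVELS):
            taxon = ranks[depth] if depth < len(ranks) and ranks[depth] else "unknown"
            result[level]["contigs"][taxon] += 1
            result[level]["reads"][taxon] += contig_reads.get(contig, 0)
    return result
```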

08_binning (from nf-core/mag 1.0.0):
- bins the contigs (MetaBAT2)
- assesses bins (BUSCO + metaQUAST + summary_busco.py and combine_tables.py from nf-core/mag)
- taxonomically affiliates the bins (BAT)
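The bin assessment relies on BUSCO, whose results summary_busco.py collects. The sketch below parses BUSCO's standard one-line summary notation: C = complete, S = complete single-copy, D = complete duplicated, F = fragmented, M = missing, and n = number of marker genes searched. `parse_busco_summary` is a hypothetical name and is not summary_busco.py's code.

```python
import re

# BUSCO one-line summary, e.g. "C:98.6%[S:98.6%,D:0.0%],F:0.7%,M:0.7%,n:148"
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into percentages (C, S, D, F, M) and n."""
    match = BUSCO_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    values = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    values["n"] = int(match.group("n"))  # marker count is an integer
    return values
```

These values could be parsed for each bin and collected into a single table. The actual output columns of summary_busco.py may differ.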

At the end of the workflow, MultiQC generates an HTML report.
The pipeline is built with Nextflow, a bioinformatics workflow manager that runs tasks portably across multiple compute infrastructures.
Three Singularity containers are available, which makes installation trivial and results highly reproducible.