MGCA
Microbial genome component and annotation pipeline
Introduction
The software is still under development and is designed to perform the following analyses:
Genomic Component
- Genomic Island
- Prophage
- CRISPR-Cas
- Tandem Repeats
- Interspersed Repeats
- rRNA
- tRNA
- sRNA
Genomic Attributes
Function Annotation
General Annotation
- SwissProt
- Pfam
- GO
- KEGG
- Effectors
- T3SS
- T4SS
- Secretory/Membrane/Intracellular Protein
- Secondary Metabolite Biosynthetic Gene Clusters
- Virulence/Pathogenicity/Resistance Gene
- Antibiotic Resistance Genes (ARGs)
- Pathogen Host Interactions (PHI)
- Comprehensive Antibiotic Resistance Database (CARD)
- Element Cycle
- CAZyme
- Nitrogen
- Sulfur
- Methane
- Membrane Transport Protein (TCDB)
Comparative Genomics
NOTICE: Development is ongoing and will take a long time to complete!
Installation
The software has been tested successfully on Windows (WSL), Linux x64, and macOS. Because it depends on a large number of other programs, installing it with Bioconda is recommended.
Step1: Install MGCA
Method 1: use mamba to install MGCA
# Install mamba first
conda install mamba
# Usually specify the latest version of MGCA
mamba create -n mgca mgca=0.0.0
# If the command above reports that mgca cannot be found, install it with the following command instead
mamba create -n mgca https://anaconda.org/bioconda/mgca/0.0.0/download/noarch/mgca-0.0.0-pl5321hdfd78af_0.tar.bz2
Step2: Set up the databases (run this once after the first installation of mgca)
conda activate mgca
setupDB --all
conda deactivate
Notice: setupDB currently contains a minor bug. To work around it, edit the "setupDB" file located in the mgca installation path and remove all lines after line 83.
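The workaround above can also be scripted. The following is a minimal sketch, assuming that "setupDB" is a plain-text script reachable on the PATH of the activated mgca environment. The helper name `truncate_after_line` is hypothetical and not part of MGCA; it keeps a backup copy so the original file can be restored if the line count differs in another version.

```shell
# Hypothetical helper (not part of MGCA): keep only the first N lines of a file.
truncate_after_line() {
    file="$1"
    keep="$2"
    cp "$file" "$file.bak"                 # keep a backup of the unmodified file
    head -n "$keep" "$file.bak" > "$file"  # rewrite the file with lines 1..N only
}

# Usage, inside the activated mgca environment (assumes setupDB is on the PATH):
#   truncate_after_line "$(command -v setupDB)" 83
```

Check the contents of the backup file first if you are unsure whether line 83 is still the right cut-off for your installed version.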
Required dependencies
emboss
islandpath
opfi
Perl and the required modules
phispy 4.2.21
R and the required packages
wget
In the future:
#- gtdbtk
#- bakta (include trnascan-se infernal piler-cr)
#- repeatmasker (include trf)
#- mummer4
#- artemis (include openjdk)
#- saspector (include trf progressivemauve prokka)
#- lastz
#- kakscalculator2
#- interproscan (include emboss openjdk)
#- eggnog-mapper (include wget)
Usage
Print the help messages:
mgca --help
General usage:
mgca [modules] [options]
Modules:
[--PI] Calculate statistics of protein properties and print pI of all protein sequences
[--IS] Predict genomic islands from GenBank files
[--PROPHAGE] Predict prophage sequences from GenBank files
[--CRISPR] Find CRISPR-Cas systems in genomic or metagenomic datasets
Examples
Example 1: Calculate statistics of protein properties and print pI of all protein sequences
mgca --PI --AAsPath <PATH> --aa_suffix <.faa>
Example 2: Predict genomic islands from GenBank files
mgca --IS --gbkPath <PATH> --gbk_suffix <.gbk>
Example 3: Predict prophage sequences from GenBank files
mgca --PROPHAGE --gbkPath <PATH> --gbk_suffix <.gbk> --phmms <Path of pVOG.hmm> --phage_genes <1> --min_contig_size <5000> --threads <6>
Example 4: Find CRISPR-Cas systems in genomic or metagenomic datasets
mgca --CRISPR --scafPath <PATH> --scaf_suffix <.fa> --casDBpath <db path> --threads <6>
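When several modules are needed for the same data set, the commands above can be wrapped in a small shell function. This is only a sketch: it assumes each module is started as a separate `mgca` call (the documentation above does not say whether modules can be combined in one call), and the function name `run_mgca_modules` and its two directory arguments are hypothetical. The flags themselves are copied from Examples 1 and 2.

```shell
# Hypothetical wrapper (not part of MGCA): run the PI and IS modules in sequence.
run_mgca_modules() {
    aa_dir="$1"    # directory containing protein FASTA files (*.faa)
    gbk_dir="$2"   # directory containing GenBank files (*.gbk)
    # Stop at the first module that fails so later steps do not run on a broken state.
    mgca --PI --AAsPath "$aa_dir" --aa_suffix .faa || return 1
    mgca --IS --gbkPath "$gbk_dir" --gbk_suffix .gbk || return 1
}

# Usage, inside the activated mgca environment (paths are placeholders):
#   run_mgca_modules ./proteins ./genbank
```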
Output
PI
Results/PI/*.pepstats: Peptide statistics for each protein sequence, organized by genome.
Results/PI/*.pI: Protein isoelectric points and their frequencies.
Results/PI/*.pI.tiff: A plot of relative frequency versus isoelectric point.
IS
Results/IS/All_island.list: A list file containing genomic island information.
Results/IS/All_island.txt: A file containing the information and sequences of the genes in each genomic island.
PROPHAGE
Results/PROPHAGE/*_prophage: Results for each genome.
Results/PROPHAGE/All.prophages.txt: Summary (for all genomes) of the prophages found on each host genome.
Results/PROPHAGE/All.prophages.seq: Summary (for all genomes) of the prophage genes and their sequences.
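After a run, the PI and PROPHAGE outputs listed above can be checked quickly from the shell. The sketch below assumes the `Results` directory sits in the current working directory, as the paths above suggest; the helper names `count_matches` and `summarize_results` are hypothetical and not part of MGCA.

```shell
# Hypothetical helpers (not part of MGCA): count documented output files.
count_matches() {
    # $1 = directory, $2 = file name pattern; prints the number of matching entries.
    # A missing directory simply yields 0.
    find "$1" -maxdepth 1 -name "$2" 2>/dev/null | wc -l | tr -d ' '
}

summarize_results() {
    results_dir="${1:-Results}"
    echo "pepstats: $(count_matches "$results_dir/PI" '*.pepstats')"
    echo "pI: $(count_matches "$results_dir/PI" '*.pI')"
    echo "prophage: $(count_matches "$results_dir/PROPHAGE" '*_prophage')"
}

# Usage, from the directory in which mgca was run:
#   summarize_results Results
```

A count of 0 for a module that was run is a hint that the module failed or wrote its output elsewhere.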
CRISPR
Results/CRISPR/*_intially: Results obtained with permissive BLAST parameters (in most cases, these can be ignored).
Results/CRISPR/*_filtered: Results obtained after quality control of the *_intially results (the final results).
Results/CRISPR/*_filtered/*.csv: Files containing information about the CRISPR arrays.
Results/CRISPR/*_filtered/*.png: Visualizations of all predicted CRISPR arrays.
License
MGCA is free software, licensed under GPLv3.
Feedback and Issues
Please report any issues to the issues page or email us at liaochenlanruo@webmail.hzau.edu.cn.
Citation
If you use this software, please cite: Hualin Liu. MGCA: microbial genome component and annotation pipeline. Available on GitHub: https://github.com/liaochenlanruo/mgca
Updates
V0.0.0
MGCA was born.