Bactopia: a software pipeline for bacterial genome analysis

Author: yilunanxia | Published 2020-08-05 10:34

    I. Software overview:

    1. Publication:

    Petit III RA, Read TD, Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems. 5 (2020), https://doi.org/10.1128/mSystems.00190-20.

    2. Project page:

    https://github.com/bactopia/bactopia

    3. Workflow:

    [Figure: Bactopia analysis workflow]

    4. Main features

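    In my view, its biggest strength is that it is foolproof: one command does everything. Previously, an analysis usually meant chaining several tools, one after another, for example FastQC -> Trimmomatic -> Unicycler (SPAdes) -> Prokka -> BLAST against a custom database. Worse, you constantly had to write small scripts to convert formats between steps. All in all it was frustrating, and it was still hard to get a good paper out of it (a lesson learned through blood and tears).

    Once configured, Bactopia does it all in one go. Exciting, isn't it? Summary statistics, 16S sequence extraction and phylogenetic tree building, taxonomic classification, finer ANI-based classification (species/subspecies?), pan-genome analysis and so on are all handled in a single run. I wonder whether companies that are about to build their own pipelines will be just as excited to see this.

    Software list for version 1.4, as given in the paper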

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    AMRFinder    3.6.7    Finds acquired antimicrobial resistance genes and some point mutations in protein or assembled nucleotide sequences

    Aragorn    1.2.38    Finds transfer RNA (tRNA) features 

    Ariba    2.14.4    Antimicrobial resistance identification by assembly

    ART    2016.06.05    A set of simulation tools to generate synthetic next-generation sequencing reads 

    assembly-scan    0.3.0    Generates basic stats for an assembly

    Barrnap    0.9    Bacterial ribosomal RNA predictor

    BBMap    38.76    A suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data

    BCFtools    1.9    Utilities for variant calling and manipulating VCFs and BCFs

    Bedtools    2.29.2    A powerful tool set for genome arithmetic

    BioPython    1.76    Tools for biological computation written in Python 

    BLAST    2.9.0    Basic local alignment search tool

    Bowtie2    2.4.1    A fast and sensitive gapped-read aligner

    BWA    0.7.17    Burrows-Wheeler Aligner for short-read alignment

    CD-HIT    4.8.1    Accelerated for clustering the next-generation sequencing data 

    CheckM    1.1.2    Assesses the quality of microbial genomes recovered from isolates, single cells, and metagenomes

    ClonalFrameML    1.12    Efficient inference of recombination in whole bacterial genomes

    DiagrammeR    1.0.0    Graph and network visualization using tabular data in R    https://github.com/rich-iannone/DiagrammeR

    DIAMOND    0.9.35    Accelerated BLAST-compatible local sequence aligner    https://github.com/bbuchfink/diamond

    eggNOG-Mapper    2.0.1    Fast genome-wide functional annotation through orthology assignment

    EMIRGE    0.61.1    Reconstructs full-length ribosomal genes from short-read sequencing data

    FastANI    1.3    Fast whole-genome similarity (ANI) estimation 

    FastTree2    2.1.10    Approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences

    fastq-dl    1.0.3    Downloads FASTQ files from SRA or ENA repositories

    FastQC    0.11.9    A quality control analysis tool for high throughput sequencing data

    fastq-scan    0.4.3    Outputs FASTQ summary statistics in JSON format

    FLASH    1.2.11    A fast and accurate tool to merge paired-end reads

    freebayes    1.3.2    Bayesian haplotype-based genetic polymorphism discovery and genotyping

    GNU Parallel    20200122    A shell tool for executing jobs in parallel

    GTDB-tk    1.0.2    A tool kit for assigning objective taxonomic classifications to bacterial and archaeal genomes

    HMMER    3.3    Biosequence analysis using profile hidden Markov models

    Infernal    1.1.2    Searches DNA sequence databases for RNA structure and sequence similarities

    IQ-TREE    1.6.12    Efficient phylogenomic software by maximum likelihood

    ISMapper    2.0    Insertion sequence mapping software

    Lighter    1.1.2    Fast and memory-efficient sequencing error corrector

    MAFFT    7.455    Multiple alignment program for amino acid or nucleotide sequences

    Mash    2.2.2    Fast genome and metagenome distance estimation using MinHash

    Mashtree    1.1.2    Creates a tree using Mash distances

    maskrc-svg    0.5    Masks recombination as detected by ClonalFrameML or Gubbins and draws an SVG

    McCortex    1.0    De novo genome assembly and multisample variant calling

    MEGAHIT    1.2.9    Ultra-fast and memory-efficient (meta-)genome assembler

    MinCED    0.4.2    Mining CRISPRs in environmental data sets

    Minimap2    2.17    A versatile pairwise aligner for genomic and spliced nucleotide sequences

    ncbi-genome-download    0.2.12    Scripts to download genomes from the NCBI FTP servers

    Nextflow    19.10.0    A DSL for data-driven computational pipelines

    phyloFlash    3.3b3    Rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic data set

    Pigz    2.3.4    A parallel implementation of gzip for modern multiprocessor, multicore machines

    Pilon    1.23    An automated genome assembly improvement and variant detection tool

    PIRATE    1.0.3    A toolbox for pan-genome analysis and threshold evaluation

    pplacer    1.1.alpha19    Phylogenetic placement and downstream analysis

    Prodigal    2.6.3    Fast, reliable protein-coding gene prediction for prokaryotic genomes

    Prokka    1.4.5    Rapid prokaryotic genome annotation

    QUAST    5.0.2    Quality assessment tool for genome assemblies

    Racon    1.4.13    Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads

    Roary    3.13.0    Rapid large-scale prokaryote pan genome analysis

    samclip    0.2    Filter SAM file for soft and hard clipped alignments

    SAMtools    1.9    Tools for manipulating next-generation sequencing data

    Seqtk    1.3    A fast and lightweight tool for processing sequences in the FASTA or FASTQ format

    Shovill    1.0.9se    Faster assembly of Illumina reads

    SKESA    2.3.0    Strategic k-mer extension for scrupulous assemblies

    Snippy    4.4.5    Rapid haploid variant calling and core genome alignment

    SnpEff    4.3.1    Genomic variant annotations and functional effect prediction toolbox

    snp-dists    0.6.3    Pairwise SNP distance matrix from a FASTA sequence alignment 

    SNP-sites    2.5.1    Rapidly extracts SNPs from a multi-FASTA alignment

    Sourmash    3.2.0    Compute and compare MinHash signatures for DNA data sets 

    SPAdes    3.13.0    An assembly toolkit containing various assembly pipelines

    Trimmomatic    0.39    A flexible read trimming tool for Illumina NGS data

    Unicycler    0.4.8    Hybrid assembly pipeline for bacterial genomes

    vcf-annotator    0.5    Add biological annotations to variants in a VCF file 

    Vcflib    1.0.0rc3    A simple C library for parsing and manipulating VCF files

    Velvet    1.2.10    Short read de novo assembler using de Bruijn graphs

    VSEARCH    2.14.1    Versatile open-source tool for metagenomics

    vt    2015.11.10    A tool set for short-variant discovery in genetic sequence data

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    5. Usage

        5.1 Installation

        conda create -y -n bactopia -c conda-forge -c bioconda bactopia

        conda activate bactopia

        bactopia datasets datasets/ # Downloads the datasets into the specified directory 'datasets/': CARD, VFDB (core), RefSeq Mash Sketch, GenBank Sourmash Signatures, and PLSDB Mash Sketch & BLAST.

        5.2 Running Bactopia

        Paired-end reads

    bactopia --R1 ${SAMPLE}_R1.fastq.gz --R2 ${SAMPLE}_R2.fastq.gz --sample ${SAMPLE} \
            --datasets datasets/ --outdir ${OUTDIR}
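The commands in this section expect ${SAMPLE} and ${OUTDIR} to be set beforehand. Below is a minimal sketch of one way to derive them, assuming the files follow the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming used above. The helper names sample_from_r1 and paired_end_cmd, the reads/ path and the results/ directory are hypothetical; the function only prints the command (a dry run) and uses no flags beyond those shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the sample name from an R1 file name,
# assuming the <sample>_R1.fastq.gz naming convention.
sample_from_r1() {
    local base
    base=$(basename "$1")
    printf '%s\n' "${base%_R1.fastq.gz}"
}

# Hypothetical helper: print (not run) the paired-end command for one sample.
# Arguments: R1 file, datasets directory, output directory.
paired_end_cmd() {
    local r1=$1 datasets=$2 outdir=$3
    local sample r2
    sample=$(sample_from_r1 "$r1")
    # The R2 file is assumed to sit next to R1 with the matching suffix.
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    printf 'bactopia --R1 %s --R2 %s --sample %s --datasets %s --outdir %s\n' \
        "$r1" "$r2" "$sample" "$datasets" "$outdir"
}

# Example with placeholder paths: prints the command for review.
paired_end_cmd reads/sampleA_R1.fastq.gz datasets/ results/
```

Printing the command first makes it easy to check the derived sample name before starting a long run.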

        Single-end reads

    bactopia --SE ${SAMPLE}.fastq.gz --sample ${SAMPLE} --datasets datasets/ --outdir ${OUTDIR}

        Multiple samples

    bactopia prepare directory-of-fastqs/ > fastqs.txt

    bactopia --fastqs fastqs.txt --datasets datasets --outdir ${OUTDIR}

        ENA data (a real treat)

    bactopia --accessions ena-accessions.txt \
            --datasets datasets/ \
            --species "Staphylococcus aureus" \
            --coverage 100 \
            --genome_size median \
            --cpus 2 \
            --outdir ena-multiple-samples
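The post does not show what ena-accessions.txt contains. The sketch below assumes a plain-text file with one accession per line; that format is an assumption to verify against the Bactopia documentation. The helper name count_accessions is hypothetical, and the accession values are placeholders, not real records. It counts the usable entries so an empty or malformed list is caught before a run is started.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count entries in an accession list,
# skipping blank lines and lines starting with '#'.
# Assumes one accession per line (not confirmed by the original post).
count_accessions() {
    grep -cv -e '^[[:space:]]*$' -e '^#' "$1"
}

# Example with placeholder accessions written to a temporary file.
acc_file=$(mktemp)
printf '%s\n' 'SRX0000001' 'SRX0000002' '' > "$acc_file"
echo "entries: $(count_accessions "$acc_file")"
rm -f "$acc_file"
```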
