Dereplicating a genome set with dRep

Author: 胡童远 | Published: 2022-04-24 11:21

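dRep dereplicates a set of genomes and leaves a non-redundant genome set. Judging from the run log, it first predicts genes (Prodigal) and then runs CheckM for genome quality control, which is why CheckM has to be installed and configured. The dereplication step itself is controlled by ANI-related parameters.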

Reference

Title: dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication
Journal: ISME J
Year: 2017

Documentation
dRep: https://drep.readthedocs.io/en/latest/overview.html
CheckM: https://github.com/Ecogenomics/CheckM/wiki
CheckM databases: https://data.ace.uq.edu.au/public/CheckM_databases/

Installation

# dRep
conda install -c bioconda drep
# CheckM and its dependencies
conda install -c bioconda hmmer pplacer
pip3 install checkm-genome
# CheckM database: download, then extract the local archive (not the URL)
wget -c https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
mkdir -p /path/to/my_checkm_data
tar -zxvf checkm_data_2015_01_16.tar.gz -C /path/to/my_checkm_data
export CHECKM_DATA_PATH=/path/to/my_checkm_data
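
Because the CheckM step only runs after gene prediction, a wrong or empty data directory tends to surface late. The following is a minimal sketch of a pre-flight check, assuming the database was extracted into the directory named by CHECKM_DATA_PATH; `check_checkm_data` is a hypothetical helper name and is not part of dRep or CheckM.

```shell
# Pre-flight check for the CheckM data directory.
# check_checkm_data is a hypothetical helper, not a dRep or CheckM command.
check_checkm_data() {
    dir="${CHECKM_DATA_PATH:-}"
    if [ -z "$dir" ]; then
        echo "CHECKM_DATA_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "CHECKM_DATA_PATH is not a directory: $dir" >&2
        return 1
    fi
    # An empty directory usually means the archive was never extracted there.
    if [ -z "$(ls -A "$dir")" ]; then
        echo "CHECKM_DATA_PATH is empty: $dir" >&2
        return 1
    fi
    echo "CheckM data found in $dir"
}
```

It can be chained with dRep's own dependency check, for example `check_checkm_data && dRep check_dependencies`.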

Help

dRep --help
# Commands:
#    compare            -> Compare and cluster a set of genomes
#    dereplicate        -> De-replicate a set of genomes
#    check_dependencies -> Check which dependencies are properly installed
dRep dereplicate --help

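-g     input genomes (FASTA files)
-p     number of threads
-comp  minimum genome completeness (%), default 75
-con   maximum genome contamination (%), default 25
-sa    ANI threshold for secondary clustering, default 0.99 (defaults can differ between dRep versions; confirm with dRep dereplicate --help)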

Usage

dRep dereplicate output_dir/ \
-g /path/to/genomes/*.fasta \
-p 10 \
-comp 70 \
-con 10 \
-sa 0.95
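
With thousands of genomes, the `*.fasta` wildcard can expand into a very long command line. dRep's help text for `-g` describes passing a text file of genome paths instead; whether the installed version supports this should be confirmed with `dRep dereplicate --help`. A minimal sketch of building such a list follows; `make_genome_list`, the directory, and the file name `genomes.txt` are hypothetical placeholders.

```shell
# Write one genome path per line to a list file and print how many were found.
# make_genome_list is a hypothetical helper; paths are placeholders.
make_genome_list() {
    genome_dir="$1"
    list_file="$2"
    # Only regular *.fasta files directly inside genome_dir, in a stable order.
    find "$genome_dir" -maxdepth 1 -type f -name '*.fasta' | sort > "$list_file"
    wc -l < "$list_file" | tr -d ' '
}

# Example (assumes -g accepts a list file in the installed dRep version):
# make_genome_list /path/to/genomes genomes.txt
# dRep dereplicate output_dir/ -g genomes.txt -p 10 -comp 70 -con 10 -sa 0.95
```

Checking the printed count against the expected number of genomes before launching the run is a cheap way to catch a wrong directory or file extension.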

Run log

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

Will filter the genome list
2,295 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
Running prodigal
Running checkM
Running checkM in 2 chunks
Running checkM chunk 0
Running checkM chunk 1
21.83% of genomes passed checkM filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

Running primary clustering
Running pair-wise MASH clustering
117 primary clusters made
Running secondary clustering
Running 3611 ANImf comparisons- should take ~ 45.1 min
Step 4. Return output
***************************************************
    ..:: dRep dereplicate Step 3. Choose ::..
***************************************************

Loading work directory
***************************************************
    ..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************
etc
***************************************************
    ..:: dRep dereplicate Step 5. Analyze ::..
***************************************************

making plots 1, 2, 3, 4, 5, 6
Plotting primary dendrogram
Plotting secondary dendrograms
Plotting MDS plot
Plotting scatterplots
Plotting bin scorring plot
Plotting winning genomes plot...

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

    ..:: dRep dereplicate finished ::..

Dereplicated genomes................. 02_bin/bins_drep/dereplicated_genomes/
Dereplicated genomes information..... 02_bin/bins_drep/data_tables/Widb.csv
Figures.............................. 02_bin/bins_drep/figures/
Warnings............................. 02_bin/bins_drep/log/warnings.txt
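
To see how much the set shrank, the files in the dereplicated_genomes/ directory reported above can be counted and compared with the number of input genomes. This is a rough sketch that assumes the directory holds one file per retained genome; `summarize_drep` is a hypothetical helper name.

```shell
# Report how many genomes survived dereplication, as a count and a percentage.
# summarize_drep is a hypothetical helper; arguments are the dRep work
# directory and the number of genomes that were given as input.
summarize_drep() {
    work_dir="$1"
    n_input="$2"
    n_kept=$(find "$work_dir/dereplicated_genomes" -maxdepth 1 -type f | wc -l | tr -d ' ')
    # awk does the floating-point division that shell arithmetic cannot.
    awk -v k="$n_kept" -v n="$n_input" \
        'BEGIN { printf "%d of %d genomes kept (%.2f%%)\n", k, n, (n > 0 ? 100 * k / n : 0) }'
}
```

For the run shown in the log, the call would be `summarize_drep 02_bin/bins_drep 2295`.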
