Call种内微生物SNP用Parsnp

作者: 胡童远 | 来源:发表于2021-11-04 14:17 被阅读0次

parsnp可进行核心基因组对齐、call核心基因组SNP、构建SNP进化树。

Github: https://github.com/marbl/parsnp
Manual: https://harvest.readthedocs.io/en/latest/content/parsnp/tutorial.html

文章：The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes
中文：用于快速核心基因组比对和数千种种内微生物基因组可视化的 Harvest 套件
杂志：Genome Biology
时间：2014
被引：913

文章知识点：
1 核心基因组是全基因组的自己，这些同源序列可用于多基因组序列对齐
2 核心基因垂直传递，信噪比更强
3 核心基因SNP是研究进化关系最可靠的突变
4 核心基因组SNP是目前研究近源微生物从大标准方法
5 core-genome SNP typing有三种方法：
read mapping: 成本最低
k-mer analyses
whole-genome alignment：原方法，检测SNP INDEL金标准，依赖组装

Parsnp原理：
1 基于suffix graph data structure快速检测maximal unique matches (MUMs)
MUM是很多pairwise/multiple基因组对齐的基础
2 Gingr可视化比对结构
3 输入要求：同亚种或ANI>=97%，基于MUMi distance指定纳入标准
4 Directed Acyclic Graph (DAG) 数据结构，也称Compressed Suffix Graph (CSG)
5 multi-MUMs检测locally collinear blocks (LCBs)，构建核心基因组对齐的基础
6 LCB作为anchor，MUSCLE定位gap
7 以上结果包含核心基因组所有的：SNP, INDEL, 结构变异
8 确定核心基因组SNP的方法：(1) repetitive sequence; (2) small LCB size; (3) poor alignment quality; (4) poor base quality; and (5) possible recombination。FreeBayes推测碱基质量，PhiPack检测碱基重排。
9 FastTree2构建进化树

方法比较：

引用：
1 Genome sequence of a clinical Salmonella Enteritidis sequence type 11 strain from South Africa. Journal of Global Antimicrobial Resistance. 2019
2 OutbreakFinder: a visualization tool for rapid detection of bacterial strain clusters based on optimized multidimensional scaling. PeerJ. 2019

安装

conda create -n parsnp
conda install parsnp
parsnp --version
#|--Parsnp 1.5.6--|
#For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
#parsnp 1.5.6

使用

parsnp \
-c -p 4 -n mafft \
-r ./T2103096416.fna \
-d ./input/*.fna \
-o ./parsnp/

参数
-d Input genomes
-g <reference_genbank>
-r <reference_fasta>
-o OUTPUT_DIR
-c, --curated (c)urated genome directory
--use-fasttree Use fasttree instead of RaxML
--vcf Generate VCF file.
-p THREADS, --threads THREADS
-n {mafft,muscle,fsa,prank}, --alignment-program
-u, --unaligned Ouput unaligned regions

过程

19:51:26 - INFO - <<Parsnp started>>
19:51:26 - INFO - No genbank file provided for reference annotations, skipping..
19:51:26 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
19:51:45 - INFO - Reconstructing core genome phylogeny...
19:51:45 - INFO - Aligned 4 genomes in 18.35 seconds
19:51:45 - INFO - Parsnp finished! All output available in ./parsnp/

结果

tree 树文件
xmfa 基因组对齐结果
Ini 运行配置

xmfa文件格式同mauve对齐结果文件，但丢弃了unalign的序列

>1:0-109 + cluster58 :p1
>2:1399845-1399954 + cluster58 s11:p323
>3:2214877-2214986 - cluster58 s41:p1147
>4:2220684-2220793 - cluster58 s51:p849
=
>1:138-164 + cluster59 s1:p138
>2:1399997-1400023 + cluster59 s11:p475
>3:2214808-2214834 - cluster59 s41:p1013
>4:2220615-2220641 - cluster59 s51:p715
=

Gingr可视化

Manual: https://harvest.readthedocs.io/en/latest/content/gingr.html

只有MAC Linux可视化软件才能用，无命令行模式，使用测试数据直接打开如下：

Harvest处理parsnap结果

Manual：https://harvest.readthedocs.io/en/latest/content/harvest/quickstart.html#download-install-run

下载解压可直接使用

wget -c https://github.com/marbl/harvest-tools/releases/download/v1.2/harvesttools-Linux64-v1.2.tar.gz/
tar -zxvf harvesttools-Linux64-v1.2.tar.gz
/hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/harvesttools-Linux64-v1.2/harvesttools -h

参数：

harvesttools options:
   -i <Gingr input>
   -b <bed filter intervals>,<filter name>,"<description>"
   -B <output backbone intervals>
   -f <reference fasta>
   -F <reference fasta out>
   -g <reference genbank>
   -a <MAF alignment input>
   -m <multi-fasta alignment input>
   -M <multi-fasta alignment output (concatenated LCBs)>
   -n <Newick tree input>
   -N <Newick tree output>
   --midpoint-reroot (reroot the tree at its midpoint after loading)
   -o <Gingr output>
   -S <output for multi-fasta SNPs>
   -u 0/1 (update the branch values to reflect genome length)
   -v <VCF input>
   -V <VCF output>
   -x <xmfa alignment file>
   -X <output xmfa alignment file>
   -h (show this help)
   -q (quiet mode)

利用ggr文件获取snp fasta

/hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/harvesttools-Linux64-v1.2/harvesttools \
-i parsnp.ggr \
-S parsnp.snps
# 过程
Loading parsnp.ggr...
Writing parsnp.snps...
# remove enter
python3 /hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/analysis/oral/hgt/db_blast/trim_enter.py \
parsnp.snps parsnp.snps2

结果：

利用ggr文件获取snp vcf

/hwfssz1/ST_HEALTH/P18Z10200N0423/hutongyuan/softwares/harvesttools-Linux64-v1.2/harvesttools \
-i parsnp.ggr \
-V parsnp.vcf

结果：

##FILTER=<ID=IND,Description="Column contains indel">
##FILTER=<ID=N,Description="Column contains N">
##FILTER=<ID=LCB,Description="LCB smaller than 200bp">
##FILTER=<ID=CID,Description="SNP in aligned 100bp window with < 50% column % ID">
##FILTER=<ID=ALN,Description="SNP in aligned 100b window with > 20 indels">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  MF3KT15001B.fna.ref        BF3KT15003B.fna MF3KT15004.fna  PF3KT15003.fna
MF3KT15001B.Scaf2       48063   ACAGTGGGAT.AACTTACCCG   A       G       40      PASS       NA      GT      0       1       0       0
MF3KT15001B.Scaf2       48079   CCCGACTGGC.AACAAATTGG   A       G       40      PASS       NA      GT      0       1       0       0
MF3KT15001B.Scaf6       74973   AGTTGCTTGG.TTTAGAGTTT   T       C       40      PASS       NA      GT      0       1       1       1
MF3KT15001B.Scaf6       102844  ACATCATGAT.CTTCAAGTTC   C       G       40      PASS       NA      GT      0       1       0       0

网友评论

本文标题：Call种内微生物SNP用Parsnp

本文链接：https://www.haomeiwen.com/subject/jlmaoltx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

Call种内微生物SNP用Parsnp

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读