1. Installing SyRI
GitHub page: https://github.com/schneebergerlab/syri
Requirements:
🔸Python >=3.8
🔸Python packages: Cython-0.29.23, numpy-1.20.2, scipy-1.6.2, pandas-1.2.4, python-igraph-0.9.1, psutil-5.8.0, pysam-0.16.0.1, and matplotlib-3.3.4
🔸C/C++ compiler: g++
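The Python package requirements listed above can be checked before building. The following is a minimal sketch, not part of SyRI itself. It assumes that the listed versions can be treated as minimums (the list may only record the versions SyRI was tested with) and that the distribution names match the names in the list. The helpers `version_tuple` and `find_problems` are hypothetical names introduced here for illustration.

```python
import re
from importlib import metadata

# Versions copied from the requirement list; treating them as minimums is an assumption.
REQUIRED = {
    "Cython": "0.29.23",
    "numpy": "1.20.2",
    "scipy": "1.6.2",
    "pandas": "1.2.4",
    "python-igraph": "0.9.1",
    "psutil": "5.8.0",
    "pysam": "0.16.0.1",
    "matplotlib": "3.3.4",
}


def version_tuple(version):
    """Turn '1.20.2' into (1, 20, 2); stop at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def find_problems(required=REQUIRED, lookup=metadata.version):
    """Return {package: reason} for packages that are missing or older than listed."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(found) < version_tuple(wanted):
            problems[name] = f"found {found}, listed {wanted}"
    return problems


if __name__ == "__main__":
    for package, reason in find_problems().items():
        print(f"{package}: {reason}")
```

Running the script prints one line per package that is missing or older than the listed version, and prints nothing when every requirement is met.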
git clone https://github.com/schneebergerlab/syri.git
cd syri
python setup.py install
chmod +x syri/bin/syri syri/bin/chroder
2. Running SyRI
nucmer --maxmatch -c 100 -b 500 -l 50 refgenome qrygenome
delta-filter -m -i 90 -l 100 out.delta > out_m_i90_l100.delta
show-coords -THrd out_m_i90_l100.delta > out_m_i90_l100.coords
/path/syri-1.5.1/syri/bin/syri -c out_m_i90_l100.coords -d out_m_i90_l100.delta -r refgenome -q qrygenome
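The four commands above can also be driven from a script. The sketch below only reuses the flags shown in those commands and relies on nucmer writing `out.delta` by default, as the commands above do. The function names `build_pipeline` and `run_pipeline` and the `syri_bin` parameter are hypothetical, introduced for illustration. Shell redirection (`>`) is replaced by passing an open file as `stdout`.

```python
import subprocess


def build_pipeline(ref, qry, syri_bin="syri"):
    """Return a list of (argv, stdout_path) steps; stdout_path is None when
    the command writes its own output files."""
    filtered = "out_m_i90_l100"
    return [
        (["nucmer", "--maxmatch", "-c", "100", "-b", "500", "-l", "50", ref, qry], None),
        (["delta-filter", "-m", "-i", "90", "-l", "100", "out.delta"], filtered + ".delta"),
        (["show-coords", "-THrd", filtered + ".delta"], filtered + ".coords"),
        ([syri_bin, "-c", filtered + ".coords", "-d", filtered + ".delta",
          "-r", ref, "-q", qry], None),
    ]


def run_pipeline(steps):
    """Run each step in order and stop at the first failing command."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, stdout=handle, check=True)


if __name__ == "__main__":
    # Placeholder file names, as in the commands above.
    run_pipeline(build_pipeline("refgenome", "qrygenome"))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact commands before anything is run.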
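Plotting the genome structures predicted by SyRI: https://github.com/schneebergerlab/plotsr
genomes.txt is a tab-separated file containing the path and name of each genome. An optional third column can set custom visualisation properties for a genome.
genomes.txt: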
#file name tags
/path/ref.fa col-0 lw:1.5
/path/query.fa ler lw:1.5
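Because genomes.txt must be tab-separated, it can be safer to generate it than to type it by hand, since editors often convert tabs to spaces. A minimal sketch, assuming the three-column layout shown above; `genomes_table` is a hypothetical helper name.

```python
def genomes_table(genomes):
    """Build the text of a genomes.txt file.

    `genomes` is a list of (path, name, tags) tuples; pass an empty string
    or None as tags to leave the optional third column out.
    """
    lines = ["#file\tname\ttags"]
    for path, name, tags in genomes:
        fields = [path, name]
        if tags:
            fields.append(tags)
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = genomes_table([
        ("/path/ref.fa", "col-0", "lw:1.5"),
        ("/path/query.fa", "ler", "lw:1.5"),
    ])
    with open("genomes.txt", "w") as handle:
        handle.write(text)
```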
plotsr \
--sr syri.out \
--genomes genomes.txt \
-o plotsr_out.png
[Figure: plotsr output]
3. Output
3.1 syri.out (TSV format)
Column  Value                                                        Type
1       chromosome ID in reference                                   string
2       reference start position (1-based, includes start position)  int
3       reference end position (1-based, includes end position)      int
4       sequence in reference (only for SNPs and indels)             string
5       sequence in query (only for SNPs and indels)                 string
6       chromosome ID in query                                       string
7       query start position (1-based, includes start position)      int
8       query end position (1-based, includes end position)          int
9       unique ID (annotation type + number)                         string
10      parent ID (annotation type + number)                         string
11      annotation type                                              string
12      copy status (for duplications)                               string
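🔸Copy status: describes whether the query or the reference carries the extra copy of a duplicated region (copygain: the query has an additional copy; copyloss: the reference has an additional copy).
🔸Parent ID: the unique ID of the annotation block (syntenic region or structural rearrangement) that contains the alignment or local variation.
For example, if an A->T SNP (unique ID SNP1) lies at reference Chr1:10 and query Chr2:542 inside a translocated region (unique ID TRANS1), the corresponding entry is: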
Chr1 10 10 A T Chr2 542 542 SNP1 TRANS1 SNP -
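The twelve-column layout described above maps naturally onto a small record type. The following is a minimal parsing sketch, not SyRI's own code. It assumes the fields are tab-separated (the file is described as TSV) and that a lone `-` marks a missing value, as in the copy-status column of the example entry. `SyriRecord` and `parse_syri_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SyriRecord(NamedTuple):
    ref_chrom: str
    ref_start: Optional[int]
    ref_end: Optional[int]
    ref_seq: str
    qry_seq: str
    qry_chrom: str
    qry_start: Optional[int]
    qry_end: Optional[int]
    unique_id: str
    parent_id: str
    annotation: str
    copy_status: str


def _to_int(field):
    # Assumption: "-" means "no value" in the position columns.
    return None if field == "-" else int(field)


def parse_syri_line(line):
    """Parse one tab-separated syri.out line into a SyriRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return SyriRecord(
        fields[0], _to_int(fields[1]), _to_int(fields[2]),
        fields[3], fields[4],
        fields[5], _to_int(fields[6]), _to_int(fields[7]),
        fields[8], fields[9], fields[10], fields[11],
    )


if __name__ == "__main__":
    example = "Chr1\t10\t10\tA\tT\tChr2\t542\t542\tSNP1\tTRANS1\tSNP\t-"
    record = parse_syri_line(example)
    print(record.unique_id, "lies inside", record.parent_id)
```

With the example entry, `record.parent_id` is `TRANS1`, which links the SNP back to the translocated region that contains it.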
Annotation  Meaning                       Annotation  Meaning
SYN         Syntenic region               SYNAL       Alignment in syntenic region
INV         Inverted region               INVAL       Alignment in inverted region
TRANS       Translocated region           TRANSAL     Alignment in translocated region
INVTR       Inverted translocated region  INVTRAL     Alignment in inverted translocated region
DUP         Duplicated region             DUPAL       Alignment in duplicated region
INVDP       Inverted duplicated region    INVDPAL     Alignment in inverted duplicated region
NOTAL       Un-aligned region             SNP         Single nucleotide polymorphism
CPG         Copy gain in query            CPL         Copy loss in query
HDR         Highly diverged regions       TDM         Tandem repeat
INS         Insertion in query            DEL         Deletion in query
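A common first look at syri.out is the total reference length covered by each region-level annotation in the table above. The sketch below is one way to do that under stated assumptions: fields are tab-separated, the annotation type is in column 11, and lengths follow the 1-based inclusive coordinates of columns 2 and 3. `REGION_TYPES` and `reference_length_by_type` are hypothetical names.

```python
from collections import defaultdict

# Region-level annotation types from the table (alignments and local variants excluded).
REGION_TYPES = {"SYN", "INV", "TRANS", "INVTR", "DUP", "INVDP"}


def reference_length_by_type(lines, wanted=REGION_TYPES):
    """Sum reference span lengths per annotation type.

    Coordinates are 1-based and inclusive, so a span covers end - start + 1 bases.
    Lines without numeric reference coordinates are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 12 or fields[10] not in wanted:
            continue
        start, end = fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            continue
        totals[fields[10]] += int(end) - int(start) + 1
    return dict(totals)


if __name__ == "__main__":
    with open("syri.out") as handle:
        for annotation, length in sorted(reference_length_by_type(handle).items()):
            print(f"{annotation}\t{length}")
```

Restricting the sum to region-level types avoids double counting, because alignment entries such as SYNAL lie inside their parent regions.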