Finding CNVs in WES data with EXCAVATOR2

Author: 因地制宜的生信达人 | Published: 2019-04-29 14:40

The tool was first released in 2013 and received a major update in 2016.

cd ~/biosoft
# https://sourceforge.net/projects/excavator2tool/?source=navbar
mkdir EXCAVATOR2 &&  cd EXCAVATOR2 
wget https://sourceforge.net/projects/excavator2tool/files/EXCAVATOR2_Package_v1.1.2.tgz
tar zxvf EXCAVATOR2_Package_v1.1.2.tgz 
# The package is over 400 MB and includes a PDF manual.

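The manual is rather complicated. The software itself is just a tarball that can be used as soon as it is unpacked. It bundles its own Perl, R, and shell scripts, which makes it convenient to use. The more troublesome part is that the system needs the R package Hmisc.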

> library(Hmisc)
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2
Use suppressPackageStartupMessages() to eliminate package startup
messages.

Attaching package: ‘Hmisc’

The following objects are masked from ‘package:base’:

    format.pval, round.POSIXt, trunc.POSIXt, units

>

Interestingly, the software needs R to compile two Fortran files.
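
Before running anything, it can help to confirm that the required interpreters are on PATH. The following is a minimal sketch and not part of EXCAVATOR2: `missing_tools` is a hypothetical helper, and the tool list in the usage comment (perl, Rscript, gfortran) is an assumption about what the bundled scripts and the Fortran compilation need.

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on PATH (prints nothing when all of them are available).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (the tool list is an assumption, adjust it to your system):
#   missing_tools perl Rscript gfortran
#
# Hmisc itself can be checked from R:
#   Rscript -e 'if (!requireNamespace("Hmisc", quietly = TRUE)) stop("Hmisc is missing")'
```

An empty result means every listed command was found. The Hmisc check is left as a comment because it needs a working R installation.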

Software walkthrough

The software runs in three steps:

TargetPerla.pl
EXCAVATORDataPrepare.pl
EXCAVATORDataAnalysis.pl

The first step is TargetPerla.pl, which handles the reference genome and the exon coordinates. It takes five arguments:

the path to a source file (e.g. SourceTarget.txt), which is the software's configuration file,
the path to the target input file, a BED-format coordinate file (only the first three coordinate columns are needed),
a "target name",
the window size (e.g. 10000, 20000 or 50000),
the assembly (allowed options are: hg19 and hg38).

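Note that the BED file must be sorted and merged first: sort -k1,1 -k2,2n *.bed | bedtools merge -i -
The author's example is: perl TargetPerla.pl SourceTarget.txt myTarget.bed MyTarget_w50K 50000 hg19
The package also ships with some default data: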

data/
├── [  74]  centromere
│   ├── [ 592]  CentromerePosition_hg19.txt
│   └── [ 592]  CentromerePosition_hg38.txt
├── [237M]  GCA_000001405.15_GRCh38.bw
├── [  28]  support
│   ├── [  65]  hg19
│   │   ├── [ 446]  ChromosomeCoordinate_HG19.txt
│   │   └── [ 23K]  GapHg19.UCSC.txt
│   └── [  65]  hg38
│       ├── [ 508]  ChromosomeCoordinate_HG38.txt
│       └── [ 43K]  GapHg38.UCSC.txt
├── [  28]  targets
│   ├── [   6]  hg19
│   └── [   6]  hg38
└── [216M]  ucsc.hg19.bw

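The bundled GCA_000001405.15_GRCh38.bw comes from: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz. At the time of writing, the latest version is: GenBank assembly accession: GCA_000001405.27 (latest).

Usually we have our own reference genome. In that case the author recommends using the GEM suite (http://gemlibrary.sourceforge.net/) to produce the bw file for it.

Then run the first step, which produces a folder used by the later analysis steps.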
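
The BED preparation required for this first step can also be done without bedtools. The sketch below swaps in a plain sort plus awk merge. It is not the author's command, only an approximation of what `bedtools merge` does by default: it joins overlapping and directly adjacent intervals. `merge_bed` and `myTarget.merged.bed` are hypothetical names, and the input is assumed to be a tab-separated BED file without header lines.

```shell
# Hypothetical helper: sort a BED file and merge overlapping or adjacent
# intervals, keeping only the first three columns (chrom, start, end).
merge_bed() {
    cut -f1-3 "$1" |
    LC_ALL=C sort -k1,1 -k2,2n |
    awk 'BEGIN { FS = OFS = "\t" }
         # First interval: start a new merged block.
         NR == 1 { chrom = $1; cur_start = $2; cur_end = $3; next }
         # Same chromosome and start within the current block: extend it.
         $1 == chrom && ($2 + 0) <= (cur_end + 0) {
             if (($3 + 0) > (cur_end + 0)) cur_end = $3
             next
         }
         # Otherwise emit the finished block and start a new one.
         { print chrom, cur_start, cur_end
           chrom = $1; cur_start = $2; cur_end = $3 }
         END { if (NR > 0) print chrom, cur_start, cur_end }'
}

# Usage, followed by the author's TargetPerla.pl call on the merged file:
#   merge_bed myTarget.bed > myTarget.merged.bed
#   perl TargetPerla.pl SourceTarget.txt myTarget.merged.bed MyTarget_w50K 50000 hg19
```

When bedtools is installed, the author's pipeline is the simpler choice. The awk version mainly shows what the merge step is expected to produce.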

The second step is EXCAVATORDataPrepare.pl, which runs the required calculations on your own sequencing BAM files.

The author's example command is:

perl EXCAVATORDataPrepare.pl ExperimentalFilePrepare.w50000.txt \
--processors 6 --target MyTarget_w50000 --assembly hg19

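Here the --target parameter refers to the result folder produced by step one, which is named after the target name given to TargetPerla.pl.
ExperimentalFilePrepare.w50000.txt is the configuration file. It has three columns: the full path to the BAM file, the output folder for that sample, and the sample name.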
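
The three-column file described above can be generated rather than typed by hand. This is a minimal sketch under stated assumptions: `make_prepare_config` is a hypothetical helper, the sample name is taken from the BAM file name, and the space delimiter is an assumption that should be checked against the manual.

```shell
# Hypothetical helper: print one config line per BAM file in a directory.
# Columns follow the description above: absolute BAM path, per-sample
# output folder, sample name. The space delimiter is an assumption.
make_prepare_config() {
    bam_dir=$(cd "$1" && pwd)      # absolute path of the BAM directory
    out_root=$2                    # root folder for per-sample results
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue  # skip when the glob matched nothing
        sample=$(basename "$bam" .bam)
        printf '%s %s %s\n' "$bam" "$out_root/$sample" "$sample"
    done
}

# Usage (both paths are placeholders):
#   make_prepare_config /path/to/bams /path/to/prepare_out > ExperimentalFilePrepare.w50000.txt
```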

The third step is EXCAVATORDataAnalysis.pl, which calls the CNV state.

There are five CNV states: 2-copy deletion, 1-copy deletion, normal, 1-copy duplication and N-copy amplification.
The author's example command is:

perl EXCAVATORDataAnalysis.pl ExperimentalFileAnalysis.w50K.txt \
--processors 6 --target MyTarget_w50K --assembly hg19 \
--output /.../OutEXCAVATOR2/Results_MyProject_w50K

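Again, the configuration file has to be made by hand. The typical use case is calling CNVs from paired tumor exome data, so the first column of the configuration file must state whether each sample is T or C, followed by the sample's number.
See Figure 3 of the manual: A typical well-formatted input file for EXCAVATORDataAnalysis.pl module and "paired" mode.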
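
The paired-mode labels described above can be produced from a simple list of tumor/normal pairs. This is a sketch, not the documented file format: `make_paired_config` is a hypothetical helper, and the second and third columns (prepare-step output folder and sample name) are assumptions modelled on the prepare-step file, so compare the result with Figure 3 of the manual before using it.

```shell
# Hypothetical helper: read "tumor normal" sample-name pairs from stdin and
# print the labels T1/C1, T2/C2, ... in the first column.
# Columns 2 and 3 are assumptions; only the first column is described above.
make_paired_config() {
    prep_root=$1   # folder holding the per-sample output of the prepare step
    awk -v root="$prep_root" 'NF >= 2 {
        n++
        printf "T%d %s/%s %s\n", n, root, $1, $1
        printf "C%d %s/%s %s\n", n, root, $2, $2
    }'
}

# Usage (pairs.txt is a placeholder with one "tumor normal" pair per line):
#   make_paired_config /path/to/prepare_out < pairs.txt > ExperimentalFileAnalysis.w50K.txt
```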

You may need to change the software's run parameters, but only do so once you really understand them.

## Omega parameter for the HSLM algorithm ##
0.1
## Theta parameter (baseline probability m_i changes its value) for the HSLM algorithm ##
1e-5
## D_norm parameter for the HSLM algorithm ##
10e5
## Cellularity parameter for the FastCall Calling algorithm ##
1
## Threshold d for the truncated gaussian distribution of the FastCall Calling algorithm ##
0.5
## Threshold u for the truncated gaussian distribution of the FastCall Calling algorithm ##
0.35
## Segment with a number of exons smaller than a threshold are filtered out ##
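
The parameter file alternates a `## description ##` line with a value line, which makes it easy to review before editing. The snippet below is a small sketch for listing the current settings. `show_params` is a hypothetical helper, and the file path in the usage comment is a placeholder.

```shell
# Hypothetical helper: print "description = value" for each parameter in a
# file that alternates "## description ##" lines with value lines.
show_params() {
    awk '/^##/ { sub(/^## */, ""); sub(/ *##$/, ""); name = $0; next }
         NF    { printf "%s = %s\n", name, $1 }' "$1"
}

# Usage (path is a placeholder):
#   show_params /path/to/parameter_file.txt
```

A description with no value line after it is simply not printed. Note that the D_norm value 10e5 is the same number as 1e6.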

Hands-on

To be continued.

Article link: https://www.haomeiwen.com/subject/vqpynqtx.html
