Common methods for converting genomic coordinates between assembly versions:
1. NCBI Remap
See my previous article: https://www.jianshu.com/p/41e5280f59c3
2. UCSC LiftOver
https://genome.ucsc.edu/cgi-bin/hgLiftOver

3. CrossMap: http://crossmap.sourceforge.net/#installation
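This article focuses on CrossMap and recommends it.
It is simple to use. Apart from the input format, you only need to supply two files: a chain file and the file of coordinates to convert.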
3.1 Download and installation
(1) Install CrossMap with pip
pip3 install git+https://github.com/liguowang/CrossMap.git
or
pip3 install CrossMap  # installs CrossMap for Python 3
or
conda install -c bioconda crossmap
(2) Install CrossMap from source code
$ tar zxf CrossMap-VERSION.tar.gz
$ cd CrossMap-VERSION
# Install CrossMap to the default location. On Linux/Unix, this is typically something like:
# /home/user/lib/python2.7/site-packages/
$ python setup.py install
# Or install CrossMap to a specified location:
$ python setup.py install --root=/home/user/CrossMap
# Set up PYTHONPATH. Skip this step if CrossMap was installed to the default location.
$ export PYTHONPATH=/home/user/CrossMap/usr/local/lib/python2.7/site-packages:$PYTHONPATH
# Add the CrossMap scripts to PATH. Skip this step if CrossMap was installed to the default location.
$ export PATH=/home/user/CrossMap/usr/local/bin:$PATH
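3.2 Download the chain file
The chain file is one of the inputs for coordinate conversion. You can download it directly from the website by picking the file that matches your source and target assemblies: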
UCSC built chain files (Human, Homo sapiens)
- hg38ToHg19.over.chain.gz (Chain file for hg38 to hg19 conversion)
- hg19ToHg38.over.chain.gz (Chain file for hg19 to hg38 conversion)
- hg18ToHg38.over.chain.gz (Chain file for hg18 to hg38 conversion)
- hg19ToHg18.over.chain.gz (Chain file for hg19 to hg18 conversion)
- hg19ToHg17.over.chain.gz (Chain file for hg19 to hg17 conversion)
- hg18ToHg19.over.chain.gz (Chain file for hg18 to hg19 conversion)
- hg18ToHg17.over.chain.gz (Chain file for hg18 to hg17 conversion)
- hg17ToHg19.over.chain.gz (Chain file for hg17 to hg19 conversion)
- hg17ToHg18.over.chain.gz (Chain file for hg17 to hg18 conversion)
- GRCh37ToHg19.over.chain.gz (Chain file for GRCh37 to hg19 conversion)
- hg19ToGRCh37.over.chain.gz (Chain file for hg19 to GRCh37 conversion)
UCSC built chain files (Mouse, Mus musculus)
- mm10ToMm9.over.chain.gz (Chain file for mm10 to mm9 conversion)
- mm9ToMm10.over.chain.gz (Chain file for mm9 to mm10 conversion)
- mm9ToMm8.over.chain.gz (Chain file for mm9 to mm8 conversion)
Chain files for other species can be downloaded from UCSC: http://hgdownload.soe.ucsc.edu/downloads.html
The files listed here are mainly for human. For example, to convert hg38 coordinates to hg19, just download hg38ToHg19.over.chain.gz (Chain file for hg38 to hg19 conversion).
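The sketch below builds UCSC chain file URLs and reads a chain file. It assumes UCSC's usual naming scheme, in which hg38ToHg19.over.chain.gz is served from goldenPath/hg38/liftOver/. The GRCh37 files in the list above come from the CrossMap site and do not follow this pattern. The parser follows the standard chain format: a "chain" header line, then "size dt dq" lines, then a final "size" line. In liftOver chains, the "t" (target) fields describe the source assembly and the "q" (query) fields describe the assembly you are converting to. The function names here are hypothetical helpers and are not part of CrossMap.

```python
import gzip
import urllib.request
from dataclasses import dataclass, field

UCSC_BASE = "https://hgdownload.soe.ucsc.edu/goldenPath"


def ucsc_chain_url(source, target):
    """Build a UCSC liftOver URL, assuming the '<src>To<Tgt>.over.chain.gz' naming scheme."""
    name = f"{source}To{target[0].upper()}{target[1:]}.over.chain.gz"
    return f"{UCSC_BASE}/{source}/liftOver/{name}"


def download_chain(source, target, dest=None):
    """Download the chain file (requires network access); returns the local file name."""
    url = ucsc_chain_url(source, target)
    dest = dest or url.rsplit("/", 1)[1]
    urllib.request.urlretrieve(url, dest)
    return dest


@dataclass
class Chain:
    t_name: str    # chromosome in the source assembly (e.g. hg38)
    t_start: int
    q_name: str    # chromosome in the target assembly (e.g. hg19)
    q_size: int
    q_strand: str
    q_start: int
    # Ungapped aligned blocks as (source_start, target_start, size), 0-based
    blocks: list = field(default_factory=list)


def parse_chains(lines):
    """Parse chain-format text lines into Chain objects."""
    chains, current = [], None
    t_pos = q_pos = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank separator or comment line
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            current = Chain(t_name=fields[2], t_start=int(fields[5]),
                            q_name=fields[7], q_size=int(fields[8]),
                            q_strand=fields[9], q_start=int(fields[10]))
            chains.append(current)
            t_pos, q_pos = current.t_start, current.q_start
            continue
        size = int(fields[0])
        current.blocks.append((t_pos, q_pos, size))
        t_pos += size
        q_pos += size
        if len(fields) == 3:  # gap after this block: dt in source, dq in target
            t_pos += int(fields[1])
            q_pos += int(fields[2])
    return chains


def read_chain_file(path):
    """Read a gzip-compressed .over.chain.gz file."""
    with gzip.open(path, "rt") as handle:
        return parse_chains(handle)
```

For example, `ucsc_chain_url("hg38", "hg19")` gives the hg38-to-hg19 URL. Once the file is downloaded, `read_chain_file("hg38ToHg19.over.chain.gz")` lists every aligned block. CrossMap reads the chain file itself, so you only need this if you want to inspect the file.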
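3.3 Prepare the input BED file
CrossMap accepts many input formats, including BED, BAM, WIG, GFF/GTF, VCF and MAF. BED is the most common. A BED file must contain at least three tab-separated columns (chr, start, end). It may include more columns, such as strand or ref.Function annotations, but no more than 12.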
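The following sketch formats and sanity-checks BED lines before they go to CrossMap. It assumes plain BED with tab-separated fields, 0-based start, exclusive end, and 3 to 12 columns. Because BED starts are 0-based, a single 1-based site such as chr1:12345 is written as start 12344, end 12345. `bed_line` and `check_bed_line` are hypothetical helpers, not part of CrossMap.

```python
# Hypothetical helpers for plain BED input (tab-separated, 0-based start,
# exclusive end, 3 to 12 columns); not part of CrossMap.


def bed_line(chrom, start, end, *extra):
    """Format one BED record; extra columns (name, score, strand, ...) are optional."""
    fields = [chrom, str(start), str(end), *map(str, extra)]
    if not 3 <= len(fields) <= 12:
        raise ValueError(f"BED records need 3 to 12 columns, got {len(fields)}")
    return "\t".join(fields)


def check_bed_line(line):
    """Return a list of problems in one BED data line (an empty list means it looks OK)."""
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 12:
        return [f"expected 3-12 tab-separated columns, got {len(fields)}"]
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return ["start and end must be integers"]
    if start < 0 or end < start:
        return ["need 0 <= start <= end"]
    return []
```

A common mistake is separating columns with spaces instead of tabs. `check_bed_line` reports such a line as having only one column.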
3.4 Example
python3 CrossMap.py bed hg38ToHg19.over.chain.gz in.origion.hg38.bed out.convert.hg19.bed
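(1) CrossMap.py is the script installed above. It is usually in the bin directory of your Python installation.
(2) bed specifies that the input file is in BED format, for example a file listing the coordinates of a single site.
(3) hg38ToHg19.over.chain.gz is the chain file downloaded earlier.
(4) in.origion.hg38.bed is the input BED file with the original coordinates. It has 3 columns here.
(5) out.convert.hg19.bed is the name of the output file, which has the same number of columns as the input BED file.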

Note that an original interval may not map to a contiguous region in the new assembly. In that case it is split into two or more intervals.
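A chain file is a series of ungapped aligned blocks separated by gaps, which is why an interval can be split. The sketch below illustrates the idea; it is not CrossMap's code. It maps a 0-based, end-exclusive interval through a list of blocks and returns one piece for each block the interval overlaps. The block coordinates are made up, and only '+'-strand mappings are handled.

```python
def map_interval(start, end, blocks):
    """Map [start, end) through ungapped blocks given as (source_start, target_start, size).

    Returns one (target_start, target_end) piece per overlapping block; an empty
    list means the interval falls entirely in gaps (unmappable).
    """
    pieces = []
    for src_start, tgt_start, size in blocks:
        src_end = src_start + size
        # Overlap between the interval and this block, in source coordinates
        lo, hi = max(start, src_start), min(end, src_end)
        if lo < hi:
            shift = tgt_start - src_start  # constant offset inside an ungapped block
            pieces.append((lo + shift, hi + shift))
    return pieces


# Hypothetical blocks: source 100-120 -> target 500-520, source 130-160 -> target 540-570
BLOCKS = [(100, 500, 20), (130, 540, 30)]
```

An interval inside one block maps to a single piece. An interval such as 110-140 crosses the 120-130 gap and comes back as two pieces, which matches the splitting described above.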