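Note: this series was originally written in March 2016.
Before using Circos, you need to prepare the input data files. This part briefly introduces the formats of the most commonly used ones.
Karyotype file
Put simply, this file defines the most important circle in the figure. In most cases its segments are chromosomes.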
The karyotype file defines the axes.
In biological context, these are typically chromosomes, sequence contigs or clones.
Each axis (e.g. chromosome) is defined by unique identifier (referenced in data files), label (text tag for the ideogram seen in the image), size and color.
The items above are required; the following are optional.
In addition to chromosomes, the karyotype file is used to define position, identity and color of cytogenetic bands. For some genomes these band data are available.
# General format
chr - ID LABEL START END COLOR
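# Minimal karyotype format
# chr: the line defines a chromosome
# -: the parent structure. It is only filled in for band lines, where it names the chromosome; a chromosome has no parent, hence the dash.
# ID: the identifier used in data files
# LABEL: the text that will appear next to the ideogram on the image, with a species identifier
# START and END: the start and end values define the size of the chromosome.
# Note: these must give the size of the whole chromosome, not only the region you want to display.
# Example 1: karyotype for Arabidopsis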
chr - chr1 chr1 0 30427617 black
chr - chr2 chr2 0 19698289 black
chr - chr3 chr3 0 23459830 black
chr - chr4 chr4 0 18585056 black
chr - chr5 chr5 0 26975502 black
Note 1: save this file in .txt format.
Note 2: the chromosome colors can be chosen freely.
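A karyotype file like Example 1 can also be generated from a table of chromosome sizes instead of being typed by hand. The following is a minimal sketch under that assumption; the function name `karyotype_lines` is hypothetical and not part of Circos, and the label is simply set equal to the ID, as in the example above.

```python
def karyotype_lines(chrom_sizes, color="black"):
    """Build one 'chr - ID LABEL START END COLOR' line per chromosome.

    chrom_sizes maps a chromosome ID to its full length (hypothetical input).
    """
    lines = []
    for chrom_id, size in chrom_sizes.items():
        if size <= 0:
            raise ValueError(f"{chrom_id}: size must be positive, got {size}")
        # START is 0 and END is the full chromosome length, not a sub-region.
        lines.append(f"chr - {chrom_id} {chrom_id} 0 {size} {color}")
    return lines


# Sizes copied from the Arabidopsis example above.
arabidopsis_sizes = {
    "chr1": 30427617,
    "chr2": 19698289,
    "chr3": 23459830,
    "chr4": 18585056,
    "chr5": 26975502,
}

karyotype_text = "\n".join(karyotype_lines(arabidopsis_sizes))
print(karyotype_text)
```

Writing `karyotype_text` to a .txt file gives the same five lines as Example 1.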
2D Data tracks
# General data file format for 2D plots
chr start end value options
# Example 2: Scatter plots (used to show SNP density, etc.)
# value 0.005 at span 1000-2000
hs1 1000 2000 0.005
# value 0.010 at span 2001-2001, e.g. a single base position
hs1 2001 2001 0.010
# Example 3: Line plots
hs10 2750000 2999999 1108
hs10 3000000 3249999 1458
hs10 3250000 3499999 1039
hs10 3500000 3749999 871
hs10 3750000 3999999 1155
# Example 4: Histogram plots
hs1 0 4225755 0.828336
hs1 7062960 7489033 0.567553
hs1 7489034 17008645 0.141430
hs1 17008646 20657534 0.586990
hs1 20657535 25082103 0.670437
# Example 5: Tiles
# assembly clones
hs1 1 616 color=black
hs1 617 167280 color=black
hs1 167281 217280 color=red
hs1 217281 257582 color=black
# gene regions
hs1 100088227 100162167
hs1 100088632 100162167
hs1 100089118 100162167
# Example 6: Heat maps
hs7 36975000 36999999 33
hs7 37000000 37024999 50
hs7 37025000 37049999 60 color=blue
hs7 37050000 37074999 44
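Note 1: every value in a data file must be tied to an explicit span, so both start and end are always required. For a single-base position, set start equal to end, as in the second data line of Example 2.
Note 2: in scatter and line plots the data point is drawn at the center of the span, whereas in a histogram the value is drawn across the whole span.
Note 3: tiles are typically used to show genomic regions, such as gene exons, copy number variation, segmental duplications and conserved regions.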
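The `chr start end value` lines used by scatter, line, histogram and heat map tracks are often produced by counting features in fixed windows (for example SNP density). The sketch below illustrates one way to do that; `window_counts` is a hypothetical helper, and it assumes 0-based positions and windows of the form start to start + window - 1, matching the spans in Example 3.

```python
from collections import Counter


def window_counts(chrom, positions, chrom_size, window=1000):
    """Count positions per fixed-size window and return 'chr start end value' lines.

    Assumes 0-based positions; positions outside the chromosome are ignored.
    Windows with no positions are skipped rather than written with value 0.
    """
    counts = Counter(pos // window for pos in positions if 0 <= pos < chrom_size)
    lines = []
    for index in sorted(counts):
        start = index * window
        # The last window is clipped to the end of the chromosome.
        end = min(start + window - 1, chrom_size - 1)
        lines.append(f"{chrom} {start} {end} {counts[index]}")
    return lines


# Hypothetical SNP positions on a 3000 bp axis.
for line in window_counts("hs1", [10, 20, 1500, 2999], chrom_size=3000):
    print(line)
```

Each output line carries an explicit span, as Note 1 requires.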
Text labels
# General format
chr start end text options
# Example 7
hs1 225817866 225910748 ZNF678
hs1 26560711 26571853 ZNF683
hs1 40769819 40786426 ZNF684 color=red
hs1 149521414 149531004 ZNF687
Link data
Each line of a link data file describes one relationship between a pair of positions. Optional parameters can be appended.
# Example 8
hs1 100 200 hs2 250 300 color=blue
hs1 400 550 hs3 500 750 color=red,thickness=5p
hs1 600 800 hs4 150 350 color=black
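To check a link file before handing it to Circos, each line can be split into its two intervals and its optional parameters. This is only an illustrative sketch: `Link` and `parse_link` are hypothetical names, and the options are split on plain commas, so parenthesised list values are not handled here.

```python
from typing import NamedTuple


class Link(NamedTuple):
    chr1: str
    start1: int
    end1: int
    chr2: str
    start2: int
    end2: int
    options: dict


def parse_link(line):
    """Parse 'chr1 start1 end1 chr2 start2 end2 [options]' into a Link."""
    fields = line.split()
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(fields)}: {line!r}")
    chr1, start1, end1, chr2, start2, end2 = fields[:6]
    options = {}
    if len(fields) == 7:
        # Simple comma split: fine for values like color=red,thickness=5p.
        for pair in fields[6].split(","):
            key, _, value = pair.partition("=")
            options[key] = value
    return Link(chr1, int(start1), int(end1), chr2, int(start2), int(end2), options)


link = parse_link("hs1 400 550 hs3 500 750 color=red,thickness=5p")
print(link.chr1, link.chr2, link.options)
```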
Highlights
# Example 9: minimal format (position information only)
hs1 1298972 1300443
hs1 1311738 1324571
hs1 1397026 1421444
hs1 1437417 1459927
# Example 10: with additional information (fill color, priority, radial range, etc.)
hs1 1725862 8379128 fill_color=chr7,z=68,r0=0.4r-65.3669p,r1=0.4r+65.3669p
hs1 4080887 11075336 fill_color=chr8,z=66,r0=0.4r-68.719p,r1=0.4r+68.719p
hs1 5183662 14345280 fill_color=chr10,z=55,r0=0.4r-90.011p,r1=0.4r+90.011p
hs1 10044837 11066617 fill_color=chr1,z=95,r0=0.4r-10.0388p,r1=0.4r+10.0388p
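Note 1: highlight parameters can be written in the configuration file or embedded directly in the data file.
Note 2: the z parameter in Example 10 sets the priority; a larger number is drawn on top.
For later reference, the original documentation is reproduced below.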
karyotype — biology applications
The karyotype file defines the chromosomes. By default, all chromosomes will be drawn.
Each chromosome has a name, label, start and end position and a color. For example, the human karyotype file looks like this
chr - hs1 1 0 249250621 chr1
chr - hs2 2 0 243199373 chr2
chr - hs3 3 0 198022430 chr3
Circos uses species prefix for chromosome names (e.g. human: hs1, hs2, … ; mouse: mm1, mm2, … ) instead of the generic “chr” prefix. Chromosome colors, however, use the “chr” prefix, because they’re not meant to be species specific.
The karyotype file can optionally define cytogenetic bands for each chromosome.
band hs1 p36.33 p36.33 0 2300000 gneg
band hs1 p36.32 p36.32 2300000 5400000 gpos25
band hs1 p36.31 p36.31 5400000 7200000 gneg
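Because each band line names its parent chromosome in the second field, a karyotype file with bands can be sanity-checked before use. The sketch below assumes the field order shown above (`band PARENT ID LABEL START END COLOR`); `check_bands` is a hypothetical helper, not a Circos tool.

```python
def check_bands(karyotype_text):
    """Return a list of problems found in the band lines (empty if none)."""
    rows = [
        line.split()
        for line in karyotype_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    # First pass: record the start and end of every chromosome.
    sizes = {f[2]: (int(f[4]), int(f[5])) for f in rows if f[0] == "chr"}
    problems = []
    # Second pass: every band must lie inside its parent chromosome.
    for f in rows:
        if f[0] != "band":
            continue
        parent, name, start, end = f[1], f[2], int(f[4]), int(f[5])
        if parent not in sizes:
            problems.append(f"{name}: unknown parent {parent}")
        elif not (sizes[parent][0] <= start < end <= sizes[parent][1]):
            problems.append(f"{name}: {start}-{end} outside {parent}")
    return problems


sample = """chr - hs1 1 0 249250621 chr1
band hs1 p36.33 p36.33 0 2300000 gneg
band hs1 p36.32 p36.32 2300000 5400000 gpos25
band hs1 p36.31 p36.31 5400000 7200000 gneg"""
print(check_bands(sample))
```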
karyotype — general applications
If your data is not based on chromosomes, then use the karyotype file to define whatever axes you need to display them.
For example, this will define 3 segments of size 1000, 1500 and 2000 named axis1, axis2 and axis3.
chr - axis1 1 0 1000 black
chr - axis2 1 0 1500 blue
chr - axis3 1 0 2000 green
line, scatter, histogram, heat map
Line, scatter, histogram and heat map tracks are 2D data tracks that associate a value with a genomic position.
#chr start end value [options]
hs5 50 75 0.75
tile
A tile track defines an interval on the same chromosome. It is used to display coverage elements like reads or clones.
#chr start end [options]
hs5 50 75
text
A text track associates any string with a genomic position, typically used for text labels.
#chr start end label [options]
hs5 50 75 ABC
connector
A connector track links two positions on the same chromosome, which are connected by a beveled connector.
#chr start end [options]
hs5 50 1500
A connector must start and end on the same chromosome.
links
Links associate two intervals between the same or different chromosomes. They can be drawn as lines or ribbons.
# chr1 start1 end1 chr2 start2 end2 [options]
hs1 200 300 hs10 1100 1300
hs7 50 150 hs 5000 6000 color=blue
binlinks, bundlelinks and filterlinks tools (all found in the tools distribution) are used to manipulate and analyze link files.
options
Any formatting option specific to a data point (shape, size, color, etc.) defined in the <plot>, <link>, or <highlight> block can be set individually for a data point in the input file.
In the file formats shown above, the [options] string is a comma-delimited set of variable=value pairs.
chr start end var1=value1,var2=value2,...
For options that are passed as a list (e.g. color RGB values), you’ll need to delimit the option value with ( and ).
chr start end color=(R,G,B)
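Since list values such as (R,G,B) contain commas of their own, splitting an options string on every comma would break them apart. The following sketch shows one way to split only on commas outside parentheses; `parse_options` is a hypothetical helper written for illustration and is not how Circos itself parses the field.

```python
def parse_options(text):
    """Split 'var1=value1,var2=value2,...' into a dict, keeping (a,b,c) values whole."""
    pairs = []
    current = []
    depth = 0  # how many "(" are currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside parentheses ends the current pair.
            pairs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        pairs.append("".join(current))

    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected x=y, saw [{pair}]")
        options[key] = value
    return options


print(parse_options("color=(255,0,0),thickness=5p"))
```

Values are kept as strings, so an expression such as r0=0.4r-65.3669p from Example 10 passes through unchanged.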
options with and without data values
Input files that associate a value with a genomic position have the options field in the 5th column
chr start end value options
For files that do not have a value (e.g. tile, highlight), the field is in the 4th column
chr start end options
If you attempt to use a file with values as input to tracks that do not expect values, Circos will attempt to parse the value field (4th column) as an options string and will report an error.
Error parsing data point options. Saw parameter assignment [0.75] but expected it to be in the format x=y.
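A file can be screened for this mistake before running Circos. The sketch below flags lines whose 4th column does not look like an options string; `check_valueless_line` is a hypothetical helper, and its messages are its own wording, not Circos output.

```python
def check_valueless_line(line):
    """Check a 'chr start end [options]' line for a track that takes no value.

    Returns None if the line looks fine, otherwise a short description.
    """
    fields = line.split()
    if len(fields) < 3:
        return "expected at least: chr start end"
    if len(fields) >= 4 and "=" not in fields[3]:
        # A bare number here usually means the file was written for a value track.
        return f"column 4 should be options in x=y form, saw [{fields[3]}]"
    return None


for sample_line in ["hs5 50 75", "hs5 50 75 color=red", "hs5 50 75 0.75"]:
    print(sample_line, "->", check_valueless_line(sample_line))
```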