circos可视化LTR的相关信息

作者: wo_monic | 来源:发表于2021-01-27 10:02 被阅读0次

    circos的安装使用

    参考资料

    参考教程 https://www.jianshu.com/p/741e77c94714
    徐州更教程 https://www.jianshu.com/u/9ea40b5f607a
    circos https://www.bilibili.com/read/cv7812579/
    全网最完整的circos中文教程
    circos heatmap
    circos color参考

    安装 circos

    conda create -c bioconda -n circos circos

    测试circos

    conda activate circos
    circos -V
    

    软件官网源码和教程http://circos.ca/software/download/
    颜色参考 https://colorbrewer2.org/#type=qualitative&scheme=Set3&n=12
    注意:circos作图时,染色体一定要注意Chr1或Chr01格式,格式前后一定要一致。基因名称一定要一致,是'Sp01G000010.1' 还是'Sp01G000010',后面的.1有没有一定要一致,否则就会出错

    制作滑动窗展示基因密度(需要安装bedtools)

    从gff3文件获取基因的位置

    grep '[[:blank:]]gene[[:blank:]]' Spohua.genome.gff3 | cut -f 1,4,5 | awk '{print $1"\t"$2"\t"$3}'|grep Chr > genes.bed

    从基因组位置文件获取滑动窗

    cut -d ' ' -f 3,6 Spo.genome.txt | tr ' ' '\t' > Spo.genome
    bedtools makewindows -g Spo.genome.txt -w 1000000 >Spo.windows #设置window为1M

    Spo.genome.txt是基因组染色体长度的文件
    head Spo.genome.txt

    Chr1   26663249
    Chr2   25493351
    Chr3   22916255
    Chr4   22507087
    Chr5   18661867
    Chr6   19687438
    Chr7   15147718
    Chr8   14276932
    Chr9   12412806
    Chr10   12144945
    

    可以使用下面的代码来获取,

    samtools faidx ${genome}
    cat ${genome}.fai |cut -f 1-2| fgrep ${chr}|sort -V >${prefix}.genome.txt
    

    获取基因覆盖率

    bedtools coverage -a Spo.windows -b genes.bed | cut -f 1-4 > genes_num.txt

    染色体配置文件

    cat Spo.genome|awk '{print "chr - "$1" "$1" 0 "$2" "$1}' >karyotype.Spo.txt

    获取GC含量

    bedtools nuc -fi Spo.genome.fa -bed Spo.windows |cut -f 1-3,5|sed '1d' >GC_content_num.txt

    生成circos图

    circos -conf circos.conf

    准备可视化LTR的相关数据

    ###Copia类型的转座子
    cat Spohua.fa.pass.list|grep Copia |cut -f1|tr : "\t"|sed 's/\.\./\t/g' >spo.Copia.bed
    ###Gypsy类型的转座子
    cat Spohua.fa.pass.list|grep Gypsy |cut -f1|tr : "\t"|sed 's/\.\./\t/g' >spo.Gypsy.bed
    ###从gff3文件获取基因的位置(基因密度)
    grep '[[:blank:]]gene[[:blank:]]' Spohua.genome.gff3 | cut -f 1,4,5 | awk '{print $1"\t"$2"\t"$3}'|grep Chr > genes.bed
    #制作滑动窗
    bedtools makewindows -g Spo.genome.txt -w 1000000 >Spo.windows #设置window为1M
    
    ##获取覆盖率
    bedtools coverage -a Spo.windows -b genes.bed |cut -f 1-4 >genes_num.txt
    bedtools coverage -a Spo.windows -b spo.Copia.bed |cut -f 1-4 >Copia_num.txt
    bedtools coverage -a Spo.windows -b spo.Gypsy.bed |cut -f 1-4 >Gypsy_num.txt
    
    ##准备LAI的文件(windowssize为300kb)
    cat Spohua.fa.out.LAI|cut -f 1,2,3,7 |sed '1,2d' >LAI_num.txt
    ###完整LTR的百分比(每个windowsize的完整LTR的百分比)
    cat Spohua.fa.out.LAI|awk '{print $1"\t"$2"\t"$3"\t"$4*100}'|sed '1,2d' >full_LTR_num.txt
    ###所有LTR的百分比(每个windowsize的LTR的百分比)
    cat Spohua.fa.out.LAI|awk '{print $1"\t"$2"\t"$3"\t"$5*100}'|sed '1,2d' >LTR_num.txt
    
    ##配置染色体文件
    cut -d ' ' -f 3,6 Spo.genome.txt | tr ' ' '\t' > Spo.genome
    cat Spo.genome|awk '{print "chr - "$1" "$1" 0 "$2" "$1}' >karyotype.Spo.txt
    
    ##准备共线性文件
    ####需要预先准备 cds,pep,bed 文件。参考[Jcvi的使用](https://www.jianshu.com/p/37fed126777f)。
    #####一个物种内部共线性需要准备一套cds,蛋白pep,bed文件就可以了。
    python -m jcvi.compara.catalog ortholog --dbtype prot --no_strip_names Spo Spo
    
    #####从共线性的基因对,提取整合成circos的bed格式。
    cat Spo.Spo.anchors |grep -v \#| while read line;
    do
            gene_array=($line)
            gene1_pos=`fgrep ${gene_array[0]} Spo.bed|cut -f 1-3`
            gene2_pos=`fgrep ${gene_array[1]} Spo.bed|cut -f 1-3`
            bed=${gene1_pos}" "${gene2_pos}
            echo $bed >>Spo.coline_num.txt
    done
    ###把染色体中的Chr0转换成Chr.(所有的用于circos画图的染色体都需要改成Chr这种,否则,默认的circos不会识别Chr01这种格式)
    cat Spo.coline_num.txt |sed 's/Chr0/Chr/g' >coline_num.txt
    

    circos的文件和环境配置

    circos是基于perl实现的,只能用来画图,并不能处理数据,所以数据需要先制作完成后,才能用circos画。
    配置文件列表

    • karyotype.Spo.txt (染色体长度配置文件) *
    • circos.conf (主要的conf文件,画图时,通过此文件来调用其他配置文件)*
    • ticks.conf (刻度控制文件)*
    • coline_num.txt (共线性文件)
    • GC_num.txt GC(含量文件)
    • full_LTR_num.txt ( 完整LTR的百分比)
    • Gypsy_num.txt (Gypsy的百分比)
    • LTR_num.txt (所有LTR的百分比)
    • Copia_num.txt(Copia的百分比)
    • genes_num.txt (基因密度)
    • LAI_num.txt(LAI的分布)
      所有以_num.txt结尾的都是画图的数据文件,可以根据实际需要灵活增删。
      标*的是必须文件。其他是选配文件。
      karyotype.Spo.txt文件内容如下
    chr - Chr1 Chr1 0 27519054 Chr1
    chr - Chr2 Chr2 0 37982726 Chr2
    chr - Chr3 Chr3 0 37451735 Chr3
    chr - Chr4 Chr4 0 28204096 Chr4
    chr - Chr5 Chr5 0 45635594 Chr5
    chr - Chr6 Chr6 0 32349783 Chr6
    chr - Chr7 Chr7 0 36685546 Chr7
    chr - Chr8 Chr8 0 31371431 Chr8
    chr - Chr9 Chr9 0 35059391 Chr9
    chr - Chr10 Chr10 0 43700000 Chr10
    chr - Chr11 Chr11 0 39651545 Chr11
    chr - Chr12 Chr12 0 31915921 Chr12
    chr - Chr13 Chr13 0 43197690 Chr13
    chr - Chr14 Chr14 0 32136955 Chr14
    chr - Chr15 Chr15 0 49422898 Chr15
    chr - Chr16 Chr16 0 36480970 Chr16
    chr - Chr17 Chr17 0 34479775 Chr17
    
    

    ticks.conf文件内容如下:

    #刻度ticks配置文件
    chromosomes_units = 1000000
    show_ticks          = yes
    show_tick_labels    = yes
    
    <ticks>
    
    radius           = 1r
    color            = black
    thickness        = 2p
    
    multiplier       = 1e-6 #输出的标签为实际长度与其相乘
    
    format           = %d # %d表示显示整数
    tick_separation = 3p #管理刻度和外圈之间的距离
    label_separation        = 10p
    <tick>
    spacing        = 1u
    size           = 5p
    </tick>
    
    <tick>
    spacing        = 5u
    size           = 10p
    show_label     = yes
    label_size     = 10p
    label_offset   = 10p
    format         = %d
    </tick>
    
    </ticks>
    
    

    head coline_num.txt

    Chr1 2863127 2878319 Chr3 19253647 19258135
    Chr1 3914355 3914748 Chr3 19905364 19905757
    Chr1 3976083 3982656 Chr3 21062604 21067133
    Chr1 4195280 4197605 Chr3 20801492 20805443
    
    

    cat circos.conf

    #首尾染色体之间有间隔,添加了共线性,同时调整了半径
    
    karyotype = karyotype.Spo.txt #定义基因组的染色体大小文件
    #chromosomes_color = Chr1=rdylbu-11-div-1,Chr2=rdylbu-11-div-3,Chr3=rdylbu-11-div-5,Chr4=rdylbu-11-div-7,Chr5=rdylbu-11-div-9 #定义染色体的颜色,染色体名称是karyotype.Spo.txt中第3列的名称Chr1-17
    <<include ticks.conf>> #引入自定义的刻度配置文件,目录是相对路径
    
    ##共线性图
    <links>
    
    <link>
    file          = coline_num.txt #共线性文件
    radius        = 0.40r #外圈半径
    color         = blue_a4 #默认颜色
    thickness = 1p #定义线条的粗细
    ribbon = yes
    ######rules指定线条的颜色,
    <rules>
    <rule>
    condition = var(chr1) eq "Chr1" #共线性文件中,左边是Chr1的,指定颜色。var(chr1)这个不用修改,指的是file里左侧第一个,后面的“Chr1”根据实际的Chr来修改。
    color = rdylgn-11-div-1
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr2"
    color = rdylgn-11-div-2
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr3"
    color = rdylgn-11-div-3
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr4"
    color = rdylgn-11-div-4
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr5"
    color = rdylgn-11-div-5
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr6"
    color = rdylgn-11-div-6
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr7"
    color = rdylgn-11-div-7
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr8"
    color = rdylgn-11-div-8
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr9"
    color = rdylgn-11-div-9
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr10"
    color = rdylgn-11-div-10
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr11"
    color = rdylgn-11-div-11
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr12"
    color = rdylbu-9-div-1
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr13"
    color = rdylbu-9-div-2
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr14"
    color = rdylbu-9-div-3
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr15"
    color = rdylbu-9-div-4
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr16"
    color = rdylbu-9-div-5
    </rule>
    
    <rule>
    condition = var(chr1) eq "Chr17"
    color = rdylbu-9-div-6
    </rule>
    
    </rules>
    
    </link>
    </links>
    
    
    
    <plots>
    
    #完整的LTR的百分比直方图
    <plot>
    type = histogram
    file = full_LTR_num.txt
    fill_color = 231,41,138 # 填充色
    r1   = 0.50r
    r0   = 0.41r
    </plot>
    
    
    #LTR的百分比直方图
    <plot>
    type = histogram
    file = LTR_num.txt
    fill_color = 254,196,79 # 填充色
    r1   = 0.60r
    r0   = 0.51r
    </plot>
    
    
    
    #Gypsy的热图
    <plot>
    type = heatmap
    file = Gypsy_num.txt
    color = oranges-9-seq #填充色必须是一个list,格式是 color = reds-9-seq 或者color = red,green,blue
    r1   = 0.70r
    r0   = 0.61r
    </plot>
    
    #Copia的热图
    <plot>
    type    = heatmap
    file    = Copia_num.txt
    color   = reds-9-seq #填充色
    r1      = 0.80r
    r0      = 0.71r
    </plot>
    
    ##LAI的散点图 参考https://blog.csdn.net/weixin_43569478/article/details/83824929
    <plot>
    type = scatter #散点图
    fill_color       = grey # 点的填充色
    stroke_color     = black #边框的颜色
    glyph            = circle
    glyph_size       = 3 # 元素大小
    file = LAI_num.txt
    r1   = 0.90r
    r0   = 0.81r
    
    #划分3个区间,每个区间不同的颜色。用于LAI的划分
    <backgrounds>
    <background>
    color = vvlgreen
    y0 = 20
    </background>
    <background>
    color = vlgrey
    y1 = 20
    y0 = 10
    </background>
    <background>
    color = vvlred
    y0 =10
    </background>
    
    </backgrounds>
    ##设置y轴刻度线
    <axes>
    <axis>
    color = lgreen
    thickness = 1p
    spacing = 0.05r
    y1 = 20
    y0 = 10
    </axis>
    </axes>
    
    </plot>
    
    ##基因密度直方图
    <plot>
    type = histogram
    file = genes_num.txt
    fill_color = blue # 填充色
    r1   = 0.99r
    r0   = 0.91r
    </plot>
    
    </plots>
    
    
    <ideogram>
    <spacing>
    default = 0.005r
    
    #设置染色体之间空出个距离,用来标图例a,b,c,d,e,f
    <pairwise Chr17;Chr1>  #染色体名称是karyotype.Spo.txt中第3列的名称,根据你实际的物种第一条和最后一条来修改此处
    spacing = 5r #设置第一个和最后一个染色体中间空出5*default(0.005r)的距离
    </pairwise>
    
    </spacing>
    
    radius           = 0.85r #指定画图的半径(此半径和plots里的半径不一致,这个是总的0.85r,决定了整个圈图的最大的半径,plots里的0.99r半径是以这个0.85r作为100%来分配99%)
    thickness        = 10p #坐标轴的染色体的厚度
    fill             = yes
    show_bands      = yes #设置染色体条带填充,如果不想用条带填充,只用颜色,就设置为no
    fill_bands      = yes
    stroke_color     = dgrey
    stroke_thickness = 2p
    show_label     = yes #展示label
    label_font     = default # 字体
    label_radius   = dims(ideogram,radius) + 0.075r #位置
    label_size     = 12 # 字体大小
    label_parallel = yes # 是否平行
    label_format   = eval(sprintf("%s",var(chr))) # 格式
    
    </ideogram>
    
    <image>
    # 覆盖在 etc/image.conf 中定义的 angle_offset
    angle_offset* = -82  #指定旋转角度,用于插入图例的位置
    dir* = . #输出文件夹
    radius* = 500p #图片半径
    svg* = yes #是否输出svg,默认是yes
    <<include etc/image.conf>>
    </image>
    
    <<include etc/colors_fonts_patterns.conf>>
    <<include etc/housekeeping.conf>>
    

    head GC_num.txt

    Chr1    0       1000000 38.8549
    Chr1    1000000 2000000 38.1306
    Chr1    2000000 3000000 37.9843
    Chr1    3000000 4000000 37.4317
    Chr1    4000000 5000000 37.967
    Chr1    5000000 6000000 37.9894
    
    
    circos.png
    colors的选择

    circos可能会出现的报错

    There was a problem with True Type font support. Circos could not render text
    from the font file
    /share/home/miniconda3/envs/circos/bin/../fonts/modern/cmunrm.otf
    Please check that gd (system graphics library) and GD (Perl's interface to gd)
    are compiled with True Type support.
    这种需要重新安装libGD和GD

    wget https://github.com/libgd/libgd/releases/download/gd-2.3.1/libgd-2.3.1.tar.gz
    tar zxvf libgd-2.3.1.tar.gz
    cd libgd-2.3.1
    ./configure
    curl -O http://www.ijg.org/files/jpegsrc.v8d.tar.gz
    tar -xzvf jpegsrc.v8d.tar.gz
    cd jpeg-8d
    ./configure --prefix=/path/to/jpeg-8d
    make
    make install
    
    #安装libpng
    curl -O ftp://ftp.simplesystems.org/pub/libpng/png/src/libpng16/libpng-1.6.2.tar.gz
    tar -xzvf libpng-1.6.2.tar.gz
    cd libpng-1.6.2.tar.gz
    ./configure --prefix=/path/to/jpeg-8d
    make -j8
    make install 
    #重新安装GD(使用cpanm安装)
    cpanm -f GD
    

    可能会提示缺少perl模块

    circos -modules检测是否缺少perl模块
    cpanm GD #安装缺少的GD模块

    相关文章

      网友评论

        本文标题:circos可视化LTR的相关信息

        本文链接:https://www.haomeiwen.com/subject/pbybzktx.html