[R] Sequence logo plots

Author: 研究僧小蓝哥 | Published: 2020-08-09 10:22

    Sequence logos (seqlogo plots) are commonly used to show the sequence information of a specific region, as in the figure from [1].

    I used to wonder how this kind of plot is drawn, and later came across an R package for it: ggseqlogo [2]. It provides a range of visualization options.

    The package author also provides a complete tutorial: https://omarwagih.github.io/ggseqlogo/

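    For this kind of plot, the important part is understanding the input data structure; once that is clear, it can be applied to your own data. The example data for this post can be obtained by replying "seqlogo" to the WeChat official account PLANTOMIX.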
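
    As a rough illustration of what "understanding the data structure" means for sequence input, the sketch below checks a character vector before plotting. It uses base R only and a made-up toy vector (`seqs` is hypothetical, not the post's example data). It assumes the sequences are already aligned, so they should all have the same length and use only letters from the expected alphabet.

```r
# Hypothetical toy input, not the example data from the post
seqs <- c("ACGTAC", "ACGTTC", "ACGAAC", "TCGTAC")

# Check 1: all sequences are aligned to the same length
seq_lengths <- nchar(seqs)
same_length <- length(unique(seq_lengths)) == 1

# Check 2: only letters from the expected alphabet occur
alphabet <- c("A", "C", "G", "T")
letters_used <- unique(unlist(strsplit(seqs, "")))
valid_letters <- all(letters_used %in% alphabet)

print(same_length)
print(valid_letters)
```

    If either check fails on real data, it is probably worth fixing the input file before looking for problems in the plotting code.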

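    DNA sequences

    There are two ways to display the logo: by bits (information content) or by prob (the proportion of each letter at each position). Read the sequences into a data frame and pass the sequence column to ggseqlogo as a character vector: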

     library(ggplot2)
     library(ggseqlogo)
     library(stringr)
     library(ggsci)
     library(tidyverse)

     # DNA sequences
     seq_dna = read.table('data/test.DNA.seq.txt', header = T)

     # prob: letter heights are proportions, so label the y axis as percent
     p1 = ggseqlogo(as.character(seq_dna$test.seq), method = 'prob') +
      theme_bw() +
      scale_y_continuous(labels = scales::percent)
     p1
     ggsave(p1, filename = 'figures/1.png', width = 5, height = 3)

     # bits: the y axis is information content, so no percent labels here
     p1.1 = ggseqlogo(as.character(seq_dna$test.seq), method = 'bits') +
      theme_bw()
     p1.1
     ggsave(p1.1, filename = 'figures/1.1.png', width = 5, height = 3)
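
    To make the difference between prob and bits concrete, here is a minimal base R sketch of the two quantities. The toy sequences are hypothetical, and the bits calculation uses the textbook formula (log2 of the alphabet size minus the Shannon entropy of each position) without any small-sample correction, so ggseqlogo's own numbers may differ slightly.

```r
# Hypothetical toy aligned DNA sequences (not the post's data file)
seqs <- c("ACGTA", "ACGTC", "ACGAA", "ACTTA")
alphabet <- c("A", "C", "G", "T")

# One row per sequence, one column per position
chars <- do.call(rbind, strsplit(seqs, ""))

# prob: proportion of each letter at each position (rows = letters)
ppm <- sapply(seq_len(ncol(chars)), function(j) {
  as.numeric(table(factor(chars[, j], levels = alphabet))) / nrow(chars)
})
rownames(ppm) <- alphabet

# Shannon entropy per position, treating 0 * log2(0) as 0
H <- -colSums(ifelse(ppm > 0, ppm * log2(ppm), 0))

# bits: information content per position, at most log2(4) = 2 for DNA
ic <- log2(length(alphabet)) - H

# Letter heights in a bits logo: proportion scaled by information content
bits_heights <- sweep(ppm, 2, ic, `*`)

print(round(ppm, 2))
print(round(ic, 3))
```

    A fully conserved position reaches the maximum of 2 bits, while a position where all four bases are equally frequent drops to 0, which is why a bits logo emphasizes conserved positions and a prob logo does not.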
    

    Amino acid sequences

     # Amino acid sequences
     seq_aa = read.table('data/test.AA.seq.txt', header = T)

     # the sequence column of this example file is named `.`
     p2 = ggseqlogo(as.character(seq_aa$.), method = 'prob') +
      theme_bw() +
      scale_y_continuous(labels = scales::percent)
     p2
     ggsave(p2, filename = 'figures/2.png', width = 5, height = 3)
    

    Custom data

    ggseqlogo also supports custom alphabets, such as digits.

     # Custom sequences: 10 random strings of length 10 over the digits 1-4
     seq_diy = data.frame(
      test.seq = replicate(10, paste(sample(1:4, 10, replace = T), collapse = ''))
     )

     # namespace tells ggseqlogo which symbols make up the custom alphabet
     p4 = ggseqlogo(as.character(seq_diy$test.seq),
      method = 'prob',
      namespace = 1:4) +
      theme_bw() +
      scale_y_continuous(labels = scales::percent)
     p4
     ggsave(p4, filename = 'figures/4.png', width = 5, height = 3)
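
    Before plotting data with a custom alphabet, it can help to confirm that every symbol in the sequences belongs to the namespace. The sketch below does that in base R and tabulates the per-position counts. The fixed seed and all variable names are assumptions made for illustration.

```r
set.seed(42)  # assumption: fixed seed, only to make the example repeatable

namespace <- as.character(1:4)  # custom alphabet, matching namespace = 1:4

# 10 random sequences of length 10 over the custom alphabet
seq_diy <- replicate(10, paste(sample(namespace, 10, replace = TRUE), collapse = ""))

# Every symbol in the data should belong to the namespace
symbols_used <- unique(unlist(strsplit(seq_diy, "")))
all_in_namespace <- all(symbols_used %in% namespace)

# Per-position counts of each symbol (rows = symbols, columns = positions)
chars <- do.call(rbind, strsplit(seq_diy, ""))
counts <- sapply(seq_len(ncol(chars)), function(j) {
  as.integer(table(factor(chars[, j], levels = namespace)))
})
rownames(counts) <- namespace

print(all_in_namespace)
print(counts)
```

    Since the sequences are random, each symbol should appear roughly a quarter of the time per position, so a prob logo of such data is expected to look fairly flat.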
    

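    Matrix data

    Another input format, probably the more commonly used one, is a matrix with one row per letter and one column per position (shown as figure 03.png in the original post):
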
     # Matrix input
     seq_matrix = read.table('data/test.matrix.txt', header = T) %>%
      as.matrix()

     p3 = ggseqlogo(seq_matrix, method = 'bits') +
      theme_bw()
     p3
     ggsave(p3, filename = 'figures/3.png', width = 5, height = 3)
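
    The layout of the matrix file matters when it is read with read.table. The sketch below is a hypothetical round trip, not the post's actual test.matrix.txt: it writes a small count matrix with letters as row names and reads it back. It relies on the documented read.table behavior that, when the header has one field fewer than the data rows, the first column is used as row names.

```r
# Hypothetical position count matrix: rows = letters, columns = positions
mat <- matrix(
  c(8, 0, 1, 2,
    1, 9, 0, 3,
    0, 1, 9, 2,
    1, 0, 0, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), paste0("P", 1:4))
)

# write.table's default output has no header entry for the row-name column,
# so the header row is one field shorter than the data rows
path <- tempfile(fileext = ".txt")
write.table(mat, path, quote = FALSE)

# read.table therefore takes the first column as row names
mat_back <- as.matrix(read.table(path, header = TRUE))
print(mat_back)
```

    If the row names come back as an ordinary first column instead, the matrix will contain text rather than numbers, which is worth checking with `is.numeric()` before plotting.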
    

    For more visualization options, see the author's tutorial site: https://omarwagih.github.io/ggseqlogo/

    References

    [1] Li, Ying, et al. "Magnaporthe oryzae Auxiliary Activity Protein MoAa91 Functions as Chitin-Binding Protein To Induce Appressorium Formation on Artificial Inductive Surfaces and Suppress Plant Immunity." mBio 11.2 (2020).
    [2] Wagih, Omar. "ggseqlogo: a versatile R package for drawing sequence logos." Bioinformatics 33.22 (2017): 3645-3647.
