seqlogo
图常用于展示特定为区域的序列信息,就像这样:
data:image/s3,"s3://crabby-images/a0aaa/a0aaaacc446f480bbb5c9dd95e27b0893c9766b7" alt=""
之前很好奇这种图是怎么画出来的,后面知道了一个R包:ggseqlogo
。提供了一系列的可视化方法:
data:image/s3,"s3://crabby-images/4f9ec/4f9ec4384b0f50e9d60f1db2f904ee0eac7f0784" alt=""
作者也提供了完整的教程:https://omarwagih.github.io/ggseqlogo/。
这种图,重要的是理解数据结构,然后就可以用在自己的数据上了。本文的示例数据在公众号PLANTOMIX
后台回复seqlogo
即可获取。
DNA序列
有两种方法,一种是按照Bits
进行展示,另外一种是以prob
(比例)进行展示。直接将数据放在数据框里面即可:
data:image/s3,"s3://crabby-images/ac915/ac915fa2ee735a8e13876c34e1decb6a386c236f" alt=""
require(ggplot2)
require(ggseqlogo)
library(stringr)
library(ggsci)
library(tidyverse)
# DNA序列
seq_dna = read.table('data/test.DNA.seq.txt', header = T)
p1 = ggseqlogo(as.character(seq_dna$test.seq), method = 'prob') +
theme_bw() +
scale_y_continuous(labels = scales::percent)
p1
ggsave(p1, filename = 'figures/1.png', width = 5, height = 3)
p1.1 = ggseqlogo(as.character(seq_dna$test.seq), method = 'bits') +
theme_bw() +
scale_y_continuous(labels = scales::percent)
p1.1
ggsave(p1.1, filename = 'figures/1.1.png', width = 5, height = 3)
data:image/s3,"s3://crabby-images/17b0b/17b0b084ccededd727c59e09fb665d1dc0e9accd" alt=""
氨基酸序列
# 氨基酸序列
seq_aa = read.table('data/test.AA.seq.txt', header = T)
p2 = ggseqlogo(as.character(seq_aa$.), method = 'prob') +
theme_bw() +
scale_y_continuous(labels = scales::percent)
p2
ggsave(p2, filename = 'figures/2.png', width = 5, height = 3)
data:image/s3,"s3://crabby-images/e3b9e/e3b9e8db9de7aef1dd99f0ee5d855f1b9348c664" alt=""
自定义数据
ggseqlogo
支持自定义数据,如数字。
# 自定义序列
seq_diy = matrix(ncol = 1, nrow = 10) %>%
as.data.frame()
for (i in 1:nrow(seq_diy)) {
seq.temp = as.character(sample(1:4,10, replace = T))
seq.temp.2 = seq.temp[1]
for (j in 2:10) {
seq.temp.2 = paste(seq.temp.2, seq.temp[j], sep = '')
}
seq_diy[i,] = seq.temp.2
}
colnames(seq_diy) = 'test.seq'
p4 = ggseqlogo(as.character(seq_diy$test.seq),
method = 'prob',
namespace=1:4) +
theme_bw() +
scale_y_continuous(labels = scales::percent)
p4
ggsave(p4, filename = 'figures/4.png', width = 5, height = 3)
data:image/s3,"s3://crabby-images/fc68b/fc68b9fdffdd9b3349f04a3cc8d1fafff6abddad" alt=""
矩阵类型数据
另外一种使用得更多的数据应该是类似这样的:
data:image/s3,"s3://crabby-images/5229b/5229b9365f1b342cf215da939ba481b3c49a39b5" alt=""
# matrix数据
seq_matrix = read.table('data/test.matrix.txt', header = T) %>%
as.matrix()
p3 = ggseqlogo(seq_matrix, method = 'bits') +
theme_bw()
p3
ggsave(p3, filename = 'figures/3.png', width = 5, height = 3)
data:image/s3,"s3://crabby-images/2f802/2f802b5ed557dcf1d9d584f85f46aa722bfd40d1" alt=""
更多可视化方法参照作者教程网站:https://omarwagih.github.io/ggseqlogo/。
参考文献
[1] Li, Ying, et al. "Magnaporthe oryzae Auxiliary Activity Protein MoAa91 Functions as Chitin-Binding Protein To Induce Appressorium Formation on Artificial Inductive Surfaces and Suppress Plant Immunity." Mbio 11.2 (2020).
[2] Wagih, Omar. "ggseqlogo: a versatile R package for drawing sequence logos." Bioinformatics 33.22 (2017): 3645-3647.
网友评论