Reading the three tables
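library(tidyverse)  # attaches readr, dplyr and the %>% pipe
# eggNOG-mapper annotation table: no header row is read, lines starting with "#" are skipped
# (read_delim() has no row.names argument, so none is passed here)
gene_info <- read_delim("MM_6js01hvu.emapper.annotations.tsv",
                        delim = "\t", escape_double = FALSE,
                        col_names = FALSE, comment = "#", trim_ws = TRUE) %>%
  # keep the columns of interest and give them readable names
  select(ID = X1,
         GO = X10,
         Ko = X12,
         pathway = X13,
         Gene_name = X9)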
gene_exp <- read.table('genes.TMM.EXPR.matrix', header = TRUE, row.names = 1)
sample_info <- read.table(file = 'sample.txt', sep = "\t", header = TRUE, row.names = 1)
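Later steps (PCAtools::pca in particular) expect the columns of the expression matrix and the rows of the sample table to name the same samples in the same order. Below is a minimal sketch of that check. It uses small made-up tables as stand-ins for gene_exp and sample_info; the names toy_exp and toy_info and all values are hypothetical.

```r
# Toy stand-ins for gene_exp and sample_info (hypothetical values)
toy_exp <- data.frame(
  S1 = c(5.1, 0.0, 12.3),
  S2 = c(4.8, 0.2, 11.9),
  S3 = c(9.7, 3.1, 2.4),
  row.names = c("gene1", "gene2", "gene3")
)
toy_info <- data.frame(
  group = c("treat", "ctrl", "ctrl"),
  row.names = c("S3", "S1", "S2")   # deliberately in a different order
)

# Samples present in both tables, in the column order of the expression matrix
shared <- intersect(colnames(toy_exp), rownames(toy_info))
toy_exp  <- toy_exp[, shared, drop = FALSE]
toy_info <- toy_info[shared, , drop = FALSE]

# After reordering, the sample names must match exactly
stopifnot(identical(colnames(toy_exp), rownames(toy_info)))
```

The same three lines (intersect, then subsetting both tables) should carry over to the real gene_exp and sample_info, assuming the sample names in sample.txt are spelled exactly like the matrix column names.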
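Sample correlation
Correlation analysis
R's cor() function computes correlation coefficients between variables. Applied to the expression matrix, it correlates the columns, so it returns the pairwise correlation (Pearson by default) between samples.
# Compute the sample-by-sample correlation matrix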
sample_cor <- cor(gene_exp)
sample_cor1 <- round(sample_cor, digits = 2)
# Draw the heatmap
library(pheatmap)
pheatmap(sample_cor1, display_numbers = TRUE, fontsize = 10, angle_col = 45)
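To make the column-wise behaviour of cor() concrete, here is a small sketch on a made-up matrix (toy_exp and its values are hypothetical). It also shows the method argument, which switches to a rank-based Spearman correlation; whether that suits a given data set is a judgment call the original analysis does not address.

```r
# Toy expression matrix: rows are genes, columns are samples (hypothetical values)
# matrix() fills column by column, so A = 1..4, B = 2 * A, C = A reversed
toy_exp <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8,
    4, 3, 2, 1),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("A", "B", "C"))
)

# cor() correlates columns, so the result is a samples x samples matrix
toy_cor    <- cor(toy_exp)                        # Pearson (default)
toy_cor_sp <- cor(toy_exp, method = "spearman")   # rank-based alternative

print(round(toy_cor, 2))
```

Samples A and B are perfectly correlated, while A and C are perfectly anti-correlated, which is what the printed matrix reflects.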
Cluster dendrogram
sample_dist <- dist(t(gene_exp))
sample_hc <- hclust(sample_dist)
plot(sample_hc)
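dist() measures Euclidean distance between rows by default, which is why the matrix is transposed first, and hclust() uses complete linkage unless told otherwise. The sketch below spells out those defaults on made-up data and, as an alternative that is not part of the original workflow, clusters on a correlation-based distance (1 minus Pearson r). All names and values are hypothetical.

```r
# Toy matrix: samples a1/a2 increase across genes, b1/b2 decrease (hypothetical values)
toy_exp <- matrix(
  c(1.0, 2.0, 3.0, 4.0, 5.0,
    1.1, 2.1, 2.9, 4.2, 5.0,
    5.0, 4.0, 3.0, 2.0, 1.0,
    5.2, 3.9, 3.1, 2.0, 0.9),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("a1", "a2", "b1", "b2"))
)

# Euclidean distance between samples; dist() works on rows, hence t()
d_euc  <- dist(t(toy_exp), method = "euclidean")
hc_euc <- hclust(d_euc, method = "complete")   # same as the defaults

# Alternative: correlation-based distance, 1 - Pearson r
d_cor  <- as.dist(1 - cor(toy_exp))
hc_cor <- hclust(d_cor, method = "average")

# Cut each tree into two clusters and compare the assignments
groups_euc <- cutree(hc_euc, k = 2)
groups_cor <- cutree(hc_cor, k = 2)
print(groups_euc)
print(groups_cor)
```

Either tree can be drawn with plot(), exactly as with sample_hc above.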
PCA
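library(PCAtools)
# pca() requires colnames(gene_exp) to be identical to rownames(sample_info);
# removeVar = 0.1 drops the 10% of genes with the lowest variance
p <- pca(gene_exp, metadata = sample_info, removeVar = 0.1)
pca_loadings <- p$loadings  # contribution of each gene to each principal component (PC1, PC2, ...)
pca_rotated <- p$rotated    # coordinates of each sample on each principal component
screeplot(p)                # share of the variance explained by each principal component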
# colby and shape must name columns of sample_info (here 'group' and 'shape')
biplot(p,
       x = 'PC1',
       y = 'PC2',
       colby = 'group',
       shape = 'shape',
       legendPosition = 'right')
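For readers who want to see what the loadings, sample coordinates and explained variance look like without PCAtools, the following is a base-R sketch using prcomp() on made-up data. It is an illustration of the same quantities, not a reproduction of what pca() does internally; the object names and values are hypothetical.

```r
# Toy matrix: genes in rows, samples in columns (hypothetical values)
toy_exp <- matrix(
  c(1.0, 2.0, 3.0, 4.0, 5.0,
    1.1, 2.1, 2.9, 4.2, 5.0,
    5.0, 4.0, 3.0, 2.0, 1.0,
    5.2, 3.9, 3.1, 2.0, 0.9),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("a1", "a2", "b1", "b2"))
)

# prcomp() expects observations (samples) in rows, so transpose first
toy_pca <- prcomp(t(toy_exp), center = TRUE, scale. = FALSE)

gene_loadings <- toy_pca$rotation   # genes x PCs: weight of each gene in each PC
sample_scores <- toy_pca$x          # samples x PCs: coordinates of each sample

# Percentage of the total variance explained by each PC (what a scree plot shows)
var_explained <- 100 * toy_pca$sdev^2 / sum(toy_pca$sdev^2)
print(round(var_explained, 1))
```

In this toy data the a and b samples sit on opposite sides of PC1, and PC1 carries nearly all of the variance, which is the pattern one hopes to see between experimental groups in a biplot.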
The objects can be saved to an .RData file with save(), then restored in a later session with load().
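A minimal sketch of that round trip, using small made-up objects and a temporary file (the object names and the file name toy_analysis.RData are hypothetical):

```r
# Toy objects standing in for the analysis results (hypothetical values)
toy_cor <- matrix(c(1, 0.9, 0.9, 1), nrow = 2,
                  dimnames = list(c("S1", "S2"), c("S1", "S2")))
toy_info <- data.frame(group = c("ctrl", "treat"), row.names = c("S1", "S2"))

# Save several objects into one .RData file
rdata_file <- file.path(tempdir(), "toy_analysis.RData")
save(toy_cor, toy_info, file = rdata_file)

# Simulate a fresh session by removing the objects, then restore them
rm(toy_cor, toy_info)
loaded <- load(rdata_file)   # load() returns the names of the restored objects
print(loaded)
```

Note that load() restores objects under their original names and silently overwrites any existing objects with the same names in the workspace.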