单细胞分析实录(8): 展示marker基因的4种图形（一）

作者: TOP生物信息 | 来源:发表于2021-01-08 22:28 被阅读0次

单细胞分析实录(8): 展示marker基因的4种图形（一）
单细胞分析实录(9): 展示marker基因的4种图形（二）
这种Dotplot展示单细胞基因表达
最新8+单细胞测序结合bulk测序纯生信。小编教你如何模仿并超越
10X单细胞云工具 | Marker基因可视化（附视频）
又见8+基于单细胞marker基因的纯生信文章，仍然可以模仿并超
最新6.5分SCI，单细胞识别marker构建模型预测免疫治疗疗
单细胞转录组数据分析课件||8. Differential ex
03.两组比较_差异基因数目展示
UMAP图上同时展示多个marker基因

今天的内容讲讲单细胞文章中经常出现的展示细胞marker的图：tsne/umap图、热图、堆叠小提琴图、气泡图，每个图我都会用两种方法绘制。

使用的数据来自文献：Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma. 去年7月发表在Cell Research上的关于鼻咽癌的文章，数据下载：https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE150430

髓系细胞数量相对少一些，为了方便演示，选它作为例子。

library(Seurat)
library(tidyverse)
library(harmony)
Myeloid=read.table("Myeloid_mat.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F)
Myeloid_anno=read.table("Myeloid_anno.txt",header = T,sep = "\t",stringsAsFactors = F)

导入数据的时候需要注意一个地方：从cell ranger得到的矩阵，每一列的列名会在CB后面加上"-1"这个字符串，在R里面导入数据时，会自动转化为".1"，在做匹配的时候需要注意一下。我已经提前转换为"_1"

> summary(as.numeric(Myeloid["CD14",]))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.000   0.000   0.689   1.080   2.111   4.500
> summary(as.numeric(Myeloid["PTPRC",]))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.000   1.200   1.681   1.607   2.124   3.520

考虑到下载的表达矩阵，表达值都不是整数，且大于0，推测该矩阵已经经过了标准化，因此下面的流程会跳过这一步

0. Seurat流程

我们直接把注释结果赋值到mye.seu@meta.data矩阵中，后面省去聚类这一步

mye.seu=CreateSeuratObject(Myeloid)
mye.seu@meta.data$CB=rownames(mye.seu@meta.data) 
mye.seu@meta.data=merge(mye.seu@meta.data,Myeloid_anno,by="CB")
rownames(mye.seu@meta.data)=mye.seu@meta.data$CB

#替代LogNormalize这一步
mye.seu[["RNA"]]@data=mye.seu[["RNA"]]@counts
mye.seu <- FindVariableFeatures(mye.seu, selection.method = "vst", nfeatures = 2000)
mye.seu <- ScaleData(mye.seu, features = rownames(mye.seu))

mye.seu <- RunPCA(mye.seu, npcs = 50, verbose = FALSE)
mye.seu=mye.seu %>% RunHarmony("sample", plot_convergence = TRUE)
mye.seu <- RunUMAP(mye.seu, reduction = "harmony", dims = 1:20)
mye.seu <- RunTSNE(mye.seu, reduction = "harmony", dims = 1:20)
#少了聚类

DimPlot(mye.seu, reduction = "tsne", group.by = "celltype", pt.size=1)+theme(
  axis.line = element_blank(),
  axis.ticks = element_blank(),axis.text = element_blank()
)
ggsave("tsne1.pdf",device = "pdf",width = 17,height = 14,units = "cm")

基本符合原图，三个亚群分得开，三个亚群分不开。

接下来，我用4种方式展示marker基因，这些基因可以在文献的补充材料里面找到。

1. tsne展示marker基因

FeaturePlot(mye.seu,features = "CCR7",reduction = "tsne",pt.size = 1)+
  scale_x_continuous("")+scale_y_continuous("")+
  theme_bw()+ #改变ggplot2的主题
  theme( #进一步修改主题
    panel.grid.major = element_blank(),panel.grid.minor = element_blank(), #去掉背景线
    axis.ticks = element_blank(),axis.text = element_blank(), #去掉坐标轴刻度和数字
    legend.position = "none", #去掉图例
    plot.title = element_text(hjust = 0.5,size=14) #改变标题位置和字体大小
  )
ggsave("CCR7.pdf",device = "pdf",width = 10,height = 10.5,units = "cm")

另一种方法就是把tsne的坐标和基因的表达值提取出来，用ggplot2画，其实不是很必要，因为FeaturePlot也是基于ggplot2的，我还是演示一下

mat1=as.data.frame(mye.seu[["RNA"]]@data["CCR7",])
colnames(mat1)="exp"
mat2=Embeddings(mye.seu,"tsne")
mat3=merge(mat2,mat1,by="row.names")

#数据格式如下：
> head(mat3)
Row.names     tSNE_1     tSNE_2   exp
1 N01_AAACGGGCATTTCAGG_1   5.098727  32.748145 0.000
2 N01_AAAGATGCAATGTAAG_1 -24.394040  26.176422 0.000
3 N01_AACTCAGGTAATAGCA_1  11.856730   8.086553 0.000
4 N01_AACTCAGGTCTTCGTC_1  10.421878  12.660407 0.000
5 N01_AACTTTCAGGCCATAG_1  33.555756 -10.437406 1.606
6 N01_AAGACCTTCGAATGGG_1 -23.976967  11.897753 0.738

mat3%>%ggplot(aes(tSNE_1,tSNE_2))+geom_point(aes(color=exp))+
  scale_color_gradient(low = "grey",high = "purple")+theme_bw()
ggsave("CCR7.2.pdf",device = "pdf",width = 13.5,height = 12,units = "cm")

用ggplot2的好处就是图形修改很方便，毕竟ggplot2大家都很熟悉

2. 热图展示marker基因

画图前，需要给每个细胞一个身份，因为我们跳过了聚类这一步，此处需要手动赋值

Idents(mye.seu)="celltype"

library(xlsx)
markerdf1=read.xlsx("ref_marker.xlsx",sheetIndex = 1)
markerdf1$gene=as.character(markerdf1$gene)
# 这个表格整理自原文的附表，选了53个基因

#数据格式
# > head(markerdf1)
# gene   celltype
# 1    S100B DC2(CD1C+)
# 2 HLA-DQB2 DC2(CD1C+)
# 3   FCER1A DC2(CD1C+)
# 4     CD1A DC2(CD1C+)
# 5     PKIB DC2(CD1C+)
# 6    NDRG2 DC2(CD1C+)

DoHeatmap(mye.seu,features = markerdf1$gene,label = F,slot = "scale.data")
ggsave("heatmap.pdf",device = "pdf",width = 23,height = 16,units = "cm")

label = F不在热图的上方标注细胞类型，
slot = "scale.data"使用scale之后的矩阵画图，默认就是这个

接下来用pheatmap画，在布局上可以自由发挥

library(pheatmap)
colanno=mye.seu@meta.data[,c("CB","celltype")]
colanno=colanno%>%arrange(celltype)
rownames(colanno)=colanno$CB
colanno$CB=NULL
colanno$celltype=factor(colanno$celltype,levels = unique(colanno$celltype))

先对细胞进行排序，按照celltype的顺序，然后对基因排序

rowanno=markerdf1
rowanno=rowanno%>%arrange(celltype)

提取scale矩阵的行列时，按照上面的顺序

mat4=mye.seu[["RNA"]]@scale.data[rowanno$gene,rownames(colanno)]
mat4[mat4>=2.5]=2.5
mat4[mat4 < (-1.5)]= -1.5 #小于负数时，加括号！

下面就是绘图代码了，我加了分界线，使其看上去更有区分度

pheatmap(mat4,cluster_rows = F,cluster_cols = F,
         show_colnames = F,
         annotation_col = colanno,
         gaps_row=as.numeric(cumsum(table(rowanno$celltype))[-6]),
         gaps_col=as.numeric(cumsum(table(colanno$celltype))[-6]),
         filename="heatmap.2.pdf",width=11,height = 7
         )