引言:因为在尝试单细胞转录组分析流程,学习了Seurat的数据分析步骤,记录主要的分析过程,及一些心得。
Seurat 工具包
上手指南和多数据集整合分析手册Seurat标准分析步骤
主要步骤
- 导入单细胞数据集,创建Seurat对象
- Seurat过滤,设置阈值去除技术误差,选择恰当的单细胞表达数据
- 数据标准化(标准化方法:LogNormalize)
- 特征选择,根据算法选择最能反映整体数据结果和信息的基因。
- 归一化(一方面使得细胞间的平均表达量为0,另一方面使得细胞间的差异为1)
- 数据降维,基于归一化后的数据,运行非线性降维算法PCA
- 细胞聚类,根据表达相似性划分细胞群。实现方法是根据前面的主成分分析选择的PC,基于图论方法进行聚类
- 降维结果可视化,方法包括PCA, t-SNE, UMAP
- 寻找细胞簇特征表达基因
- 注释细胞簇
根据已发表的文章,以上分析中需要深入考虑的步骤包括:选择的特征基因数量、降维的主成分数量,及降维可视化的UMAP的分辨率选择
主要分析步骤代码
##1. 创建数据集,10x Genomics 流程
pbmc.data <- Read10X(data.dir = "rawData8k/")
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "ath8k",min.cells = 3,min.features = 200)
## 另一种方式, .csv 存储的表达矩阵
rawdata6 <- read.table("./GSE123818/Root_single_cell_shr_datamatrix.csv",header = T,row.names = 1,sep = ",")
ara6 <- CreateSeuratObject(counts = rawdata6,project = "GSE123818_shr"min.cells = 3,min.features = 200)
##2. 数据筛选,每个细胞表达基因的阈值
pbmc <- subset(pbmc, subset = nFeature_RNA > 500 & nFeature_RNA < 5000 & percent.mt < 5)
##3. 标准化、特征选择及归一化
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", mean.cutoff = c(0.125,3), dispersion.cutoff = c(1.5,Inf))
pbmc <- ScaleData(pbmc)
## 一个函数整合以上三步
pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)
##4. PCA降维
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc)) # 根据特征选择得到的基因
##5. 细胞聚类
pbmc <- FindNeighbors(pbmc, dims = 1:50) # dims 根据PCA结果选择
pbmc <- FindClusters(pbmc, resolution = 1) # 分辨率越高,类群划分的越细
##6. 可视化的非线性降维
pbmc <- RunTSNE(pbmc,dims = 1:50)
pbmc <- RunUMAP(pbmc, dims = 1:50,umap.method = "umap-learn",metric = "correlation")
#pbmc <- RunUMAP(pbmc, dims = 1:50,umap.method = "uwot",metric = "cosine")
DimPlot(pbmc, reduction = "umap",label = T) # 可选参数, umap, tsne, pca
##7. 选择marker gene
cluster1.markers <- FindMarkers(pbmc, ident.1 = 1, min.pct = 0.25, logfc.threshold = 0.58)
##8. 注释细胞
anno.cluster = c("Columella Cortex Atrichoblast Phloem Trichoblast Endodermis 7 lateral-root M 10 lateral-root-cap protoxylem Phloem 14 Protophloem")
new.cluster.ids <- unlist(strsplit(anno.cluster,split = " "))
names(new.cluster.ids) <- levels(ara_integrated)
ara_integrated <- RenameIdents(ara_integrated, new.cluster.ids)
主要图形可视化代码
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA"),ncol = 2)
FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
DimPlot(pbmc, reduction = "pca",label = T)
VlnPlot(pbmc, features = c("AT3G53980", "AT2G30860"))
DoHeatmap(pbmc, features = top20$gene,label = T)
其他数据结构查看代码
### 主要函数
head(pbmc[[]]) # 查看meta.data 数据结构
levels(pbmc) # 聚类水平
SeuratData
library(SeuratData)
library(Seurat)
### download satijalab-seurat-data-v0.2.1-0-g12c77a8.tar.gz and then install
InstallData("pbmc3k")
data(pbmc3k)
pbmc3k
Seurat 3.1 :UMAP的两种实现方法
https://scrnaseq-course.cog.sanger.ac.uk/website/seurat-chapter.html
SeuratData
网友评论