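- Seurat Weekly NO.0 || Opening editorial
- Seurat Weekly NO.1 || How many clusters is the right number?!
- Seurat Weekly NO.2 || How should I subset my data?
- Seurat Weekly NO.3 || Drawing fig2 directly with Seurat
- Seurat Weekly NO.4 || Efficient data management
- Seurat Weekly NO.5 || How to compute pseudocells, or: on extending Seurat
- Seurat Weekly NO.06 || Data object conversion: Scanpy2Seurat
- Seurat Weekly NO.07 || New features in V4
- Seurat Weekly NO.08 || Seurat's interactive system
- Seurat Weekly NO.09 || What to do when the UMAP won't separate the clusters?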
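One reminder up front, and it is almost purely a matter of semantics: "integration". Seurat v3 was for a time regarded as the benchmark tool for integrating (Integrate; CCA + MNN) different RNA datasets. Its paper, Comprehensive Integration of Single-Cell Data, explains that Seurat v3 introduces new methods for integrating multiple single-cell datasets. Their goal is to identify cell states shared across datasets, even when those datasets come from different individuals, experimental conditions, technologies, or even species. The function involved is FindIntegrationAnchors. Many people in the field benchmark it against batch-correction tools, but the two are not the same thing. To stress it again: integration is not batch correction.
In V4 you can still integrate different RNA datasets with FindIntegrationAnchors. V4's WNN also performs an "integration", but there it mostly means integrating data across modalities, and the function is FindMultiModalNeighbors. Its counterpart in the v3 workflow is therefore FindNeighbors, the step that builds the cell-to-cell graph.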
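Integration also differs from merging (merge). Merging usually comes before integration: you first merge the datasets to get an initial overview of the data and decide whether integration, or something else, is needed. After dimensionality reduction, integration is the second most easily misunderstood concept in single-cell analysis.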
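To make the merge/integration distinction concrete, here is a toy base-R illustration (not Seurat's implementation): merging simply places the cells of two count matrices side by side over the union of genes, without changing any value. Padding genes missing from one dataset with zeros is an assumption of this sketch, and the function `merge_counts` and all gene/cell names are hypothetical. Integration, by contrast, computes new, corrected values (IntegrateData stores them in a separate assay, named "integrated" by default), which is why the original counts are still available in the RNA assay.

```r
# Toy illustration (not Seurat's code): "merging" two count matrices only
# concatenates cells (columns) over the union of genes; no value is changed.
merge_counts <- function(a, b) {
  genes <- union(rownames(a), rownames(b))
  # Assumption of this sketch: genes absent from a dataset are padded with 0
  pad <- function(m) {
    out <- matrix(0, nrow = length(genes), ncol = ncol(m),
                  dimnames = list(genes, colnames(m)))
    out[rownames(m), ] <- m
    out
  }
  cbind(pad(a), pad(b))
}

# Two hypothetical "datasets" with partly overlapping genes
batch1 <- matrix(c(5, 0, 3, 1), nrow = 2,
                 dimnames = list(c("CD3E", "LYZ"), c("b1_cell1", "b1_cell2")))
batch2 <- matrix(c(2, 7, 0, 4), nrow = 2,
                 dimnames = list(c("LYZ", "MS4A1"), c("b2_cell1", "b2_cell2")))

merged <- merge_counts(batch1, batch2)
merged
```

Every value of `batch1` and `batch2` reappears unchanged in `merged`; only the shape grows. That is why a merged object is a good first overview, but says nothing yet about whether the datasets need to be integrated.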
Question:
After integration, which Assay should I use for differential expression testing?
First, differential expression analysis should use the pre-integration data, i.e. the RNA assay:
- We recommend running your differential expression tests on the original / unintegrated data. By default this is stored in the RNA Assay. The integration procedure inherently introduces dependencies between data points. This violates the assumptions of the statistical tests used for differential expression.
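Second, distinguish batch from condition. Many projects agonize over batch effects because they have not understood what "batch" literally means and treat condition as if it were batch. Either way, if there is a variable you want to regress out of the differential expression analysis, you can do the following: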
- If you are concerned about additional confounders in the data such as batch or condition, these can be supplied to certain differential expression tests such as the logistic regression test (test.use = "LR" in FindMarkers) via the latent.vars parameter.
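As we understand it, the LR test fits, for each gene, a logistic regression that predicts group membership from the gene's expression plus the latent variables, and compares it with a model containing only the latent variables through a likelihood-ratio test. The base-R sketch below reproduces that idea on simulated data; `lr_de_test` and the simulated `batch` confounder are hypothetical, and this is not Seurat's actual code.

```r
# Toy sketch (not Seurat's code) of a logistic-regression likelihood-ratio
# DE test for one gene, in the spirit of
# FindMarkers(test.use = "LR", latent.vars = ...). All data are simulated.
lr_de_test <- function(expr, group, latent) {
  dat <- data.frame(group = factor(group), expr = expr, latent = factor(latent))
  # Full model: group membership ~ gene expression + latent variable
  full <- glm(group ~ expr + latent, data = dat, family = binomial)
  # Reduced model: group membership ~ latent variable only
  reduced <- glm(group ~ latent, data = dat, family = binomial)
  # LR statistic = drop in deviance when the gene is added (clamped at 0)
  stat <- max(deviance(reduced) - deviance(full), 0)
  df_diff <- reduced$df.residual - full$df.residual
  pchisq(stat, df = df_diff, lower.tail = FALSE)
}

set.seed(1)
n <- 200
group <- rep(c("A", "B"), each = n / 2)
# Hypothetical confounder, drawn independently of group in this simulation
batch <- sample(c("b1", "b2"), n, replace = TRUE)
de_gene <- rnorm(n, mean = ifelse(group == "B", 2, 0))  # shifted in group B
null_gene <- rnorm(n)                                    # same in both groups

p_de <- lr_de_test(de_gene, group, batch)
p_null <- lr_de_test(null_gene, group, batch)
c(p_de = p_de, p_null = p_null)
```

The design choice worth noticing is that the latent variable sits in both models, so only the extra explanatory power of the gene is tested. That works well for a confounder like batch, but it also means the latent variable must not itself carry the group information.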
```r
# For testing the function only; this comparison has no biological meaning.
library(Seurat)
library(pbmc3k.SeuratData)  # installed via SeuratData::InstallData("pbmc3k")
library(tidyverse)          # provides %>%

data("pbmc3k.final")
# pbmc3k.final already contains the RNA_snn graph, so FindClusters can be run
# directly (dims belongs to FindNeighbors, not FindClusters). Resolutions
# 0, 0.2, ..., 1.8 are stored as RNA_snn_res.0 ... RNA_snn_res.1.8 in the metadata.
pbmc3k.final <- FindClusters(pbmc3k.final, resolution = seq(from = 0, by = 0.2, length.out = 10))

# Default test (Wilcoxon rank-sum): cluster 6 vs cluster 8 at resolution 1.8
FindMarkers(pbmc3k.final, ident.1 = 6, ident.2 = 8, group.by = "RNA_snn_res.1.8") %>% head()
```

```
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03s
              p_val avg_logFC pct.1 pct.2    p_val_adj
AIF1   6.303645e-58  3.441023     1 0.092 8.644818e-54
LST1   4.517026e-57  3.324437     1 0.144 6.194649e-53
CST3   1.693436e-56  3.243294     1 0.183 2.322379e-52
FCER1G 1.844418e-55  2.767651     1 0.170 2.529435e-51
TYROBP 4.633845e-55  2.722533     1 0.216 6.354855e-51
CTSS   4.162577e-53  2.336520     1 0.255 5.708558e-49
```

```r
# Logistic regression test with seurat_annotations as the latent variable
FindMarkers(pbmc3k.final, ident.1 = 6, ident.2 = 8, group.by = "RNA_snn_res.1.8",
            test.use = "LR", latent.vars = "seurat_annotations") %>% head()
```

```
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=25s
       p_val avg_logFC pct.1 pct.2 p_val_adj
AIF1       1  3.441023 1.000 0.092         1
LST1       1  3.324437 1.000 0.144         1
CST3       1  3.243294 1.000 0.183         1
FCER1G     1  2.767651 1.000 0.170         1
TYROBP     1  2.722533 1.000 0.216         1
FCGR3A     1  2.661064 0.975 0.124         1
There were 50 or more warnings (use warnings() to see the first 50)
```
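In the second call every p_val is 1. A likely explanation (our reading; the source does not state it): at resolution 1.8, clusters 6 and 8 each fall within a single, different seurat_annotations label, so the latent variable alone already predicts cluster membership perfectly. Adding a gene then cannot improve the fit, the likelihood-ratio statistic is about 0, and glm() tends to emit convergence warnings, which would be consistent with the "50 or more warnings" message. The toy base-R sketch below reproduces this situation with simulated data; all names (`clusterA`, `TypeX`, and so on) are hypothetical.

```r
# Toy sketch: the latent variable coincides exactly with the two groups
# (complete separation), so adding the gene cannot improve the fit.
set.seed(2)
n <- 100
group <- factor(rep(c("clusterA", "clusterB"), each = n / 2))
annot <- factor(rep(c("TypeX", "TypeY"), each = n / 2))     # hypothetical annotation
gene <- rnorm(n, mean = ifelse(group == "clusterB", 3, 0))  # clearly differential

# glm() may warn here (e.g. fitted probabilities numerically 0 or 1)
reduced <- suppressWarnings(glm(group ~ annot, family = binomial))
full <- suppressWarnings(glm(group ~ gene + annot, family = binomial))

# Both deviances are essentially 0, so the LR statistic is ~0 and p is ~1
stat <- max(deviance(reduced) - deviance(full), 0)
p <- pchisq(stat, df = 1, lower.tail = FALSE)
p
```

Even though `gene` is strongly differential, the test reports no evidence, because the annotation has already absorbed all of the group information. In practice, latent.vars should hold a confounder such as batch, not a label nested with the groups being compared.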
Finally, the recommendation:
- We recommend this for each of the integration workflows (i.e. RPCA, sctransform, reference-based) implemented in Seurat.
https://github.com/satijalab/seurat/discussions/4000#discussioncomment-326221