Seurat Weekly NO.10 || Differential expression analysis on integrated data

Author: 周运来就是我 | Published 2021-02-06 06:56

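A quick note first, and it is almost purely semantic: "integration". Seurat v3 was for a time regarded as the benchmark tool for integrating (Integrate, CCA + MNN) different RNA datasets. Its paper, Comprehensive Integration of Single-Cell Data, states that Seurat v3 introduces new methods for integrating multiple single-cell datasets. These methods aim to identify shared cell states that are present across datasets, even when the datasets come from different individuals, experimental conditions, technology platforms, or even species; the function used is FindIntegrationAnchors. Many people in the field benchmark it alongside batch-correction tools, but the two are not the same thing. To stress the point: integration and batch correction are not the same thing. In v4 you can still integrate different RNA datasets with FindIntegrationAnchors. The WNN workflow in v4 also has an "integration", but there it mostly means integration across the modalities of multimodal data, and the function used is FindMultiModalNeighbors. So the counterpart of that function in v3 is FindNeighbors, the step that builds the graph between cells.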

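Next, "integration" is also different from "merging" (merge). Merging usually comes before integration: you first merge the datasets to get an initial overview of the data and to judge whether integration, or something else, is needed. After dimensionality reduction, integration is the second concept in single-cell data analysis that is easily confused semantically.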
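The difference between merging and integrating can be sketched in code. The helper below is hypothetical (`merge_then_integrate` and its arguments are not part of Seurat); it assumes a list of at least two Seurat v3/v4 objects with raw counts in the RNA assay, and it uses only `merge`, `NormalizeData`, `FindVariableFeatures`, `FindIntegrationAnchors` and `IntegrateData`. Treat it as a sketch under those assumptions, not as the recommended pipeline.

```r
# Hypothetical helper, not part of Seurat: merge for a first overview,
# then integrate with the v3/v4 anchor workflow.
merge_then_integrate <- function(obj_list, dims = 1:20) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))
  stopifnot(length(obj_list) >= 2)

  # Merge: cells are simply concatenated; no correction is applied.
  merged <- merge(obj_list[[1]], y = obj_list[-1])

  # Integrate: normalize each dataset, pick variable features,
  # find anchors across datasets, then build the "integrated" assay.
  obj_list <- lapply(obj_list, function(x) {
    x <- Seurat::NormalizeData(x, verbose = FALSE)
    Seurat::FindVariableFeatures(x, nfeatures = 2000, verbose = FALSE)
  })
  anchors <- Seurat::FindIntegrationAnchors(object.list = obj_list, dims = dims)
  integrated <- Seurat::IntegrateData(anchorset = anchors, dims = dims)

  list(merged = merged, integrated = integrated)
}
```

Looking at a UMAP of `merged` first is one way to decide whether the `integrated` object is needed at all.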

Question:

After integration, which Assay should I use for differential expression testing?

First, differential expression analysis should use the RNA data as it was before integration.

We recommend running your differential expression tests on the original / unintegrated data. By default this is stored in the RNA Assay. The integration procedure inherently introduces dependencies between data points. This violates the assumptions of the statistical tests used for differential expression.
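In practice this means pointing the test at the RNA assay explicitly. The wrapper below is a minimal sketch, assuming a Seurat v3/v4 object that still carries its original RNA assay next to the integrated one; the name `de_on_rna` is made up for illustration, while `NormalizeData`, `Assays` and the `assay` argument of `FindMarkers` are existing Seurat functions and parameters.

```r
# Hypothetical wrapper, not part of Seurat: run FindMarkers on the
# unintegrated RNA assay regardless of the current default assay.
de_on_rna <- function(obj, ident.1, ident.2, ...) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))
  stopifnot("RNA" %in% Seurat::Assays(obj))

  # Make sure the RNA assay holds log-normalized data.
  obj <- Seurat::NormalizeData(obj, assay = "RNA", verbose = FALSE)

  # Extra arguments such as test.use or latent.vars are passed through.
  Seurat::FindMarkers(obj, ident.1 = ident.1, ident.2 = ident.2,
                      assay = "RNA", ...)
}
```

A call such as `de_on_rna(obj, ident.1 = 6, ident.2 = 8)` would then test on RNA even if the default assay of `obj` is the integrated one.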

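Second, take care to distinguish batch from condition. Too many projects agonize over batch effects because they have not understood what "batch" literally means and end up treating condition as batch. Either way, if there is a variable you want to regress out in the differential analysis, you can use the approach below.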

If you are concerned about additional confounders in the data such as batch or condition, these can be supplied to certain differential expression tests such as the logistic regression test (test.use = "LR" in FindMarkers) via the latent.vars parameter.

# For testing the function only; the results have no biological meaning.
library(Seurat)
library(pbmc3k.SeuratData)  # provides the pbmc3k.final object
library(tidyverse)          # provides the %>% pipe
library(ggraph)
library(cowplot)
library(clustree)
# Cluster at resolutions 0, 0.2, ..., 1.8; each result is stored in a
# metadata column named RNA_snn_res.<resolution>.
# FindClusters has no dims argument (that belongs to FindNeighbors), so it is dropped here.
pbmc3k.final <- FindClusters(pbmc3k.final, resolution = seq(from = 0, by = 0.2, length.out = 10))

> FindMarkers(pbmc3k.final, ident.1 = 6, ident.2 = 8, group.by = 'RNA_snn_res.1.8') %>% head()
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03s  
              p_val avg_logFC pct.1 pct.2    p_val_adj
AIF1   6.303645e-58  3.441023     1 0.092 8.644818e-54
LST1   4.517026e-57  3.324437     1 0.144 6.194649e-53
CST3   1.693436e-56  3.243294     1 0.183 2.322379e-52
FCER1G 1.844418e-55  2.767651     1 0.170 2.529435e-51
TYROBP 4.633845e-55  2.722533     1 0.216 6.354855e-51
CTSS   4.162577e-53  2.336520     1 0.255 5.708558e-49
> FindMarkers(pbmc3k.final,ident.1 =6,ident.2 = 8,group.by = 'RNA_snn_res.1.8',test.use = 'LR',latent.vars = 'seurat_annotations' ) %>% head()
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=25s  
       p_val avg_logFC pct.1 pct.2 p_val_adj
AIF1       1  3.441023 1.000 0.092         1
LST1       1  3.324437 1.000 0.144         1
CST3       1  3.243294 1.000 0.183         1
FCER1G     1  2.767651 1.000 0.170         1
TYROBP     1  2.722533 1.000 0.216         1
FCGR3A     1  2.661064 0.975 0.124         1
There were 50 or more warnings (use warnings() to see the first 50)
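
The p-values of 1 in the second FindMarkers call are most likely a consequence of the test setup rather than of the genes. The latent variable seurat_annotations almost fully determines which cluster a cell is in, so a model that already contains it gains nothing from adding a gene. The base-R sketch below imitates that situation on simulated data. It is an illustration of the idea behind a likelihood-ratio test with a latent variable, not Seurat's implementation; all names (`lr_pvalue`, `latent_confounded`, `latent_independent`) and the simulated values are assumptions made for the example.

```r
# Base-R sketch of a likelihood-ratio test with a latent variable.
# It imitates the idea behind test.use = "LR"; it is not Seurat's code.
set.seed(1)
n <- 200
group <- rep(c(0, 1), each = n / 2)                # the two clusters compared
gene <- rnorm(n, mean = ifelse(group == 1, 2, 0))  # a clearly DE gene

# A latent variable that determines the group, and one unrelated to it.
latent_confounded <- factor(group)
latent_independent <- factor(rep(c("a", "b"), times = n / 2))

lr_pvalue <- function(gene, group, latent) {
  # Full model: group explained by the gene plus the latent variable.
  full <- suppressWarnings(glm(group ~ gene + latent, family = binomial))
  # Reduced model: group explained by the latent variable alone.
  reduced <- suppressWarnings(glm(group ~ latent, family = binomial))
  # Likelihood-ratio statistic from the drop in deviance.
  stat <- max(0, deviance(reduced) - deviance(full))
  df <- df.residual(reduced) - df.residual(full)
  pchisq(stat, df = df, lower.tail = FALSE)
}

p_confounded <- lr_pvalue(gene, group, latent_confounded)
p_independent <- lr_pvalue(gene, group, latent_independent)
print(c(confounded = p_confounded, independent = p_independent))
```

With the confounded latent variable the p-value sits near 1 even though the gene is strongly differential, while the independent latent variable leaves the signal intact. This is why a variable passed to latent.vars should be a genuine nuisance factor such as batch, not something that tracks the groups being compared.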

Finally, the recommendations: