harmony是一个非常流行的整合单细胞数据的算法,由于它的核心是用C++写的所以相比其他的类似算法快很多,如果单细胞样本量很大的时候推荐使用。
首先加载其文章中用到的一个示例数据,为三个人10X pbmc数据,分别为5'端测序,3'端测序V1,3'端测序V2。
exprs_raw <- list(
'./harmony/data/fiveprime_raw.rds',
'./harmony/data/threepv2_raw.rds',
'./harmony/data/threepv1_raw.rds'
) %>%
lapply(readRDS) %>%
purrr::reduce(Matrix::cbind2)
首先简单跑一遍Seurat并查看批次差异。可以看到有很明显的差异。
scRNA_harmony <- CreateSeuratObject(counts = exprs_raw, project = "pbmc harmony", min.cells = 3, min.features = 200)
scRNA_harmony@meta.data$group <- scRNA_harmony@meta.data$orig.ident
scRNA_harmony <- NormalizeData(scRNA_harmony) %>% FindVariableFeatures(selection.method = "vst", nfeatures = 2000) %>% ScaleData() %>% RunPCA(verbose=FALSE)
p1 <- DimPlot(object = scRNA_harmony, dims=c(1,2), reduction = "pca", pt.size = .1, group.by = "orig.ident")
p2 <- VlnPlot(object = scRNA_harmony, features = c("PC_1", "PC_2"), group.by = "orig.ident", pt.size = .1)
plot_grid(p1,p2)
plot_zoom_png.png
跑RunHarmony并检查整合结果。可以看到三组数据很好的整合在一起了。
scRNA_harmony <- RunHarmony(scRNA_harmony, group.by.vars = "orig.ident", theta = 1, plot_convergence = TRUE, dims.use=1:30)
p1 <- DimPlot(object = scRNA_harmony, reduction = "harmony", pt.size = .1, group.by = "orig.ident")
p2 <- VlnPlot(object = scRNA_harmony, features = c("harmony_1","harmony_2"), group.by = "orig.ident", pt.size = .1)
plot_grid(p1,p2)
plot_zoom_png.png
接下来就可以根据新的数据共享降维空间(dimension reduced shared space)来对数据进行下游分析了,此处省略。比较感兴趣是文章中提到的评估整合效果的metric iLISI(integration local inverse Simpson’s Index),原理示意图如下:
Screen Shot 2021-02-01 at 12.00.48 PM.png简单来说就是对每个细胞计算其邻近细胞(nearest neighbours)是否来自于同一个数据集,是的话其值为1,反之其值大于1。下面我们来计算一下。
ix <- match(rownames(scRNA_harmony[[]]), meta_data$cell_id)
meta_data <- meta_data[ix,]
lisi_res <- rbind(
lisi::compute_lisi(Embeddings(scRNA_harmony,reduction = "harmony"), meta_data, c('donor', 'res_0.80')) %>%
dplyr::mutate(type = 'harmony'),
lisi::compute_lisi(Embeddings(scRNA_harmony,reduction = "pca"), meta_data, c('donor', 'res_0.80')) %>%
dplyr::mutate(type = 'pca')
) %>%
tidyr::gather(key, val, res_0.80, donor)
lisi_res %>%
ggplot(aes(val)) +
geom_density() +
facet_wrap(type ~ key, scales = 'free') +
theme_tufte(base_size = 20) +
xlim(1, 3)+
theme_grey()
可以看出整合前(PCA做聚类),iLISI值基本都在1,而整合后(harmony做聚类)明显增大且峰值在2。
plot_zoom_png.pngReference:
https://www.nature.com/articles/s41592-019-0619-0#MOESM5
https://github.com/immunogenomics/harmony
网友评论