考虑空间位置的通讯分析手段---CellphoneDB（V3.0

作者: 单细胞空间交响乐 | 来源:发表于2022-01-24 10:19 被阅读0次

考虑空间位置的通讯分析手段---CellphoneDB（V3.0
【单细胞高级绘图】08.细胞通讯_两组比较_气泡图
小鼠和人的同源基因对应
单细胞分析之细胞交互-3：CellChat
单细胞分析之细胞交互-2：iTALK
单细胞分析之细胞交互-1：CellphoneDB
单细胞分析之细胞交互-5：NicheNet多组间互作比较
单细胞分析之细胞交互-7：目标基因集的配体和靶基因活性预测分析
单细胞分析之细胞交互-6：NicheNet使用的常见问题汇总
单细胞分析实录(19): 基于CellPhoneDB的细胞通讯分

新的一周，新的开始，今天我们来分享一个cellphoneDB的升级版，CellphoneDB的V3.0版本，升级的最大改进在于服务于空间转录组的通讯分析，会结合空间位置和生态位进行有效通讯的识别和判断，我们来看一下。

关于CellphoneDB，大家应该都不陌生，目前做通讯分析引用率最高的软件，之前的版本升级到2之前都是服务于单细胞的数据分析的，但是现在作者将软件升级到3，就是将空间转录组纳入分析，非常棒的想法和运用，参考文章在Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro,2021年12月份发表于Nature Genetics，IF38.330，文献的内容我们下一篇再分享，这一片我们专注于CellphoneDB（V3.0）。

CellphoneDB（V3.0）的创新点

合并空间信息 CellPhoneDB 现在允许通过微环境文件合并细胞的空间信息。这是一个两列文件，指示哪种细胞类型在哪个空间微环境中（参见示例见下图）。 CellphoneDB 将使用此信息来定义可能的交互细胞对（即在微环境中共享/共存的集群对）。可以使用 cell2location（关于cell2location分享了多次，大家可以参考文章 10X单细胞和空间联合分析的方法---cell2location,10X单细胞空间联合分析之再次解读cell2location） 定义具有先验知识、成像或 Visium 分析的微环境。

cell_type	microenviroment
epi_Ciliated	Proliferative
epi_Pre-ciliated	Proliferative
epi_SOX9_LGR5	Proliferative
epi_SOX9_prolif	Proliferative
epi_SOX9	Proliferative
Fibroblast eS	Proliferative
Lymphoid	Proliferative
Myeloid	Proliferative
Fibroblast C7	Proliferative
epi_Ciliated	Secretory
epi_Lumenal 1	Secretory
epi_Lumenal 2	Secretory
epi_Glandular	Secretory
epi_Glandular_secretory	Secretory
Fibroblast dS	Secretory
Lymphoid	Secretory
epi_SOX9	Secretory
Fibroblast C7	Secretory
Myeloid	Secretory

添加了新的分析方法，使用差异表达基因 (DEG) 而不是 random shuffling（cellphonedb 方法 degs_analysis）。 这种方法将选择所有基因都由高于--阈值的一小部分细胞表达并且至少一个基因是 DEG 的相互作用。可以使用喜欢的工具识别 DEG，并通过文本文件将信息提供给 CellphoneDB。第一列应该是细胞类型/cluster，第二列应该是相关的基因 id。其余列将被忽略（参见示例见下表）。这里提供了为 Seurat 和 Scanpy 的参考示例。

图片.png

Database update WNT pathway has been further curated.

图片.png

先来看看R版本的分析（联合Seurat）

library(Seurat)
library(SeuratObject)
library(Matrix)
so = readRDS('endometrium_example_counts_seurat.rds')
writeMM(so@assays$RNA@counts, file = 'endometrium_example_counts_mtx/matrix.mtx')
# save gene and cell names
write(x = rownames(so@assays$RNA@counts), file = "endometrium_example_counts_mtx/genes.tsv")
write(x = colnames(so@assays$RNA@counts), file = "endometrium_example_counts_mtx/barcodes.tsv")

so@meta.data$Cell = rownames(so@meta.data)
df = so@meta.data[, c('Cell', 'cell_type')]
write.table(df, file ='endometrium_example_meta.tsv', sep = '\t', quote = F, row.names = F)


## OPTION 1 - compute DEGs for all cell types
## Extract DEGs for each cell_type
# DEGs <- FindAllMarkers(so, 
#                        test.use = 'LR', 
#                        verbose = F, 
#                        only.pos = T, 
#                        random.seed = 1, 
#                        logfc.threshold = 0.2, 
#                        min.pct = 0.1, 
#                        return.thresh = 0.05)


# OPTION 2 - optional - Re-compute  hierarchical (per lineage) DEGs for Epithelial and Stromal lineages
DEGs = c()
for( lin in c('Epithelial', 'Stromal') ){
    message('Computing DEGs within linage ', lin)
    so_in_lineage = subset(so, cells = Cells(so)[ so$lineage == lin ] )
    celltye_in_lineage = unique(so$cell_type[ so$lineage == lin ])
    DEGs_lin = FindAllMarkers(so_in_lineage,
                       verbose = F, 
                       only.pos = T, 
                       random.seed = 1, 
                       logfc.threshold = 0, 
                       min.pct = 0.1, 
                       return.thresh = 0.05)
    DEGs = rbind(DEGs_lin, DEGs)
}


fDEGs = subset(DEGs, p_val_adj < 0.05 & avg_logFC > 0.1)

# 1st column = cluster; 2nd column = gene 
fDEGs = fDEGs[, c('cluster', 'gene', 'p_val_adj', 'p_val', 'avg_logFC', 'pct.1', 'pct.2')] 
write.table(fDEGs, file ='endometrium_example_DEGs.tsv', sep = '\t', quote = F, row.names = F)

Run cellphoneDB

cellphonedb method degs_analysis  \
    endometrium_example_meta.tsv  \
    endometrium_example_counts_mtx  \
    endometrium_example_DEGs.tsv  \
    --microenvs endometrium_example_microenviroments.tsv  \ #optional
    --counts-data hgnc_symbol  \
    --database database/database/cellphonedb_user_2021-06-29-11_41.db \
    --threshold 0.1

python版本的分析

import numpy as np
import pandas as pd
import scanpy as sc
import anndata
import os
import sys
from scipy import sparse

sc.settings.verbosity = 1  # verbosity: errors (0), warnings (1), info (2), hints (3)
sys.executable
adata = sc.read('endometrium_example_counts.h5ad')
df_meta = pd.DataFrame(data={'Cell':list(adata.obs.index),
                             'cell_type':[ i for i in adata.obs['cell_type']]
                            })
df_meta.set_index('Cell', inplace=True)
df_meta.to_csv('endometrium_example_meta.tsv', sep = '\t')

# Conver to dense matrix for Seurat
adata.X = adata.X.toarray()

import rpy2.rinterface_lib.callbacks
import logging
# Ignore R warning messages
#Note: this can be commented out to get more verbose R output
rpy2.rinterface_lib.callbacks.logger.setLevel(logging.ERROR)
import anndata2ri
anndata2ri.activate()
%load_ext rpy2.ipython

%%R -o DEGs

library(Seurat)
so = as.Seurat(adata, counts = "X", data = "X")
Idents(so) = so$cell_type

## OPTION 1 - compute DEGs for all cell types
## Extract DEGs for each cell_type
# DEGs <- FindAllMarkers(so, 
#                        test.use = 'LR', 
#                        verbose = F, 
#                        only.pos = T, 
#                        random.seed = 1, 
#                        logfc.threshold = 0.2, 
#                        min.pct = 0.1, 
#                        return.thresh = 0.05)


# OPTION 2 - optional - Re-compute  hierarchical (per lineage) DEGs for Epithelial and Stromal lineages
DEGs = c()
for( lin in c('Epithelial', 'Stromal') ){
    message('Computing DEGs within linage ', lin)
    so_in_lineage = subset(so, cells = Cells(so)[ so$lineage == lin ] )
    celltye_in_lineage = unique(so$cell_type[ so$lineage == lin ])
    DEGs_lin = FindAllMarkers(so_in_lineage, 
                       test.use = 'LR', 
                       verbose = F, 
                       only.pos = T, 
                       random.seed = 1, 
                       logfc.threshold = 0.2, 
                       min.pct = 0.1, 
                       return.thresh = 0.05)
    DEGs = rbind(DEGs_lin, DEGs)
}

cond1 = DEGs['p_val_adj'] < 0.05 
cond2 = DEGs['avg_log2FC'] > 0.1
mask = [all(tup) for tup in zip(cond1, cond2)]
fDEGs = DEGs[mask]


# 1st column = cluster; 2nd column = gene 
fDEGs = fDEGs[['cluster', 'gene', 'p_val_adj', 'p_val', 'avg_log2FC', 'pct.1', 'pct.2']] 
fDEGs.to_csv('endometrium_example_DEGs.tsv', index=False, sep='\t')

Run cellphoneDB

cellphonedb method degs_analysis  \
    endometrium_example_meta.tsv  \
    endometrium_example_counts.h5ad  \
    endometrium_example_DEGs.tsv  \
    --microenvs endometrium_example_microenviroments.tsv  \
    --counts-data hgnc_symbol  \
    --database database/database/cellphonedb_user_2021-06-29-11_41.db \
    --threshold 0.1