hello,大家好,生信一线员工这次给大家再分享一个细胞通讯的软件CrosstalkR,一样的道理,我们先来看文献,最后分享一下代码,文章在CrossTalkeR: Analysis and Visualisation of Ligand Receptor Networks,在这里再次重新强调一下,做细胞通讯,不能只关注对比中特异的配受体对,更好看看共有的配受体对通讯强度的变化,之前分享过一个方法也分析了通讯强度的变化,文章在单细胞数据细胞通讯分析软件NATMI。希望引起大家足够的重视,好了,我们开始看看文献。
Abstract
Motivation: Ligand-receptor (LR) analysis allows the characterization of cellular crosstalk from single cell RNA-seq data. However, current LR methods provide limited approaches for prioritization of cell types, ligands or receptors or characterizing changes in crosstalk between two biological conditions.(这里如前所说,配受体特异性和强度变化同等重要)。
Results:CrossTalkeR is a framework for network analysis and visualisation of LR networks. CrossTalkeR identifies relevant ligands, receptors and cell types contributing to changes in cell communication when contrasting two biological states: disease vs. homeostasis. A case study on scRNA-seq of human myeloproliferative neoplasms reinforces the strengths of CrossTalkeR for characterisation of changes in cellular crosstalk in disease state.(这个地方也说的我们通常的分析思路,大家对于这个应该不陌生)。
Introduction
Understanding cellular crosstalk is vital for uncovering molecular mechanisms associated with cell differentiation and disease progression.()。这一部分主要讲了一下几点:
(1)information of cellular proximity and crosstalk is not captured(单细胞技术的缺点)
(2)current LR inference methods usually predicts, hundreds of potential LR pairs for a given scRNA-seq dataset making their interpretation challenging (目前方法上的缺陷,大部分软件是这样的预测配受体)。
(3) most 20 LR methods focus on the analysis of a single scRNA-seq experiment and are not able to characterize changes in cellular crosstalk between pairs of conditions,(这里就体现通讯强度在不同条件下的变化,而不是简简单单知道哪些特异性的配受体)。
(4)CrossTalkeR explores network-based measures to rank cell types, receptors,ligands, cell-receptors and cell-ligand pairs regarding their importance in cellular crosstalk in a single or a pair of biological conditions.(怎么样,分析思路绝对是perfect),并且和cellphoneDB的库可兼容,看来cellphoneDB一哥的地位无可撼动。
Overview and implementation

我们来看看这个软件的几个特点。
1、疾病样本和正常样本是分开分析的。(数据是否整合我们需要进一步看看)
2、配受体的强度仍然采用平均值相乘的方法
3、对于不同state下的配受体,对比配受体强度的变化(上调和下调),及调控网络
4、配受体优先次序的排序,这个很创新
我们详细看看每步的过程。
Network Construction
CrossTalkeR constructs two representations of the LR network.
(1)On the cell-cell interaction (CCI) network, the nodes are defined by each cell-type and the edges are weighted by characteristics of the interactions 。 number of LR pairs and sum of weights of LR pairs。(数量和强度)
(2)On the cell-gene interaction (CGI) network, the nodes represent gene and cell pairs (ligand and sender cell or receptor and receiving cell) and the weights are amsmath igiven by the mean LR expression.(细胞基因的相互网络)。
(3)differential CCI and CGI networks are obtained by subtracting the condition state network, e.g.disease, from the control state network. In this way, the interactions with positive values are up-regulated in the condition and the negative valued interactions are down-regulated on the experiment (disease) state.(对比分析)。
Network-based analysis
(1)Node Ranking approaches:(这部分我们详细翻译一下):
We apply distinct network property methods to rank either the CCI and CGI network. By default, measures take the weights of LR pairs into account.(使用不同的网络属性方法对CCI和CGI网络进行排名。 默认情况下,度量会考虑LR对的权重。 )

不知道大家能不能详细看出来和之前分析方法的不同。
方法:

这部分是解释说明,相对简单。

这个地方说的是基因细胞网络的构建。

这个地方解释CCI网络,和我们通常的网络差别不大。

研究差别倒是很省事😄

这里强调了,数据是在整合的基础上构建网络的,整合方法推荐的是Seurat的(恕我直言,der)。

和上面一样,研究等级。
最后一部分很值得我们注意::
PC analysis of network properties
CrossTalkerR employs PCA as a mean to combine distinct network measures. By default, it provides scores of two first PCs as a mean to rank either gene-cell pairs in GCI networks or cells in CCI networks.(排序的方式居然是PCA),CrossTalkeR also provide importance of variables within the PCs to make then interpretable. We interpret that the first PC capture global interactions, whilethe second PC local interactions. Top up-regulated cell-gene pairs predicted by PC2 include matrix (FN1/MSC, COL1A1/MSC) and TGFB signalling associated genes (TGFB1/Megakaryocyte), as previously discussed.。


我们来看看参考代码
1 - Step Extract data from Seurat Object
require(EWCE)
require(tibble)
require(biomaRt)
require(tidyr)
require(dplyr)
# Basic function to convert mouse to human gene names
data_ <- readRDS('path_to_data')
alldata <- NormalizeData(alldata, normalization.method = "LogNormalize", scale.factor = 10000)
allgenes <- rownames(alldata)
matrix1 <- as.data.frame(alldata@assays$RNA@data)
matrix1 <- matrix1[rowSums(matrix1[,2:dim(matrix1)[2]])!=0,]
### If you are using a mouse data, then its needed to convert the gene names to human orthologs
human = useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")
mouse = useEnsembl("ensembl", dataset = "mmusculus_gene_ensembl")
genesV2 = getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = rownames(alldata@assays$RNA@data) , mart = mouse, attributesL = c("hgnc_symbol","hgnc_id",'ensembl_gene_id'), martL = human, uniqueRows=T)
print(head(genesV2))
matrix1 <- matrix1[match(genesV2$MGI.symbol,rownames(alldata),nomatch=F),]
matrix1$gene <- genesV2$Gene.stable.ID
#rownames(matrix1) <- genesV2$Gene.stable.ID
### Subseting the matrix
s1 <- grepl('state1',alldata@meta.data$cond)
s2 <- grepl('state2',alldata@meta.data$cond)
s1[match('gene',colnames(matrix1))] <- TRUE
s2[match('gene',colnames(matrix1))] <- TRUE
## Checking the dimensions
print(dim(matrix1[,s1]))
print(dim(matrix1[,s2]))
## If the cluster names are categorical, you will need to convert it to numerical
alldata@meta.data$sub_cell_type <- as.factor(alldata@meta.data$sub_cell_type)
print(levels(alldata@meta.data$sub_cell_type))
levels(alldata@meta.data$sub_cell_type) <- 1:length(levels(alldata@meta.data$sub_cell_type))
print(1:length(levels(alldata@meta.data$sub_cell_type)))
alldata@meta.data$sub_cell_type <- as.numeric(alldata@meta.data$sub_cell_type)
write.table(matrix1[,s1], 's1_filtered_hcount.csv',row.names=T,sep=',')
write.table(matrix1[,s2], 's2_filtered_hcount.csv',row.names=T,sep=',')
metadata <- data.frame(cells=rownames(alldata@meta.data[grepl('state1',alldata@meta.data$stim),]),cluster=alldata@meta.data$sub_cell_type[grepl('state1',alldata@meta.data$stim)])
metadata_s2 <- data.frame(cells=rownames(alldata@meta.data[!grepl('state1',alldata@meta.data$stim),]),cluster=alldata@meta.data$sub_cell_type[!grepl('state1',alldata@meta.data$stim)]) ## Just negate grepl('state1',alldata@meta.data$stim),]
print('Writing Metadata')
write.csv(metadata, 's1_filtered_meta.csv', row.names=FALSE)
write.csv(metadata_tac, 's2_filtered_meta.csv', row.names=FALSE)```
Note that the gene ID needs to be unique, you will need to use an approach to combine multiple mapped orthologs (sum, max or bitwiseor )
Run CellPhoneDB
** The parameters list is available at https://github.com/Teichlab/cellphonedb
Ensembl based ID
#! /bin/bash
mkdir s1 s2 # creating the output folders
cellphonedb method statistical_analysis s1_filtered_meta.csv s1_filtered_hcount.csv --threads 30 --output-path s1/
cellphonedb method statistical_analysis s2_filtered_meta.csv s2_filtered_hcount.csv --threads 30 --output-path s2/
HUGO based ID
#! /bin/bash
mkdir s1 s2 # creating the output folders
cellphonedb method statistical_analysis s1_filtered_meta.csv --counts-data hgnc_symbol s1_filtered_hcount.csv --threads 30 --output-path s1/
cellphonedb method statistical_analysis s2_filtered_meta.csv --counts-data hgnc_symbol s2_filtered_hcount.csv --threads 30 --output-path s2/
Extracting LR
def correct_lr(data):
'''
Invert the RL to LR and R1R2 to r2>r1
'''
import pandas as pd
def swap(a,b): return b,a
data = data.to_dict('index')
for k,v in data.items():
if v['isReceptor_fst'] and v['isReceptor_scn']:
v['isReceptor_fst'],v['isReceptor_scn'] = swap(v['isReceptor_fst'],v['isReceptor_scn'])
v['Ligand'],v['Receptor'] = swap(v['Ligand'],v['Receptor'])
v['Ligand.Cluster'],v['Receptor.Cluster'] = swap(v['Ligand.Cluster'],v['Receptor.Cluster'])
elif v['isReceptor_fst'] and not v['isReceptor_scn']:
v['isReceptor_fst'],v['isReceptor_scn'] = swap(v['isReceptor_fst'],v['isReceptor_scn'])
v['Ligand'],v['Receptor'] = swap(v['Ligand'],v['Receptor'])
v['Ligand.Cluster'],v['Receptor.Cluster'] = swap(v['Ligand.Cluster'],v['Receptor.Cluster'])
res_df = pd.DataFrame.from_dict(data,orient='index')
return (res_df)
def cpdb2df(data,clsmapping):
data = data.fillna(0)
df_data = {}
df_data['Ligand'] = []
df_data['Receptor'] = []
df_data['Ligand.Cluster'] = []
df_data['Receptor.Cluster'] = []
df_data['isReceptor_fst'] = []
df_data['isReceptor_scn'] = []
df_data['MeanLR'] = []
for i in range(data.shape[0]):
pair = list(data['interacting_pair'])[i].split('_')
for j in range(data.iloc[:,12:].shape[1]):
c_pair = list(data.columns)[j+12].split('|')
if float(data.iloc[i,j+12]) !=0.0:
df_data['Ligand'].append(pair[0])
df_data['Receptor'].append(pair[1])
df_data['Ligand.Cluster'].append(clsmapping[c_pair[0]])
df_data['Receptor.Cluster'].append(clsmapping[c_pair[1]])
df_data['isReceptor_fst'].append(list(data['receptor_a'])[i])
df_data['isReceptor_scn'].append(list(data['receptor_b'])[i])
df_data['MeanLR'].append(data.iloc[i,j+12])
data_final = pd.DataFrame.from_dict(df_data)
return(data_final)
def cpdb2df_nocls(data):
'''
When the cluster name is used on CPDB
'''
data = data.fillna(0)
df_data = {}
df_data['Ligand'] = []
df_data['Receptor'] = []
df_data['Ligand.Cluster'] = []
df_data['Receptor.Cluster'] = []
df_data['isReceptor_fst'] = []
df_data['isReceptor_scn'] = []
df_data['MeanLR'] = []
for i in range(data.shape[0]):
pair = list(data['interacting_pair'])[i].split('_')
for j in range(data.iloc[:,12:].shape[1]):
c_pair = list(data.columns)[j+12].split('|')
if float(data.iloc[i,j+12]) !=0.0:
df_data['Ligand'].append(pair[0])
df_data['Receptor'].append(pair[1])
df_data['Ligand.Cluster'].append(c_pair[0])
df_data['Receptor.Cluster'].append(c_pair[1])
df_data['isReceptor_fst'].append(list(data['receptor_a'])[i])
df_data['isReceptor_scn'].append(list(data['receptor_b'])[i])
df_data['MeanLR'].append(data.iloc[i,j+12])
data_final = pd.DataFrame.from_dict(df_data)
return(data_final)
s1 = pd.read_csv('./s1/significant_means.txt',sep='\t')
s2 = pd.read_csv('./s2/ significant_means.txt',sep='\t')
#dict with the mapping
num_to_clust = {'1':'Cluters 1',
'2':'Cluters 2',
'3':'Cluters 3',
'4':'Cluters 4',
'5':'Cluters 5'}
s1_filtered = cpdb2df(s1,num_to_clust)
s2_filtered = cpdb2df(s2,num_to_clust)
s1_filtered = correct_lr(s1_filtered)
s2_filtered = correct_lr(s2_filtered)
s1_filtered.to_csv('s1_filtered_corrected.csv')
s2_filtered.to_csv('s2_filtered_corrected.csv')
CrossTalkeR
library('CrossTalkeR')
# the method always consider the first path as control: the multiple control case will be handle soon
paths <- c('CTR' = 's1_filtered_corrected.csv',
'EXP' = 's1_filtered_corrected.csv',
'EXP1' = 's1_filtered_corrected.csv',)
# Selected gene list
genes <- c('TGFB1', 'PF4')
# Generating the report and the object
data <- generate_report(paths=paths, # paths list
selected_genes=genes, # Selected list
output='/home/nagai/Documents/', # output path
threshold = 0, # threshold of prune edges 0=keep all
out_file='All_Nils.html' report name
)
至于报告:



生活很好,有你更好
网友评论