10X单细胞通讯分析之CrosstalkR（特异性和通讯强度的变

作者: 单细胞空间交响乐 | 来源:发表于2021-04-04 11:02 被阅读0次

10X单细胞通讯分析之CrosstalkR（特异性和通讯强度的变
10X单细胞（10X空间转录组）通讯分析之总结Nichenet生
10X单细胞通讯分析之ICELLNET
10X单细胞之细胞通讯篇章-----Connectome
单细胞数据细胞通讯分析软件NATMI
10X单细胞基础+个性化合集2
10X单细胞（10X空间转录组）轨迹分析（拟时分析）VECTOR
单细胞分析之细胞通讯：cellchat
10X单细胞（10X空间转录组）通讯分析之CellChat
10X单细胞（10X空间转录组）通讯分析之NicheNet

hello，大家好，生信一线员工这次给大家再分享一个细胞通讯的软件CrosstalkR，一样的道理，我们先来看文献，最后分享一下代码，文章在CrossTalkeR: Analysis and Visualisation of Ligand Receptor Networks,在这里再次重新强调一下，做细胞通讯，不能只关注对比中特异的配受体对，更好看看共有的配受体对通讯强度的变化，之前分享过一个方法也分析了通讯强度的变化，文章在单细胞数据细胞通讯分析软件NATMI。希望引起大家足够的重视，好了，我们开始看看文献。

Abstract

Motivation: Ligand-receptor (LR) analysis allows the characterization of cellular crosstalk from single cell RNA-seq data. However, current LR methods provide limited approaches for prioritization of cell types, ligands or receptors or characterizing changes in crosstalk between two biological conditions.（这里如前所说，配受体特异性和强度变化同等重要）。
Results:CrossTalkeR is a framework for network analysis and visualisation of LR networks. CrossTalkeR identifies relevant ligands, receptors and cell types contributing to changes in cell communication when contrasting two biological states: disease vs. homeostasis. A case study on scRNA-seq of human myeloproliferative neoplasms reinforces the strengths of CrossTalkeR for characterisation of changes in cellular crosstalk in disease state.(这个地方也说的我们通常的分析思路，大家对于这个应该不陌生)。

Introduction

Understanding cellular crosstalk is vital for uncovering molecular mechanisms associated with cell differentiation and disease progression.（ $\color{red}{这是共识}$ ）。这一部分主要讲了一下几点：
（1）information of cellular proximity and crosstalk is not captured（单细胞技术的缺点）
（2）current LR inference methods usually predicts, hundreds of potential LR pairs for a given scRNA-seq dataset making their interpretation challenging （目前方法上的缺陷，大部分软件是这样的预测配受体）。
（3） most 20 LR methods focus on the analysis of a single scRNA-seq experiment and are not able to characterize changes in cellular crosstalk between pairs of conditions,（这里就体现通讯强度在不同条件下的变化，而不是简简单单知道哪些特异性的配受体）。
（4）CrossTalkeR explores network-based measures to rank cell types, receptors,ligands, cell-receptors and cell-ligand pairs regarding their importance in cellular crosstalk in a single or a pair of biological conditions.（怎么样，分析思路绝对是perfect），并且和cellphoneDB的库可兼容，看来cellphoneDB一哥的地位无可撼动。

Overview and implementation

图片.png
我们来看看这个软件的几个特点。
1、疾病样本和正常样本是分开分析的。（数据是否整合我们需要进一步看看）
2、配受体的强度仍然采用平均值相乘的方法
3、对于不同state下的配受体，对比配受体强度的变化（上调和下调），及调控网络
4、配受体优先次序的排序，这个很创新
我们详细看看每步的过程。

Network Construction

CrossTalkeR constructs two representations of the LR network.
（1）On the cell-cell interaction (CCI) network, the nodes are defined by each cell-type and the edges are weighted by characteristics of the interactions 。 number of LR pairs and sum of weights of LR pairs。（数量和强度）
（2）On the cell-gene interaction (CGI) network, the nodes represent gene and cell pairs (ligand and sender cell or receptor and receiving cell) and the weights are amsmath igiven by the mean LR expression.（细胞基因的相互网络）。
（3）differential CCI and CGI networks are obtained by subtracting the condition state network, e.g.disease, from the control state network. In this way, the interactions with positive values are up-regulated in the condition and the negative valued interactions are down-regulated on the experiment (disease) state.（对比分析）。

Network-based analysis

（1）Node Ranking approaches:（这部分我们详细翻译一下）：
We apply distinct network property methods to rank either the CCI and CGI network. By default, measures take the weights of LR pairs into account.（使用不同的网络属性方法对CCI和CGI网络进行排名。默认情况下，度量会考虑LR对的权重。）

图片.png

不知道大家能不能详细看出来和之前分析方法的不同。

方法：

图片.png

这部分是解释说明，相对简单。

图片.png

这个地方说的是基因细胞网络的构建。

图片.png

这个地方解释CCI网络，和我们通常的网络差别不大。

图片.png

研究差别倒是很省事😄

图片.png

这里强调了，数据是在整合的基础上构建网络的，整合方法推荐的是Seurat的（恕我直言，der）。

图片.png

和上面一样，研究等级。

最后一部分很值得我们注意：：

PC analysis of network properties

CrossTalkerR employs PCA as a mean to combine distinct network measures. By default, it provides scores of two first PCs as a mean to rank either gene-cell pairs in GCI networks or cells in CCI networks.（排序的方式居然是PCA），CrossTalkeR also provide importance of variables within the PCs to make then interpretable. We interpret that the first PC capture global interactions, whilethe second PC local interactions. Top up-regulated cell-gene pairs predicted by PC2 include matrix (FN1/MSC, COL1A1/MSC) and TGFB signalling associated genes (TGFB1/Megakaryocyte), as previously discussed.。

图片.png

我们来看看参考代码

1 - Step Extract data from Seurat Object

require(EWCE)
require(tibble)
require(biomaRt)
require(tidyr)
require(dplyr)
# Basic function to convert mouse to human gene names
data_ <- readRDS('path_to_data')
alldata <- NormalizeData(alldata, normalization.method = "LogNormalize", scale.factor = 10000)
allgenes <- rownames(alldata)
matrix1 <- as.data.frame(alldata@assays$RNA@data)
matrix1 <- matrix1[rowSums(matrix1[,2:dim(matrix1)[2]])!=0,]

### If you are using a mouse data, then its needed to convert the gene names to human orthologs
human = useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")
mouse = useEnsembl("ensembl", dataset = "mmusculus_gene_ensembl")
genesV2 = getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = rownames(alldata@assays$RNA@data) , mart = mouse, attributesL = c("hgnc_symbol","hgnc_id",'ensembl_gene_id'), martL = human, uniqueRows=T)
print(head(genesV2))
matrix1 <- matrix1[match(genesV2$MGI.symbol,rownames(alldata),nomatch=F),]
matrix1$gene <- genesV2$Gene.stable.ID
#rownames(matrix1) <-  genesV2$Gene.stable.ID

### Subseting the matrix
s1 <- grepl('state1',alldata@meta.data$cond)
s2 <- grepl('state2',alldata@meta.data$cond)
s1[match('gene',colnames(matrix1))] <- TRUE
s2[match('gene',colnames(matrix1))] <- TRUE

## Checking the dimensions
print(dim(matrix1[,s1]))
print(dim(matrix1[,s2]))

## If the cluster names are categorical, you will need to convert it to numerical
alldata@meta.data$sub_cell_type <- as.factor(alldata@meta.data$sub_cell_type)
print(levels(alldata@meta.data$sub_cell_type))
levels(alldata@meta.data$sub_cell_type) <- 1:length(levels(alldata@meta.data$sub_cell_type))
print(1:length(levels(alldata@meta.data$sub_cell_type)))
alldata@meta.data$sub_cell_type <- as.numeric(alldata@meta.data$sub_cell_type)

write.table(matrix1[,s1], 's1_filtered_hcount.csv',row.names=T,sep=',')
write.table(matrix1[,s2], 's2_filtered_hcount.csv',row.names=T,sep=',')
metadata <- data.frame(cells=rownames(alldata@meta.data[grepl('state1',alldata@meta.data$stim),]),cluster=alldata@meta.data$sub_cell_type[grepl('state1',alldata@meta.data$stim)])
metadata_s2 <- data.frame(cells=rownames(alldata@meta.data[!grepl('state1',alldata@meta.data$stim),]),cluster=alldata@meta.data$sub_cell_type[!grepl('state1',alldata@meta.data$stim)]) ## Just negate grepl('state1',alldata@meta.data$stim),]
print('Writing Metadata')
write.csv(metadata, 's1_filtered_meta.csv', row.names=FALSE)
write.csv(metadata_tac, 's2_filtered_meta.csv', row.names=FALSE)```

Note that the gene ID needs to be unique, you will need to use an approach to combine multiple mapped orthologs (sum, max or bitwiseor )

Run CellPhoneDB

** The parameters list is available at https://github.com/Teichlab/cellphonedb

Ensembl based ID

#! /bin/bash
mkdir s1 s2  # creating the output folders

cellphonedb method statistical_analysis s1_filtered_meta.csv  s1_filtered_hcount.csv  --threads 30 --output-path s1/

cellphonedb method statistical_analysis s2_filtered_meta.csv  s2_filtered_hcount.csv  --threads 30 --output-path s2/

HUGO based ID

#! /bin/bash
mkdir s1 s2  # creating the output folders

cellphonedb method statistical_analysis s1_filtered_meta.csv --counts-data hgnc_symbol s1_filtered_hcount.csv  --threads 30 --output-path s1/

cellphonedb method statistical_analysis s2_filtered_meta.csv --counts-data hgnc_symbol s2_filtered_hcount.csv  --threads 30 --output-path s2/

Extracting LR

def correct_lr(data):
    '''
    Invert the RL to LR and R1R2 to r2>r1
    '''
    import pandas as pd
    def swap(a,b): return b,a
    data = data.to_dict('index')
    for k,v in data.items():
        if v['isReceptor_fst'] and v['isReceptor_scn']:
            v['isReceptor_fst'],v['isReceptor_scn'] = swap(v['isReceptor_fst'],v['isReceptor_scn'])
            v['Ligand'],v['Receptor'] = swap(v['Ligand'],v['Receptor'])
            v['Ligand.Cluster'],v['Receptor.Cluster'] = swap(v['Ligand.Cluster'],v['Receptor.Cluster'])
        elif v['isReceptor_fst'] and not v['isReceptor_scn']:
            v['isReceptor_fst'],v['isReceptor_scn'] = swap(v['isReceptor_fst'],v['isReceptor_scn'])
            v['Ligand'],v['Receptor'] = swap(v['Ligand'],v['Receptor'])
            v['Ligand.Cluster'],v['Receptor.Cluster'] = swap(v['Ligand.Cluster'],v['Receptor.Cluster'])
    res_df = pd.DataFrame.from_dict(data,orient='index')
    return (res_df)
def cpdb2df(data,clsmapping):
    data = data.fillna(0)
    df_data = {}
    df_data['Ligand'] = []
    df_data['Receptor'] = []
    df_data['Ligand.Cluster'] = []
    df_data['Receptor.Cluster'] = []
    df_data['isReceptor_fst'] = []
    df_data['isReceptor_scn'] = []
    df_data['MeanLR'] = []
    for i in range(data.shape[0]):
        pair = list(data['interacting_pair'])[i].split('_')
        for j in range(data.iloc[:,12:].shape[1]):
            c_pair = list(data.columns)[j+12].split('|')
            if float(data.iloc[i,j+12]) !=0.0:
                df_data['Ligand'].append(pair[0])
                df_data['Receptor'].append(pair[1])
                df_data['Ligand.Cluster'].append(clsmapping[c_pair[0]])
                df_data['Receptor.Cluster'].append(clsmapping[c_pair[1]])
                df_data['isReceptor_fst'].append(list(data['receptor_a'])[i])
                df_data['isReceptor_scn'].append(list(data['receptor_b'])[i])
                df_data['MeanLR'].append(data.iloc[i,j+12])
    data_final = pd.DataFrame.from_dict(df_data)
    return(data_final)

def cpdb2df_nocls(data):
    '''
        When the cluster name is used on CPDB
    '''
    data = data.fillna(0)
    df_data = {}
    df_data['Ligand'] = []
    df_data['Receptor'] = []
    df_data['Ligand.Cluster'] = []
    df_data['Receptor.Cluster'] = []
    df_data['isReceptor_fst'] = []
    df_data['isReceptor_scn'] = []
    df_data['MeanLR'] = []
    for i in range(data.shape[0]):
        pair = list(data['interacting_pair'])[i].split('_')
        for j in range(data.iloc[:,12:].shape[1]):
            c_pair = list(data.columns)[j+12].split('|')
            if float(data.iloc[i,j+12]) !=0.0:
                df_data['Ligand'].append(pair[0])
                df_data['Receptor'].append(pair[1])
                df_data['Ligand.Cluster'].append(c_pair[0])
                df_data['Receptor.Cluster'].append(c_pair[1])
                df_data['isReceptor_fst'].append(list(data['receptor_a'])[i])
                df_data['isReceptor_scn'].append(list(data['receptor_b'])[i])
                df_data['MeanLR'].append(data.iloc[i,j+12])
    data_final = pd.DataFrame.from_dict(df_data)
    return(data_final)

s1 = pd.read_csv('./s1/significant_means.txt',sep='\t')
s2 = pd.read_csv('./s2/ significant_means.txt',sep='\t')
#dict with the mapping
num_to_clust = {'1':'Cluters 1',
                '2':'Cluters 2',
                '3':'Cluters 3',
                '4':'Cluters 4',
                '5':'Cluters 5'}

s1_filtered = cpdb2df(s1,num_to_clust)
s2_filtered = cpdb2df(s2,num_to_clust)
s1_filtered = correct_lr(s1_filtered)
s2_filtered = correct_lr(s2_filtered)

s1_filtered.to_csv('s1_filtered_corrected.csv')
s2_filtered.to_csv('s2_filtered_corrected.csv')

CrossTalkeR

library('CrossTalkeR')

# the method always consider the first path as control: the multiple control case will be handle soon
paths <- c('CTR' = 's1_filtered_corrected.csv',
           'EXP' = 's1_filtered_corrected.csv',
           'EXP1' = 's1_filtered_corrected.csv',)
# Selected gene list     
genes <- c('TGFB1', 'PF4')

# Generating the report and the object

data <- generate_report(paths=paths, # paths list
                        selected_genes=genes, # Selected list
                        output='/home/nagai/Documents/', # output path
                        threshold = 0, # threshold of prune edges 0=keep all
                        out_file='All_Nils.html' report name
                        )

至于报告：

图片.png

生活很好，有你更好

10X单细胞通讯分析之CrosstalkR（特异性和通讯强度的变
hello，大家好，生信一线员工这次给大家再分享一个细胞通讯的软件CrosstalkR，一样的道理，我们先来看文献...
10X单细胞（10X空间转录组）通讯分析之总结Nichenet生
作者，追风少年i 系统总结nichenet多条件通讯差异分析 nichenet的原理大家可以参考文章10X单细胞（...
10X单细胞通讯分析之ICELLNET
今天我们来分享一下新的10X单细胞通讯分析的方法，名字是ICELLNET，文章在Dissection of int...
10X单细胞之细胞通讯篇章-----Connectome
今天我们来分享一篇关于细胞通讯的文章，其实关于细胞通讯我分享的就已经很多了，不知道大家对10X单细胞做细胞通讯有了...
单细胞数据细胞通讯分析软件NATMI
单细胞数据分析中，细胞通讯是最重要的一环当然，已经有了很多的软件用于单细胞数据的通讯分析，比如cellphoneD...
10X单细胞基础+个性化合集2
作者，Evil Genius 这一篇分享以下后三个部分单细胞个性化分析之细胞通讯篇单细胞个性化分析之CNV篇 ...
10X单细胞（10X空间转录组）轨迹分析（拟时分析）VECTOR
hello，昨天我们分享了VECTOR的示例代码，文章在10X单细胞（10X空间转录组）轨迹分析（拟时分析）之VE...
单细胞分析之细胞通讯：cellchat
这篇文章在2020年7月上传到了bioRxiv（Inference and analysis of cell-ce...
10X单细胞（10X空间转录组）通讯分析之CellChat
hello，大家好，今天我们来分享一下细胞通讯的文章CellChat，其实有关这个软件已经有了很多的文章来介绍了，...
10X单细胞（10X空间转录组）通讯分析之NicheNet
hello，大家好，今天我们来分享一个做细胞通讯的分析软件NicheNet，这个软件最大特点就是预测配体活性，所以...