10X单细胞（10X空间转录组）细胞网络推断分析之ccNET

作者: 单细胞空间交响乐 | 来源:发表于2022-03-08 14:58 被阅读0次

10X单细胞（10X空间转录组）轨迹分析（拟时分析）VECTOR
10X单细胞（10X空间转录组）基础算法之KL散度
10X单细胞（10X空间转录组）细胞网络推断分析之ccNET
10X单细胞和空间联合分析的方法---cell2location
单细胞SRR数据拆分
10X单细胞和10X空间转录组联合分析文章分享
10X单细胞（10X空间转录组）聚类分析之scDCC
10X单细胞（10X空间转录组）空间相关性分析和cellphon
10X单细胞WGCNA分析联合10X空间转录组WGCNA分析
10X单细胞核转录组 + 10X单细胞ATAC的联合分析（WNN

女神节，女神们都放假了，男神们还在上班，喝一杯奶茶，继续学习吧。这一篇分享一个推断细胞网络的方法，ccNET，文章在Single-cell RNA sequencing data analysis based on non-uniform ε−neighborhood network，ccNET是一个高效的 scRNA-seq 分析框架，该框架通过非均匀 ε-邻域 (NEN) 网络一致地完成三个目标。首先，NEN 方法生成了一个网络，它结合了 k 近邻 (KNN) 和 ε-邻域 (EN) 的优点来表示数据点驻留在基因空间中的流形。然后从这样一个网络中，利用它的布局、它的社区以及它的最短路径来达到scRNA-seq数据可视化、聚类和轨迹推断的目的。

图片.png

ccNET特征

Different from the traditional analysis of scRNA-seq data, which performs visualization, clustering and trajectory inference using methods based on different theories, ccnet accomplishes the three targets in a consistent manner.
NEN network combines the advantages of both k-neighbors (KNN) and epsilon-neighborhood (EN) to represent the intrinsic manifold of data.

图片.png

示例代码

Installation

pip install ccnet

加载

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import ccnet as ccn
# import ccn

示例数据

datafile = './Guo/data.csv'
timefile = './Guo/time_stages.csv'

准备相关数据，其中“数据”是归一化的表达矩阵，形状为438个细胞*38个基因，其中行和列对应细胞和基因。 'cell_names' 和 'gene_names' 分别是细胞名称列表和基因名称列表。对于其他高维表达矩阵，建议在处理前进行PCA降维，降低噪声，加快计算速度。

data_df = pd.read_csv(datafile, index_col=0)
data = np.array(data_df)
cell_names = data_df.index.values
gene_names = data_df.columns.values

The 'labels' here indicates the time stage of each cell.

time_df = pd.read_csv(timefile, index_col=None, header=None)
labels = time_df[0].to_list()

Analysis

Topological data analysis

Persistent Homology (PH) is the flagship tool of Topological Data Analysis (TDA). Before generating the network, the PH is calculated here to select the appropriate parameter T in the Non-uniform -Neighborhood (NEN) method. Different from the traditional PH, where the construction of the network (actually complex) is based on -Neighborhood (NEN), and here is based on NEN, which is more self-adaptive than EN. Note that the TDA here is not necessary because the parameter T in the NEN method already has a rough value range.

# compute Persistent Homology
intervals = ccn.homology(data, k=4, metric='euclidean')
# display the result as persistence barcodes
fig, ax = plt.subplots(1, 2, figsize=(9,4))

ccn.plot(ax[0], intervals, xlabel='expansion time T', dim=0)
ccn.plot(ax[1], intervals, xlabel='expansion time T', dim=1)

#plt.savefig('figures_ph_guo/persistance_barcode.pdf', format='pdf')

图片.png

# display the first 10 barcodes of the left above
list(reversed(intervals[0][-10:]))

[array([ 0., inf]),
array([0. , 0.8115648]),
array([0. , 0.58644992]),
array([0. , 0.58060718]),
array([0. , 0.56599903]),
array([0. , 0.5608719]),
array([0. , 0.56084186]),
array([0. , 0.5542202]),
array([0. , 0.55128253]),
array([0. , 0.5510332])]

From the above results, we can know that when the parameter $t$ ∈ [0.8115648,+∞), the generated NEN network is connected, when $T$ ∈ [0.58644992,0.8115648) , the generated NEN network has two connected branches, and so on

Alternatively, the result of PH can also be displayed as a persistence diagram

# display the result as persistence diagram
fig, ax = plt.subplots()
ccn.plot(ax, intervals, gtype ='diagram', dim=0)

图片.png

Generating NEN network

Here we use the NEN method to generate the network. The parameter $k$ is the number of reference neighbors, which we set to 4. $T$ is the expansion time. $T$ take value roughly from 0.6 to 0.9, and the corresponding network changes from sparse to dense. Beyond this range, the generated network usually cannot approximate the intrinsic manifold of data appropriately. In order not to make the single-cell network too dense, we make T = 0.61. At the same time, in order to keep the generated network connected and facilitate downstream analysis, we combine the NEN network with a 2-neighbor network, and set the parameter $connected = TRUE$ , which repeatedly connects the nearest node between the two components until the whole network is connected.

# generate NEN network
adm = ccn.network(data, k=4, T=0.61, metric='euclidean', addknn=2, connected=True)

Network analysis

Statistical analysis of the network will help us to have a rough understanding of the structure of data. Some commonly used network statistics include node number, edge number, minimum degree, maximum degree, average degree, network radius, clustering coefficient, etc.

# compute some network statistics
ccn.statistics(adm)
# display degree distribution
fig, ax = plt.subplots()

out = ccn.statistics(adm,ax, s=20)

图片.png

Visualization

Here we use the network layout method ForceAtlas2 (FA2,力导向矩阵) to spatialize the network generated above. To prevent the overlapping between nodes, we call FA2 twice. When FA2 is called for the first time, we treated each node as a particle of zero size, namely, a particle without volume, thus producing a reasonable layout. On the basis of this layout, we call FA2 for the second time.

# network layout
pos = ccn.layout(adm, method='fa2', size=15, seed=4, 
                 niter1=2000, niter2=200, 
                 outbound1=False, outbound2=False, 
                 linlog1=False, linlog2=False)

If we know the index of the root cell or early developing cell, we can rotate the position so that the root cell is at the top of the layout.

# rotate the position
pos = ccn.rotate_pos(pos, root_ind=0)
# visualize the network or data
fig, ax = plt.subplots(1,2, figsize=(9,4))

outax = ccn.plot_net(ax[0], pos, node_color=labels, node_size=2, edges=True, adm=adm)
outax = ccn.plot_net(ax[1], pos, node_color=labels, node_size=2, edges=False, adm=adm)
fig.colorbar(outax, ax=ax)

图片.png

Clustering

We first weight the network, and then clustering through community detection. We describe two weighting strategies: by Sigmoid funcion and Jaccard similarity.

Weighting by Sigmoid function

wadms = ccn.sigmoid(adm, data, c=0.5, h=1.5, s=30)
labelss = ccn.leiden_cluster(wadms, resolution=1)


fig, ax = plt.subplots(1,2, figsize=(9,4))
outax = ccn.plot_net(ax[0], pos, node_color=labelss, cmap=plt.cm.tab20, node_size=2, edges=False, adm=adm)
ax[0].set_title('colored by labelss')

outax = ccn.plot_net(ax[1], pos, node_color=labels, node_size=2, edges=False, adm=adm)
ax[1].set_title('colored by time stage')
fig.colorbar(outax, ax=ax)

图片.png

Weighting by Jaccard similarity

wadmj = ccn.jaccard_net(adm)
labelsj = ccn.leiden_cluster(wadmj, resolution=1)
fig, ax = plt.subplots(1,2, figsize=(9,4))
outax = ccn.plot_net(ax[0], pos, node_color=labelsj, cmap=plt.cm.tab20, node_size=2, edges=False, adm=adm)
ax[0].set_title('colored by labelsj')

outax = ccn.plot_net(ax[1], pos, node_color=labels, node_size=2, edges=False, adm=adm)
ax[1].set_title('colored by time stage')
fig.colorbar(outax, ax=ax)

图片.png

Pseudotime ordering

We first weight the edges by the expasion time, then define the pseudotime of each cell as its shortest path to the root cell.

# weight the network
wadm = ccn.network(data, k=4, T=0.61, addknn=2, connected=True, weighted=True)
# calculate the pseudotime
ordering = ccn.pseudotime(wadm, source=0)


fig, ax = plt.subplots(1,2, figsize=(9,4))

outax = ccn.plot_net(ax[0], pos, node_color=ordering, node_size=2, edges=True, adm=adm)
ax[0].set_title('colored by pseudotime')

outax = ccn.plot_net(ax[1], pos, node_color=labels, node_size=2, edges=True, adm=adm)
ax[1].set_title('colored by time stage')
fig.colorbar(outax, ax=ax)

图片.png

# Calculate the correlation coefficient between pseudotime and time stage
cs = [0,0,0]
cs[0] = ccn.pearson(ordering, labels)
cs[1] = ccn.spearman(ordering, labels)
cs[2] = ccn.kendall(ordering, labels)

print(' - pearson cor  = ', cs[0])
print(' - spearman cor = ', cs[1])
print(' - kendall cor  = ', cs[2])

Find trajectory-associated genes

Given the source and target cells, we can calculate the shortest path and consider it as the development trajectory from the former to the latter. Then we can find the genes related to this trajectory.

# given the source and target cells
source = 0
target = 414


# display it
fig, ax = plt.subplots()
outax = ccn.plot_net(ax, pos, node_size=2, edges=False, adm=adm)

ax.scatter(pos[source,0], pos[source,1], c='red', s=60, marker="*")
ax.text(pos[source,0]+100, pos[source,1], 'source cell')


ax.scatter(pos[target,0], pos[target,1], c='red', s=60, marker="*")
ax.text(pos[target,0]+600, pos[target,1], 'target cell')

图片.png

# calculate shortest path and trajectory-associated genes
path, genes = ccn.tags(wadm, data, ordering, gene_names, source=source, target=target)
# display the trajectory
fig, ax = plt.subplots()

outax = ccn.plot_net(ax, pos, node_size=2, edges=False, adm=adm)

# path
ax.plot(pos[np.ix_(path, [0])], pos[np.ix_(path, [1])], c='orange')
ax.scatter(pos[np.ix_(path, [0])], pos[np.ix_(path, [1])], c='k', s=40, marker='^')

ax.scatter(pos[source,0], pos[source,1], c='red', s=60, marker="*")
ax.scatter(pos[target,0], pos[target,1], c='red', s=60, marker="*")

图片.png

# colored by tarjectory-associated genes
fig, ax = plt.subplots(2, 2)

gs = ['Pecam1', 'Snai1', 'Gata4', 'Sox2']

for r in range(2):
    for c in range(2):
        num = r*2 + c
        outax = ccn.plot_net(ax[r,c], pos, node_color=data_df[gs[num]].tolist(), node_size=2, edges=False, adm=adm)
        ax[r,c].set_title(gs[num])
        
        # delete xticks and yticks
        ax[r,c].set_xticks(ticks=[])
        ax[r,c].set_yticks(ticks=[])
        
        fig.colorbar(outax, ax=ax[r,c])

图片.png

# colored by tarjectory-associated genes
fig, ax = plt.subplots(2, 2)

gs = ['Gata3', 'Sox17', 'Dab2', 'Eomes']

for r in range(2):
    for c in range(2):
        num = r*2 + c
        outax = ccn.plot_net(ax[r,c], pos, node_color=data_df[gs[num]].tolist(), node_size=2, edges=False, adm=adm)
        ax[r,c].set_title(gs[num])
        
        # delete xticks and yticks
        ax[r,c].set_xticks(ticks=[])
        ax[r,c].set_yticks(ticks=[])
        
        fig.colorbar(outax, ax=ax[r,c])

图片.png

示例代码在ccNET

生活很好，有你更好

10X单细胞（10X空间转录组）轨迹分析（拟时分析）VECTOR
hello，昨天我们分享了VECTOR的示例代码，文章在10X单细胞（10X空间转录组）轨迹分析（拟时分析）之VE...
10X单细胞（10X空间转录组）基础算法之KL散度
hello，大家好，昨天呢，我们认真分享了有关tSNE的相关知识，文章在10X单细胞（10X空间转录组）降维分析之...
10X单细胞（10X空间转录组）细胞网络推断分析之ccNET
女神节，女神们都放假了，男神们还在上班，喝一杯奶茶，继续学习吧。这一篇分享一个推断细胞网络的方法，ccNET，文章...
10X单细胞和空间联合分析的方法---cell2location
10X空间转录组和单细胞转录组都在如火如荼的进行当中，单细胞提供了单个细胞的精度来研究组织，而空间转录组则是提供了...
单细胞SRR数据拆分
10X单细胞转录组测序数据的 SRA转fastq fastq-dump SRA转fastq 参考链接：10X单细胞...
10X单细胞和10X空间转录组联合分析文章分享
各位同学，大家好，今天我们来分享一篇新的10X单细胞和10X空间转录组联合分析文章，文章在这里Spatiotemp...
10X单细胞（10X空间转录组）聚类分析之scDCC
hi，各位好，今天我们努努力，看一下10X单细胞和10X空间转录组普遍存在的dropout现象对我们数据分析的影响...
10X单细胞（10X空间转录组）空间相关性分析和cellphon
hello，大家好，随着10X单细胞、10X空间转录组如火如荼的进行中，我们的分析内容和手段也要进入深水区了，很多...
10X单细胞WGCNA分析联合10X空间转录组WGCNA分析
各位同学，大家好，今天我们继续WGCNA的分析点，这一次，我们需要联合两种技术，10X单细胞和10X空间转录组，也...
10X单细胞核转录组 + 10X单细胞ATAC的联合分析（WNN
hello，大家好，最近呢，10X单细胞核转录组 + 10X单细胞ATAC的多组学测序技术日渐成熟，因为这样技术同...