女神节,女神们都放假了,男神们还在上班,喝一杯奶茶,继续学习吧。这一篇分享一个推断细胞网络的方法,ccNET,文章在Single-cell RNA sequencing data analysis based on non-uniform ε−neighborhood network,ccNET是一个高效的 scRNA-seq 分析框架,该框架通过非均匀 ε-邻域 (NEN) 网络一致地完成三个目标。 首先,NEN 方法生成了一个网络,它结合了 k 近邻 (KNN) 和 ε-邻域 (EN) 的优点来表示数据点驻留在基因空间中的流形。 然后从这样一个网络中,利用它的布局、它的社区以及它的最短路径来达到scRNA-seq数据可视化、聚类和轨迹推断的目的。
图片.pngccNET特征
- Different from the traditional analysis of scRNA-seq data, which performs visualization, clustering and trajectory inference using methods based on different theories, ccnet accomplishes the three targets in a consistent manner.
- NEN network combines the advantages of both k-neighbors (KNN) and epsilon-neighborhood (EN) to represent the intrinsic manifold of data.
示例代码
Installation
pip install ccnet
加载
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ccnet as ccn
# import ccn
示例数据
datafile = './Guo/data.csv'
timefile = './Guo/time_stages.csv'
准备相关数据,其中“数据”是归一化的表达矩阵,形状为438个细胞*38个基因,其中行和列对应细胞和基因。 'cell_names' 和 'gene_names' 分别是细胞名称列表和基因名称列表。 对于其他高维表达矩阵,建议在处理前进行PCA降维,降低噪声,加快计算速度。
data_df = pd.read_csv(datafile, index_col=0)
data = np.array(data_df)
cell_names = data_df.index.values
gene_names = data_df.columns.values
The 'labels' here indicates the time stage of each cell.
time_df = pd.read_csv(timefile, index_col=None, header=None)
labels = time_df[0].to_list()
Analysis
Topological data analysis
Persistent Homology (PH) is the flagship tool of Topological Data Analysis (TDA). Before generating the network, the PH is calculated here to select the appropriate parameter T in the Non-uniform -Neighborhood (NEN) method. Different from the traditional PH, where the construction of the network (actually complex) is based on -Neighborhood (NEN), and here is based on NEN, which is more self-adaptive than EN. Note that the TDA here is not necessary because the parameter T in the NEN method already has a rough value range.
# compute Persistent Homology
intervals = ccn.homology(data, k=4, metric='euclidean')
# display the result as persistence barcodes
fig, ax = plt.subplots(1, 2, figsize=(9,4))
ccn.plot(ax[0], intervals, xlabel='expansion time T', dim=0)
ccn.plot(ax[1], intervals, xlabel='expansion time T', dim=1)
#plt.savefig('figures_ph_guo/persistance_barcode.pdf', format='pdf')
图片.png
# display the first 10 barcodes of the left above
list(reversed(intervals[0][-10:]))
[array([ 0., inf]),
array([0. , 0.8115648]),
array([0. , 0.58644992]),
array([0. , 0.58060718]),
array([0. , 0.56599903]),
array([0. , 0.5608719]),
array([0. , 0.56084186]),
array([0. , 0.5542202]),
array([0. , 0.55128253]),
array([0. , 0.5510332])]
From the above results, we can know that when the parameter ∈ [0.8115648,+∞), the generated NEN network is connected, when ∈ [0.58644992,0.8115648) , the generated NEN network has two connected branches, and so on
Alternatively, the result of PH can also be displayed as a persistence diagram
# display the result as persistence diagram
fig, ax = plt.subplots()
ccn.plot(ax, intervals, gtype ='diagram', dim=0)
图片.png
Generating NEN network
Here we use the NEN method to generate the network. The parameter is the number of reference neighbors, which we set to 4. is the expansion time. take value roughly from 0.6 to 0.9, and the corresponding network changes from sparse to dense. Beyond this range, the generated network usually cannot approximate the intrinsic manifold of data appropriately. In order not to make the single-cell network too dense, we make T = 0.61. At the same time, in order to keep the generated network connected and facilitate downstream analysis, we combine the NEN network with a 2-neighbor network, and set the parameter , which repeatedly connects the nearest node between the two components until the whole network is connected.
# generate NEN network
adm = ccn.network(data, k=4, T=0.61, metric='euclidean', addknn=2, connected=True)
Network analysis
Statistical analysis of the network will help us to have a rough understanding of the structure of data. Some commonly used network statistics include node number, edge number, minimum degree, maximum degree, average degree, network radius, clustering coefficient, etc.
# compute some network statistics
ccn.statistics(adm)
# display degree distribution
fig, ax = plt.subplots()
out = ccn.statistics(adm,ax, s=20)
图片.png
Visualization
Here we use the network layout method ForceAtlas2 (FA2,力导向矩阵) to spatialize the network generated above. To prevent the overlapping between nodes, we call FA2 twice. When FA2 is called for the first time, we treated each node as a particle of zero size, namely, a particle without volume, thus producing a reasonable layout. On the basis of this layout, we call FA2 for the second time.
# network layout
pos = ccn.layout(adm, method='fa2', size=15, seed=4,
niter1=2000, niter2=200,
outbound1=False, outbound2=False,
linlog1=False, linlog2=False)
If we know the index of the root cell or early developing cell, we can rotate the position so that the root cell is at the top of the layout.
# rotate the position
pos = ccn.rotate_pos(pos, root_ind=0)
# visualize the network or data
fig, ax = plt.subplots(1,2, figsize=(9,4))
outax = ccn.plot_net(ax[0], pos, node_color=labels, node_size=2, edges=True, adm=adm)
outax = ccn.plot_net(ax[1], pos, node_color=labels, node_size=2, edges=False, adm=adm)
fig.colorbar(outax, ax=ax)
图片.png
Clustering
We first weight the network, and then clustering through community detection. We describe two weighting strategies: by Sigmoid funcion and Jaccard similarity.
Weighting by Sigmoid function
wadms = ccn.sigmoid(adm, data, c=0.5, h=1.5, s=30)
labelss = ccn.leiden_cluster(wadms, resolution=1)
fig, ax = plt.subplots(1,2, figsize=(9,4))
outax = ccn.plot_net(ax[0], pos, node_color=labelss, cmap=plt.cm.tab20, node_size=2, edges=False, adm=adm)
ax[0].set_title('colored by labelss')
outax = ccn.plot_net(ax[1], pos, node_color=labels, node_size=2, edges=False, adm=adm)
ax[1].set_title('colored by time stage')
fig.colorbar(outax, ax=ax)
图片.png
Weighting by Jaccard similarity
wadmj = ccn.jaccard_net(adm)
labelsj = ccn.leiden_cluster(wadmj, resolution=1)
fig, ax = plt.subplots(1,2, figsize=(9,4))
outax = ccn.plot_net(ax[0], pos, node_color=labelsj, cmap=plt.cm.tab20, node_size=2, edges=False, adm=adm)
ax[0].set_title('colored by labelsj')
outax = ccn.plot_net(ax[1], pos, node_color=labels, node_size=2, edges=False, adm=adm)
ax[1].set_title('colored by time stage')
fig.colorbar(outax, ax=ax)
图片.png
Pseudotime ordering
We first weight the edges by the expasion time, then define the pseudotime of each cell as its shortest path to the root cell.
# weight the network
wadm = ccn.network(data, k=4, T=0.61, addknn=2, connected=True, weighted=True)
# calculate the pseudotime
ordering = ccn.pseudotime(wadm, source=0)
fig, ax = plt.subplots(1,2, figsize=(9,4))
outax = ccn.plot_net(ax[0], pos, node_color=ordering, node_size=2, edges=True, adm=adm)
ax[0].set_title('colored by pseudotime')
outax = ccn.plot_net(ax[1], pos, node_color=labels, node_size=2, edges=True, adm=adm)
ax[1].set_title('colored by time stage')
fig.colorbar(outax, ax=ax)
图片.png
# Calculate the correlation coefficient between pseudotime and time stage
cs = [0,0,0]
cs[0] = ccn.pearson(ordering, labels)
cs[1] = ccn.spearman(ordering, labels)
cs[2] = ccn.kendall(ordering, labels)
print(' - pearson cor = ', cs[0])
print(' - spearman cor = ', cs[1])
print(' - kendall cor = ', cs[2])
Find trajectory-associated genes
Given the source and target cells, we can calculate the shortest path and consider it as the development trajectory from the former to the latter. Then we can find the genes related to this trajectory.
# given the source and target cells
source = 0
target = 414
# display it
fig, ax = plt.subplots()
outax = ccn.plot_net(ax, pos, node_size=2, edges=False, adm=adm)
ax.scatter(pos[source,0], pos[source,1], c='red', s=60, marker="*")
ax.text(pos[source,0]+100, pos[source,1], 'source cell')
ax.scatter(pos[target,0], pos[target,1], c='red', s=60, marker="*")
ax.text(pos[target,0]+600, pos[target,1], 'target cell')
图片.png
# calculate shortest path and trajectory-associated genes
path, genes = ccn.tags(wadm, data, ordering, gene_names, source=source, target=target)
# display the trajectory
fig, ax = plt.subplots()
outax = ccn.plot_net(ax, pos, node_size=2, edges=False, adm=adm)
# path
ax.plot(pos[np.ix_(path, [0])], pos[np.ix_(path, [1])], c='orange')
ax.scatter(pos[np.ix_(path, [0])], pos[np.ix_(path, [1])], c='k', s=40, marker='^')
ax.scatter(pos[source,0], pos[source,1], c='red', s=60, marker="*")
ax.scatter(pos[target,0], pos[target,1], c='red', s=60, marker="*")
图片.png
# colored by tarjectory-associated genes
fig, ax = plt.subplots(2, 2)
gs = ['Pecam1', 'Snai1', 'Gata4', 'Sox2']
for r in range(2):
for c in range(2):
num = r*2 + c
outax = ccn.plot_net(ax[r,c], pos, node_color=data_df[gs[num]].tolist(), node_size=2, edges=False, adm=adm)
ax[r,c].set_title(gs[num])
# delete xticks and yticks
ax[r,c].set_xticks(ticks=[])
ax[r,c].set_yticks(ticks=[])
fig.colorbar(outax, ax=ax[r,c])
图片.png
# colored by tarjectory-associated genes
fig, ax = plt.subplots(2, 2)
gs = ['Gata3', 'Sox17', 'Dab2', 'Eomes']
for r in range(2):
for c in range(2):
num = r*2 + c
outax = ccn.plot_net(ax[r,c], pos, node_color=data_df[gs[num]].tolist(), node_size=2, edges=False, adm=adm)
ax[r,c].set_title(gs[num])
# delete xticks and yticks
ax[r,c].set_xticks(ticks=[])
ax[r,c].set_yticks(ticks=[])
fig.colorbar(outax, ax=ax[r,c])
图片.png
示例代码在ccNET
生活很好,有你更好
网友评论