美文网首页单细胞数据分析单细胞测序专题集合
10X空间转录组聚类分析回顾之SpaGCN(结合基因表达信息、空

10X空间转录组聚类分析回顾之SpaGCN(结合基因表达信息、空

作者: 单细胞空间交响乐 | 来源:发表于2022-01-18 15:55 被阅读0次

    隔离的第九天,孤独仍在,但亦师亦友,今天我们来回顾一下10X空间转录组聚类的分析软件SpaGCN,这个软件之前分享过,文章在10X空间转录组聚类分析之图卷积网络(graph convolutional network),但当时分享的时候文章还是预印版,现在文章正式发表,在SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network,发表于Nature Methods,IF28.547,我们来回顾一下算法原理和示例代码

    图片.png

    Overview of SpaGCN and evaluation.

    SpaGCN首先构建一个图来表示考虑空间位置和组织学信息的所有点的关系。 接下来,SpaGCN 利用图卷积层来聚合来自相邻点的基因表达信息。 然后,SpaGCN 使用聚合表达式矩阵使用无监督迭代聚类算法对点进行聚类。 每个clusters被视为一个空间域,SpaGCN 然后从中检测通过 DE 分析在域中丰富的 SVG。 当单个基因不能标记一个域的表达模式时,SpaGCN 会构建一个meta基因,由多个基因组合形成,来代表该域的表达模式

    图片.png

    示例代码

    Outline

      1. Installation
      1. Import modules
      1. Read in data
      1. Integrate gene expression and histology into a Graph
      1. Spatial domain detection using SpaGCN
      1. Identify SVGs
      1. Identify Meta Gene
      1. Multiple tissue sections analysis
    Installation
    pip3 install SpaGCN
    #Note: you need to make sure that the pip is for python3,or we should install SpaGCN by
    python3 -m pip install SpaGCN
    pip3 install SpaGCN
    #If you do not have permission (when you get a permission denied error), you should install SpaGCN by
    pip3 install --user SpaGCN
    
    Import python modules
    import os,csv,re
    import pandas as pd
    import numpy as np
    import scanpy as sc
    import math
    import SpaGCN as spg
    from scipy.sparse import issparse
    import random, torch
    import warnings
    warnings.filterwarnings("ignore")
    import matplotlib.colors as clr
    import matplotlib.pyplot as plt
    import SpaGCN as spg
    #In order to read in image data, we need to install some package. Here we recommend package "opencv"
    #inatll opencv in python
    #!pip3 install opencv-python
    import cv2
    
    Read in data

    The current version of SpaGCN requres three input data:

    • The gene expression matrix(n by k): expression_matrix.h5(我们就读这个就可以);
    • Spatial coordinateds of samplespositions.txt;
    • Histology image(optional): histology.tif, can be tif or png or jepg.

    The gene expreesion data can be stored as an AnnData object. AnnData stores a data matrix .X together with annotations of observations .obs, variables .var and unstructured annotations .uns.

    """
    #Read original 10x_h5 data and save it to h5ad
    from scanpy import read_10x_h5
    adata = read_10x_h5("../tutorial/data/151673/expression_matrix.h5")
    spatial=pd.read_csv("../tutorial/data/151673/positions.txt",sep=",",header=None,na_filter=False,index_col=0) 
    adata.obs["x1"]=spatial[1]
    adata.obs["x2"]=spatial[2]
    adata.obs["x3"]=spatial[3]
    adata.obs["x4"]=spatial[4]
    adata.obs["x5"]=spatial[5]
    adata.obs["x_array"]=adata.obs["x2"]
    adata.obs["y_array"]=adata.obs["x3"]
    adata.obs["x_pixel"]=adata.obs["x4"]
    adata.obs["y_pixel"]=adata.obs["x5"]
    #Select captured samples
    adata=adata[adata.obs["x1"]==1]
    adata.var_names=[i.upper() for i in list(adata.var_names)]
    adata.var["genename"]=adata.var.index.astype("str")
    adata.write_h5ad("../tutorial/data/151673/sample_data.h5ad")
    """
    #Read in gene expression and spatial location
    adata=sc.read("../tutorial/data/151673/sample_data.h5ad")
    #Read in hitology image
    img=cv2.imread("../tutorial/data/151673/histology.tif")
    
    Integrate gene expression and histology into a Graph
    #Set coordinates
    x_array=adata.obs["x_array"].tolist()
    y_array=adata.obs["y_array"].tolist()
    x_pixel=adata.obs["x_pixel"].tolist()
    y_pixel=adata.obs["y_pixel"].tolist()
    
    #Test coordinates on the image
    img_new=img.copy()
    for i in range(len(x_pixel)):
        x=x_pixel[i]
        y=y_pixel[i]
        img_new[int(x-20):int(x+20), int(y-20):int(y+20),:]=0
    
    cv2.imwrite('./sample_results/151673_map.jpg', img_new)
    
    • 's' 参数决定了在计算每两个点之间的欧几里得距离时赋予组织学的权重。 's = 1' 表示组织学像素强度值与 (x,y) 坐标具有相同的尺度方差,而 's' 值越大表示尺度方差越大,因此在计算欧几里得距离时对组织学的权重越高 .
    • “b”参数确定提取颜色强度时每个点的面积。
    #Calculate adjacent matrix
    s=1
    b=49
    adj=spg.calculate_adj_matrix(x=x_pixel,y=y_pixel, x_pixel=x_pixel, y_pixel=y_pixel, image=img, beta=b, alpha=s, histology=True)
    #If histlogy image is not available, SpaGCN can calculate the adjacent matrix using the fnction below
    #adj=calculate_adj_matrix(x=x_pixel,y=y_pixel, histology=False)
    np.savetxt('./data/151673/adj.csv', adj, delimiter=',')
    
    Spatial domain detection using SpaGCN
    adata=sc.read("./data/151673/sample_data.h5ad")
    adj=np.loadtxt('./data/151673/adj.csv', delimiter=',')
    adata.var_names_make_unique()
    spg.prefilter_genes(adata,min_cells=3) # avoiding all genes are zeros
    spg.prefilter_specialgenes(adata)
    #Normalize and take log for UMI
    sc.pp.normalize_per_cell(adata)
    sc.pp.log1p(adata)
    

    Set hyper-parameters

    • p: Percentage of total expression contributed by neighborhoods.
    • l: Parameter to control p.
    p=0.5 
    #Find the l value given p
    l=spg.search_l(p, adj, start=0.01, end=1000, tol=0.01, max_run=100)
    
    • n_clusters: Number of spatial domains wanted.
    • res: Resolution in the initial Louvain's Clustering methods. If the number of clusters is known, we can use the spg.search_res() fnction to search for suitable resolution(optional)
    #If the number of clusters known, we can use the spg.search_res() fnction to search for suitable resolution(optional)
    #For this toy data, we set the number of clusters=7 since this tissue has 7 layers
    n_clusters=7
    #Set seed
    r_seed=t_seed=n_seed=100
    #Seaech for suitable resolution
    res=spg.search_res(adata, adj, l, n_clusters, start=0.7, step=0.1, tol=5e-3, lr=0.05, max_epochs=20, r_seed=r_seed, t_seed=t_seed, n_seed=n_seed)
    

    Run SpaGCN

    clf=spg.SpaGCN()
    clf.set_l(l)
    #Set seed
    random.seed(r_seed)
    torch.manual_seed(t_seed)
    np.random.seed(n_seed)
    #Run
    clf.train(adata,adj,init_spa=True,init="louvain",res=res, tol=5e-3, lr=0.05, max_epochs=200)
    y_pred, prob=clf.predict()
    adata.obs["pred"]= y_pred
    adata.obs["pred"]=adata.obs["pred"].astype('category')
    #Do cluster refinement(optional)
    #shape="hexagon" for Visium data, "square" for ST data.
    adj_2d=spg.calculate_adj_matrix(x=x_array,y=y_array, histology=False)
    refined_pred=spg.refine(sample_id=adata.obs.index.tolist(), pred=adata.obs["pred"].tolist(), dis=adj_2d, shape="hexagon")
    adata.obs["refined_pred"]=refined_pred
    adata.obs["refined_pred"]=adata.obs["refined_pred"].astype('category')
    #Save results
    adata.write_h5ad("./sample_results/results.h5ad")
    

    Plot spatial domains

    adata=sc.read("./sample_results/results.h5ad")
    #Set colors used
    plot_color=["#F56867","#FEB915","#C798EE","#59BE86","#7495D3","#D1D1D1","#6D1A9C","#15821E","#3A84E6","#997273","#787878","#DB4C6C","#9E7A7A","#554236","#AF5F3C","#93796C","#F9BD3F","#DAB370","#877F6C","#268785"]
    #Plot spatial domains
    domains="pred"
    num_celltype=len(adata.obs[domains].unique())
    adata.uns[domains+"_colors"]=list(plot_color[:num_celltype])
    ax=sc.pl.scatter(adata,alpha=1,x="y_pixel",y="x_pixel",color=domains,title=domains,color_map=plot_color,show=False,size=100000/adata.shape[0])
    ax.set_aspect('equal', 'box')
    ax.axes.invert_yaxis()
    plt.savefig("./sample_results/pred.png", dpi=600)
    plt.close()
    
    #Plot refined spatial domains
    domains="refined_pred"
    num_celltype=len(adata.obs[domains].unique())
    adata.uns[domains+"_colors"]=list(plot_color[:num_celltype])
    ax=sc.pl.scatter(adata,alpha=1,x="y_pixel",y="x_pixel",color=domains,title=domains,color_map=plot_color,show=False,size=100000/adata.shape[0])
    ax.set_aspect('equal', 'box')
    ax.axes.invert_yaxis()
    plt.savefig("./sample_results/refined_pred.png", dpi=600)
    plt.close()
    
    图片.png
    Identify SVGs
    #Read in raw data
    raw=sc.read("../tutorial/data/151673/sample_data.h5ad")
    raw.var_names_make_unique()
    raw.obs["pred"]=adata.obs["pred"].astype('category')
    raw.obs["x_array"]=raw.obs["x2"]
    raw.obs["y_array"]=raw.obs["x3"]
    raw.obs["x_pixel"]=raw.obs["x4"]
    raw.obs["y_pixel"]=raw.obs["x5"]
    #Convert sparse matrix to non-sparse
    raw.X=(raw.X.A if issparse(raw.X) else raw.X)
    raw.raw=raw
    sc.pp.log1p(raw)
    
    • target: Target domain to identify SVGs.
    • min_in_group_fraction: Minium in-group expression fraction.
    • min_in_out_group_ratio: Miniumn (in-group expression fraction) / (out-group expression fraction).
    • min_fold_change: Miniumn (in-group expression) / (out-group expression).
    • r: Radius to detect a spot's neighboring spots.
    #Use domain 0 as an example
    target=0
    #Set filtering criterials
    min_in_group_fraction=0.8
    min_in_out_group_ratio=1
    min_fold_change=1.5
    #Search radius such that each spot in the target domain has approximately 10 neighbors on average
    adj_2d=spg.calculate_adj_matrix(x=x_array, y=y_array, histology=False)
    start, end= np.quantile(adj_2d[adj_2d!=0],q=0.001), np.quantile(adj_2d[adj_2d!=0],q=0.1)
    r=spg.search_radius(target_cluster=target, cell_id=adata.obs.index.tolist(), x=x_array, y=y_array, pred=adata.obs["pred"].tolist(), start=start, end=end, num_min=10, num_max=14,  max_run=100)
    #Detect neighboring domains
    nbr_domians=spg.find_neighbor_clusters(target_cluster=target,
                                       cell_id=raw.obs.index.tolist(), 
                                       x=raw.obs["x_array"].tolist(), 
                                       y=raw.obs["y_array"].tolist(), 
                                       pred=raw.obs["pred"].tolist(),
                                       radius=r,
                                       ratio=1/2)
    
    nbr_domians=nbr_domians[0:3]
    de_genes_info=spg.rank_genes_groups(input_adata=raw,
                                    target_cluster=target,
                                    nbr_list=nbr_domians, 
                                    label_col="pred", 
                                    adj_nbr=True, 
                                    log=True)
    #Filter genes
    de_genes_info=de_genes_info[(de_genes_info["pvals_adj"]<0.05)]
    filtered_info=de_genes_info
    filtered_info=filtered_info[(filtered_info["pvals_adj"]<0.05) &
                                (filtered_info["in_out_group_ratio"]>min_in_out_group_ratio) &
                                (filtered_info["in_group_fraction"]>min_in_group_fraction) &
                                (filtered_info["fold_change"]>min_fold_change)]
    filtered_info=filtered_info.sort_values(by="in_group_fraction", ascending=False)
    filtered_info["target_dmain"]=target
    filtered_info["neighbors"]=str(nbr_domians)
    print("SVGs for domain ", str(target),":", filtered_info["genes"].tolist())
    
    #Plot refinedspatial domains
    color_self = clr.LinearSegmentedColormap.from_list('pink_green', ['#3AB370',"#EAE7CC","#FD1593"], N=256)
    for g in filtered_info["genes"].tolist():
        raw.obs["exp"]=raw.X[:,raw.var.index==g]
        ax=sc.pl.scatter(raw,alpha=1,x="y_pixel",y="x_pixel",color="exp",title=g,color_map=color_self,show=False,size=100000/raw.shape[0])
        ax.set_aspect('equal', 'box')
        ax.axes.invert_yaxis()
        plt.savefig("./sample_results/"+g+".png", dpi=600)
        plt.close()
    
    图片.png
    Identify Meta Gene
    #Use domain 2 as an example
    target=2
    meta_name, meta_exp=spg.find_meta_gene(input_adata=raw,
                        pred=raw.obs["pred"].tolist(),
                        target_domain=target,
                        start_gene="GFAP",
                        mean_diff=0,
                        early_stop=True,
                        max_iter=3,
                        use_raw=False)
    
    raw.obs["meta"]=meta_exp
    
    #Plot meta gene
    g="GFAP"
    raw.obs["exp"]=raw.X[:,raw.var.index==g]
    ax=sc.pl.scatter(raw,alpha=1,x="y_pixel",y="x_pixel",color="exp",title=g,color_map=color_self,show=False,size=100000/raw.shape[0])
    ax.set_aspect('equal', 'box')
    ax.axes.invert_yaxis()
    plt.savefig("./sample_results/"+g+".png", dpi=600)
    plt.close()
    
    raw.obs["exp"]=raw.obs["meta"]
    ax=sc.pl.scatter(raw,alpha=1,x="y_pixel",y="x_pixel",color="exp",title=meta_name,color_map=color_self,show=False,size=100000/raw.shape[0])
    ax.set_aspect('equal', 'box')
    ax.axes.invert_yaxis()
    plt.savefig("./sample_results/meta_gene.png", dpi=600)
    plt.close()
    
    图片.png
    Multiple tissue sections analysis
    adata1=sc.read("./data/Mouse_brain/MA1.h5ad")
    adata2=sc.read("./data/Mouse_brain/MP1.h5ad")
    img1=cv2.imread("./data/Mouse_brain/MA1_histology.tif")
    img2=cv2.imread("./data/Mouse_brain/MP1_histology.tif")
    b=49
    s=1
    x_pixel1=adata1.obs["x4"].tolist()
    y_pixel1=adata1.obs["x5"].tolist()
    adata1.obs["color"]=spg.extract_color(x_pixel=x_pixel1, y_pixel=y_pixel1, image=img1, beta=b)
    z_scale=np.max([np.std(x_pixel1), np.std(y_pixel1)])*s
    adata1.obs["z"]=(adata1.obs["color"]-np.mean(adata1.obs["color"]))/np.std(adata1.obs["color"])*z_scale
    
    x_pixel2=adata2.obs["x4"].tolist()
    y_pixel2=adata2.obs["x5"].tolist()
    adata2.obs["color"]=spg.extract_color(x_pixel=x_pixel2, y_pixel=y_pixel2, image=img2, beta=b)
    z_scale=np.max([np.std(x_pixel2), np.std(y_pixel2)])*s
    adata2.obs["z"]=(adata2.obs["color"]-np.mean(adata2.obs["color"]))/np.std(adata2.obs["color"])*z_scale
    del img1, img2
    from anndata import AnnData
    adata1.obs["x_pixel"]=x_pixel1
    adata1.obs["y_pixel"]=y_pixel1
    adata2.obs["x_pixel"]=x_pixel2-np.min(x_pixel2)+np.min(x_pixel1)
    adata2.obs["y_pixel"]=y_pixel2-np.min(y_pixel2)+np.max(y_pixel1)
    adata1.var_names_make_unique()
    adata2.var_names_make_unique()
    adata_all=AnnData.concatenate(adata1, adata2,join='inner',batch_key="dataset_batch",batch_categories=["0","1"])
    X=np.array([adata_all.obs["x_pixel"], adata_all.obs["y_pixel"], adata_all.obs["z"]]).T.astype(np.float32)
    adj=spg.pairwise_distance(X)
    sc.pp.normalize_per_cell(adata_all, min_counts=0)
    sc.pp.log1p(adata_all)
    p=0.5 
    #Find the l value given p
    l=spg.search_l(p, adj, start=0.01, end=1000, tol=0.01, max_run=100)
    res=1.0
    seed=100
    random.seed(seed)
    torch.manual_seed(seed)
    np.random.seed(seed)
    clf=spg.SpaGCN()
    clf.set_l(l)
    clf.train(adata_all,adj,init_spa=True,init="louvain",res=res, tol=5e-3, lr=0.05, max_epochs=200)
    y_pred, prob=clf.predict()
    adata_all.obs["pred"]= y_pred
    adata_all.obs["pred"]=adata_all.obs["pred"].astype('category')
    colors_use=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#bcbd22', '#17becf', '#aec7e8', '#ffbb78', '#98df8a', '#ff9896', '#bec1d4', '#bb7784', '#0000ff', '#111010', '#FFFF00',   '#1f77b4', '#800080', '#959595', 
     '#7d87b9', '#bec1d4', '#d6bcc0', '#bb7784', '#8e063b', '#4a6fe3', '#8595e1', '#b5bbe3', '#e6afb9', '#e07b91', '#d33f6a', '#11c638', '#8dd593', '#c6dec7', '#ead3c6', '#f0b98d', '#ef9708', '#0fcfc0', '#9cded6', '#d5eae7', '#f3e1eb', '#f6c4e1', '#f79cd4']
    num_celltype=len(adata_all.obs["pred"].unique())
    adata_all.uns["pred_colors"]=list(colors_use[:num_celltype])
    ax=sc.pl.scatter(adata_all,alpha=1,x="y_pixel",y="x_pixel",color="pred",show=False,size=150000/adata_all.shape[0])
    ax.set_aspect('equal', 'box')
    ax.axes.invert_yaxis()
    plt.savefig("./sample_results/mouse_barin_muti_sections_domains.png", dpi=600)
    plt.close()
    
    图片.png

    生活很好,有你更好

    相关文章

      网友评论

        本文标题:10X空间转录组聚类分析回顾之SpaGCN(结合基因表达信息、空

        本文链接:https://www.haomeiwen.com/subject/ylvkhrtx.html