美文网首页SP-SingleCellSingle Cell RNA Sequencing Analysis
Giotto|| 空间表达数据分析工具箱

Giotto|| 空间表达数据分析工具箱

作者: 周运来就是我 | 来源:发表于2020-12-13 23:09 被阅读0次

    Seurat 新版教程:分析空间转录组数据
    单细胞转录组数据分析|| scanpy教程:空间转录组数据分析
    10X Visium:空间转录组样本制备到数据分析
    空间信息在空间转录组中的运用

    Giotto, a toolbox for integrative analysis and visualization of spatial expression data

    像Seurat一样,Giotto也是一位画家。乔托·迪·邦多纳(Giotto di Bondone 1266年-1337年),意大利画家、雕刻家与建筑师,被认为是意大利文艺复兴时期的开创者,被誉为“欧洲绘画之父”。在乔托的作品当中,可以看出他对于画作中真实空间的表达相当努力,有些壁画甚至还搭配了真实教堂内部的透视感来构图。这也许是Dries实验室选择这个名字作为其开发的空间表达数据分析工具箱的名字吧。

    我们知道Seurat和scanpy中均有分析空间转录组的函数,美中不足的是空间信息多是用来作为可视化的画板,没有得到很好的利用。今天我们就跟着Giotto的教程看看,空间表达数据可以做什么以及是如何做到的。

    首先,Giotto为常见的单细胞表达数据处理提供了一个灵活的框架,如:

    • 质量控制
    • 归一化
    • 降维
    • 聚类和细胞类型注释

    当然, 针对非单细胞分辨率的空间技术,如10X Visium ,Giotto实现了3种算法,通过整合已知基signatures 或单细胞RNAseq注释数据来估计不同细胞类型在空间中的富集(也可以理解为一种映射)。

    最重要的是,Giotto进一步利用空间信息形成空间网格和或空间接近网络,用于:

    • 识别空间特异基因
    • 提取连续的空间表达模式
    • 使用HMRF识别离散的空间域
    • 探索细胞类型/细胞类型空间相互作用富集或耗尽
    • 利用空间和配受体表达计算细胞空间相互作用
    • 发现相互作用改变基因(interaction changed genes,ICG):由于与相邻细胞相互作用而改变一种细胞类型表达的基因

    最后,Giotto提供了界面版工具来探索空间表达数据:Giotto Viewer

    具体每一步算法,还是建议把原论文打印出来慢慢研究。这里可以看做是一份demo报告,而且Giotto本身还在发展当中,如果真的用Giotto来分析自己的空间表达数据,开始看其官网:https://rubd.github.io/Giotto_site/index.html

    截至2020-12-13,官网已经准备好了15个案例教程,支持分析9种最先进的空间技术,包括原位杂交(seqFISH+, merFISH, osmFISH),测序(Slide-seq, Visium, STARmap)和基于成像的多路复用/蛋白质组学(CyCIF, MIBI, CODEX)。并分别提供了不同的数据集,以便用户学习使用。

    1. Install a Giotto environment (optional) # 学会R包安装
    2. Create a Giotto object # 理清楚Giotto数据结构
    3. Process and filter a Giotto object # 类比单细胞数据分析流程学习
    4. Dimension reduction
    5. Cluster cells or spots
    6. Identify differentially expressed genes
    7. Annotate clusters
    8. Cell-type enrichment or deconvolution per spot # 低分辨率单细胞技术分析
    9. Create a Spatial grid or Network # 空间数据分析
    10. Find genes with a spatially coherent gene expression pattern
    11. Identify genes that are spatially co-expressed
    12. Explore spatial domains with HMRF
    13. Calculate spatial cell-cell interaction enrichment
    14. Find cell-cell interaction changed genes (ICG)
    15. Identify enriched or depleted ligand-receptor interactions in hetero and homo-typic cell interactions
      16 Export Giotto results to use in Giotto viewer # 在线工具
    创建Giotto 对象

    安装与加载:

    library(devtools)  # if not installed: install.packages('devtools')
    library(remotes)  # if not installed: install.packages('remotes')
    remotes::install_github("RubD/Giotto")
    
    library(Seurat)  # 假设大家都会分析单细胞数据了,这个假设坏不坏,坏
    library(SeuratData)
    library(ggplot2)
    library(cowplot)
    library(dplyr)
    library(Giotto)
    library(patchwork)
    library(tidyverse)
    

    我们将使用Giotto自带的数据集mini_visium做为演示数据,这样数据较小,运行时间较快。在创建Giotto对象之前首先需要做一步初始化,这一点和metaCELL还是很像。

    ?createGiottoInstructions
    temp_dir = './mini_visium/'
    myinstructions = createGiottoInstructions(save_dir = temp_dir,
                                              save_plot = FALSE,
                                              show_plot = F,
                                              python_path = 'D:\\Program Files (x86)\\anconda\\python.exe')
    

    可以看到Giotto过程是依赖python的,这一点早期的Seurat差不多,这当然会有些不便之处,至少知道同一个环境下python在哪,需要哪些库需要安装。这是后话,在我们的展示中这都不是问题(有问题的地方我们可以跳过_)。下面我们来创建这个对象:

    expr_path = system.file("extdata", "visium_DG_expr.txt.gz", package = 'Giotto')
    loc_path = system.file("extdata", "visium_DG_locs.txt", package = 'Giotto')
    mini_visium <- createGiottoObject(raw_exprs = expr_path,
                                      spatial_locs = loc_path,
                                      instructions = myinstructions)
    

    这个会有一个提示:

    Consider to install these (optional) packages to run all possible Giotto commands for spatial analyses:  scran trendsceek SPARK multinet RTriangle FactoMiner
     Giotto does not automatically install all these packages as they are not absolutely required and this reduces the number of dependenciesWarning message:
    In createGiottoObject(raw_exprs = expr_path, spatial_locs = loc_path,  :
      module: community was not found with python path: D:\Program Files (x86)\anconda\python.exe
    

    经验告诉我,community 是用来分群的,我们不用louvain算法来计算距离,用leiden可以的。创建完对象我们当然要看看对象长什么样了。

    mini_visium
    An object of class giotto 
     634 genes across 624 samples.
     
    Steps and parameters used: 
     
    list()
    
     str(mini_visium,max.level = 2)
    Formal class 'giotto' [package "Giotto"] with 19 slots
      ..@ raw_exprs          :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
      ..@ norm_expr          : NULL
      ..@ norm_scaled_expr   : NULL
      ..@ custom_expr        : NULL
      ..@ spatial_locs       :Classes ‘data.table’ and 'data.frame':    624 obs. of  3 variables:
      .. ..- attr(*, ".internal.selfref")=<externalptr> 
      ..@ cell_metadata      :Classes ‘data.table’ and 'data.frame':    624 obs. of  1 variable:
      .. ..- attr(*, ".internal.selfref")=<externalptr> 
      ..@ gene_metadata      :Classes ‘data.table’ and 'data.frame':    634 obs. of  1 variable:
      .. ..- attr(*, ".internal.selfref")=<externalptr> 
      ..@ cell_ID            : chr [1:624] "AAAGGGATGTAGCAAG-1" "AAATGGCATGTCTTGT-1" "AAATGGTCAATGTGCC-1" "AAATTAACGGGTAGCT-1" ...
      ..@ gene_ID            : chr [1:634] "Gna12" "Ccnd2" "Btbd17" "Sox9" ...
      ..@ spatial_network    : NULL
      ..@ spatial_grid       : NULL
      ..@ spatial_enrichment : NULL
      ..@ dimension_reduction: NULL
      ..@ nn_network         : NULL
      ..@ images             : NULL
      ..@ parameters         : list()
      ..@ instructions       :List of 11
      ..@ offset_file        : NULL
      ..@ OS_platform        : chr "windows"
    

    可以看到也是一个S4对象包含了表达谱和空间信息,需要知道每部分存放的数据是什么。

    head(mini_visium@spatial_locs)
       sdimx sdimy            cell_ID
    1:  5477 -4125 AAAGGGATGTAGCAAG-1
    2:  5959 -2808 AAATGGCATGTCTTGT-1
    3:  4720 -5202 AAATGGTCAATGTGCC-1
    4:  5202 -5322 AAATTAACGGGTAGCT-1
    5:  4101 -4604 AACAACTGGTAGTTGC-1
    6:  5821 -3047 AACAGGAAATCGAATA-1
    
     head(mini_visium@cell_metadata)
                  cell_ID
    1: AAAGGGATGTAGCAAG-1
    2: AAATGGCATGTCTTGT-1
    3: AAATGGTCAATGTGCC-1
    4: AAATTAACGGGTAGCT-1
    5: AACAACTGGTAGTTGC-1
    6: AACAGGAAATCGAATA-1
    
     head(mini_visium@gene_metadata)
        gene_ID
    1:    Gna12
    2:    Ccnd2
    3:   Btbd17
    4:     Sox9
    5:     Sez6
    6: Serpinf1
    

    有了表达谱和空间信息,我们还需要图像数据,也加载进来。

    ## 1. read image
    png_path = system.file("extdata", "deg_image.png", package = 'Giotto')
    mg_img = magick::image_read(png_path)
    mg_img
    ## 2. test and modify image alignment
    mypl = spatPlot(mini_visium, return_plot = T, point_alpha = 0.8)
    orig_png = createGiottoImage(gobject = mini_visium, mg_object = mg_img, name = 'image',
                                 xmax_adj = 450, xmin_adj = 550,
                                 ymax_adj = 200, ymin_adj = 200)
    mypl_image = addGiottoImageToSpatPlot(mypl, orig_png)
    mypl_image
    ## 3. add images to Giotto object ##
    image_list = list(orig_png)
    mini_visium = addGiottoImage(gobject = mini_visium,
                                 images = image_list)
    showGiottoImageNames(mini_visium)
    The following images are available:  image 
    [1] "image"
    
    

    图像数据在:

    mini_visium@images
    

    至此,一个空间表达数据对象就构建完成了,我们可以类比Seurat或scanpy的数据结构来看看Giotto的数据结构,以便在下游分析的时候随心所欲地提取数据。

    表达数据标准分析

    空间表达数据首先是表达数据,所以之前我们学到的单细胞表达数据的分析方法框架是完全可以用的。所谓,降维聚类必知必会嘛。先查看数据分析执行基本的数据质控。

    # explore gene and cell distribution
    p1 <- filterDistributions(mini_visium, detection = 'genes') + ggtitle(" Gene  Distributions")
    p2 <-  filterDistributions(mini_visium, detection = 'cells')+ ggtitle(" Cell  Distributions") 
    p1 + p2 
    
    # 显示有多少基因和细胞在组合阈值时丢失。
    filterCombinations(mini_visium,
                       expression_thresholds = c(1),
                       gene_det_in_min_cells = c(20, 20, 50, 50),
                       min_det_genes_per_cell = c(100, 200, 100, 200))
    
    $results
       threshold gene_detected_in_min_cells min_detected_genes_per_cell combination removed_genes removed_cells
    1:         1                         20                         100      20-100             1             2
    2:         1                         20                         200      20-200             1            75
    3:         1                         50                         100      50-100            43             2
    4:         1                         50                         200      50-200            43            77
    
    $ggplot
    

    之后,我们进行数据过滤以及均一化等操作。

    # filter and normalize
    mini_visium <- filterGiotto(gobject = mini_visium,
                                expression_threshold = 1,
                                gene_det_in_min_cells = 50,
                                min_det_genes_per_cell = 100,
                                expression_values = c('raw'),
                                verbose = T)
    mini_visium <- normalizeGiotto(gobject = mini_visium, scalefactor = 6000, verbose = T)
    mini_visium <- addStatistics(gobject = mini_visium)
    
    
    mini_visium <- calculateHVG(gobject = mini_visium)
    

    看看基因和细胞的信息:

    head(mini_visium@cell_metadata)
                  cell_ID nr_genes perc_genes total_expr
    1: AAAGGGATGTAGCAAG-1      226   38.24027   992.0223
    2: AAATGGCATGTCTTGT-1      283   47.88494  1123.8751
    3: AAATGGTCAATGTGCC-1      247   41.79357  1024.8760
    4: AAATTAACGGGTAGCT-1      221   37.39425   971.3199
    5: AACAACTGGTAGTTGC-1      306   51.77665  1157.2925
    6: AACAGGAAATCGAATA-1      337   57.02200  1227.8457
    
    head(mini_visium@gene_metadata)
        gene_ID nr_cells perc_cells total_expr mean_expr mean_expr_det hvg
    1:    Gna12      501   80.54662  1984.1670 3.1899791      3.960413  no
    2:    Ccnd2      356   57.23473  1250.6225 2.0106471      3.512984  no
    3:   Btbd17      252   40.51447   783.0241 1.2588812      3.107238  no
    4:     Sox9      308   49.51768  1015.9613 1.6333783      3.298576  no
    5:     Sez6      504   81.02894  2179.5787 3.5041458      4.324561  no
    6: Serpinf1      105   16.88103   312.4149 0.5022748      2.975380  no
    
    # 可以直接用函数获得上述信息:
    pDataDT(mini_visium)
    fDataDT(mini_visium)
    
    

    此时的Giotto对象结构:

     591 genes across 622 samples.
     
    Steps and parameters used: 
     
    $`0_subset`
    cells removed genes removed 
                2            43 
    
    $`1_filter`
                             used expression values                       gene expression threshold            minimum # of genes detected per cell 
                                              "raw"                                             "1"                                           "100" 
    minimum times a gene is detected over all cells 
                                               "50" 
    
    $`2_normalize`
          normalization method normalized to library size                scalefactor             log-normalized                    logbase 
                    "standard"                      "yes"                     "6000"                      "yes"                        "2" 
                    log offset               genes scaled                cell scaled  if both, order of scaling 
                           "1"                      "yes"                      "yes"              "first_genes" 
    
    $`3_gene_stats`
    expression values used    detection_threshold 
              "normalized"                    "0" 
    
    $`4_cell_stats`
    expression values used    detection_threshold 
              "normalized"                    "0" 
    
    $`5_hvg`
                     method used           expression values1           expression values2           expression values3           reversed log scale 
                    "cov_groups"                 "normalized"                     "scaled"                     "custom"                      "FALSE" 
                         logbase         expression threshold number of expression groups        threshold for z-score        difference in variance 
                             "2"                          "0"                         "20"                        "1.5"                        "0.1" 
                   name for HVGs 
                           "hvg" 
    

    执行降维。

     ?runPCA
    mini_visium <- runPCA(gobject = mini_visium)
    screePlot(mini_visium, ncp = 30)
    

    每个PC的解释度


    提取PCA坐标,对于UMAP和tsne也可以类比,方便自己画图谱。

     head(mini_visium@dimension_reduction$cells$pca$pca$coordinates[1:4,1:4])
                            Dim.1      Dim.2       Dim.3       Dim.4
    AAAGGGATGTAGCAAG-1  0.9011678 -0.3335627 -0.14288217 -0.90345159
    AAATGGCATGTCTTGT-1  3.8096179  3.4784611  0.61075503  0.06560415
    AAATGGTCAATGTGCC-1 -8.0655096  0.4639553 -0.35425095 -1.02290334
    AAATTAACGGGTAGCT-1 -7.3829491  1.3098899  0.04309618 -1.28969309
    

    关于特征选择的PCA信息Giotto提供的并不像Seurat那么多。

    可视化降维:

    ppca <- plotPCA(gobject = mini_visium)
    mini_visium <- runUMAP(mini_visium, dimensions_to_use = 1:10)
    pumap<- plotUMAP(gobject = mini_visium)
    mini_visium <- runtSNE(mini_visium, dimensions_to_use = 1:10)
    ptsne <- plotTSNE(gobject = mini_visium)
    

    下面是标准分析的核心了:聚类发现细胞亚群

    mini_visium <- createNearestNetwork(gobject = mini_visium, dimensions_to_use = 1:10, k = 15)
    mini_visium <- doLeidenCluster(gobject = mini_visium, resolution = 0.4, n_iterations = 1000)
    
    # 查看聚类结果
    head(mini_visium@cell_metadata)
                  cell_ID nr_genes perc_genes total_expr leiden_clus
    1: AAAGGGATGTAGCAAG-1      226   38.24027   992.0223           3
    2: AAATGGCATGTCTTGT-1      283   47.88494  1123.8751           4
    3: AAATGGTCAATGTGCC-1      247   41.79357  1024.8760           2
    4: AAATTAACGGGTAGCT-1      221   37.39425   971.3199           2
    5: AACAACTGGTAGTTGC-1      306   51.77665  1157.2925           2
    6: AACAGGAAATCGAATA-1      337   57.02200  1227.8457           4
    

    查看聚类结果

    # visualize UMAP cluster results
    plotUMAP(gobject = mini_visium, cell_color = 'leiden_clus', show_NN_network = T, point_size = 2.5)
    
    # visualize UMAP and spatial results
    spatDimPlot(gobject = mini_visium, cell_color = 'leiden_clus', spat_point_shape = 'voronoi')
    
    # spatial voronoi plot with selected clusters
    p1 <- spatPlot(mini_visium, point_shape = 'voronoi', cell_color ='leiden_clus', select_cell_groups = c(1,2,3))
    
    # spatial voronoi plot without showing not selected clusters
    p2 <- spatPlot(mini_visium, point_shape = 'voronoi', cell_color ='leiden_clus', select_cell_groups = c(1,2,3), show_other_cells = F)
    
    # spatial voronoi plot without showing not selected cells, but showing the voronoi borders
    p3 <- spatPlot(mini_visium, point_shape = 'voronoi', cell_color ='leiden_clus', select_cell_groups = c(1,2,3,4), show_other_cells = F, 
             vor_border_color = 'black')
    
    
    
    p1+p2+ p3
    

    亚群间差异基因,提供三种差异分析分析方法: "scran", "gini", "mast",我们只使用其中的一种作为示例。

    ?findMarkers_one_vs_all
    mast_markers = findMarkers_one_vs_all(gobject = mini_visium,
                                           method = 'mast',
                                           expression_values = 'normalized',
                                           cluster_column = 'leiden_clus')
    
     head(mast_markers)
        genes   Pr(>Chisq)      coef     ci.hi     ci.lo          fdr     cluster ranking
    1:   Chgb 1.647262e-34 -2.103924 -1.749995 -2.457852 7.864001e-32 1_vs_others       1
    2: Ogfrl1 2.661252e-34 -2.073549 -1.724172 -2.422927 7.864001e-32 1_vs_others       2
    3:  Prkcg 1.066031e-33 -2.215784 -1.843982 -2.587586 2.100080e-31 1_vs_others       3
    4:  Fxyd7 1.788067e-31 -1.915721 -1.564124 -2.267317 2.641869e-29 1_vs_others       4
    5:  Camkv 6.043544e-31 -1.934645 -1.586670 -2.282620 6.140771e-29 1_vs_others       5
    6:   Dpp6 6.234285e-31 -2.225360 -1.883683 -2.567037 6.140771e-29 1_vs_others       6
    

    差异基因可视化

    topgenes_mast = mast_markers[, head(.SD, 2), by = 'cluster']$genes  # 这里的.SD 用的很溜啊
    topgenes_mast
    [1] "Chgb"    "Ogfrl1"  "Tcf7l2"  "Prkcd"   "Ddn"     "Ttr"     "Ngef"    "Arpp19"  "Prdm8"   "Cpne6"   "Spink8"  "Wfs1"    "Ccn2"    "Cplx3"  
    [15] "Ccdc153" "Rarres2"
    violinPlot(mini_visium, genes = topgenes_mast, cluster_column = 'leiden_clus',
               strip_text = 10, strip_position = 'right')
    
    topgenes_mast = mast_markers[, head(.SD, 6), by = 'cluster']$genes
    topgenes_mast
    
    plotMetaDataHeatmap(mini_visium, selected_genes = topgenes_mast,
                        metadata_cols = c('leiden_clus'))
    

    根据差异基因注释细胞类型,这当然是一个需要谨慎考虑的,这里为了方便演示,只是取一个基因名作为其细胞类型。

    clusters_cell_types = paste0(mast_markers[, head(.SD, 1), by = 'cluster']$genes,"_cells")
    clusters_cell_types
    names(clusters_cell_types) = 1:length(unique(mini_visium@cell_metadata$leiden_clus))
    mini_visium@cell_metadata
    mini_visium = annotateGiotto(gobject = mini_visium, annotation_vector = clusters_cell_types,
                                 cluster_column = 'leiden_clus', name = 'cell_types')
    
    # check new cell metadata
    pDataDT(mini_visium)
    
                   cell_ID nr_genes perc_genes total_expr leiden_clus   cell_types
      1: AAAGGGATGTAGCAAG-1      226   38.24027   992.0223           3    Ddn_cells
      2: AAATGGCATGTCTTGT-1      283   47.88494  1123.8751           4   Ngef_cells
      3: AAATGGTCAATGTGCC-1      247   41.79357  1024.8760           2 Tcf7l2_cells
      4: AAATTAACGGGTAGCT-1      221   37.39425   971.3199           2 Tcf7l2_cells
      5: AACAACTGGTAGTTGC-1      306   51.77665  1157.2925           2 Tcf7l2_cells
     ---                                                                           
    618: TTGTAATCCGTACTCG-1      313   52.96108  1179.7196           2 Tcf7l2_cells
    619: TTGTATCACACAGAAT-1      240   40.60914   999.6419           4   Ngef_cells
    620: TTGTCGTTCAGTTACC-1      219   37.05584   991.4468           3    Ddn_cells
    621: TTGTGGCCCTGACAGT-1      202   34.17936   924.0617           1   Chgb_cells
    622: TTGTTCAGTGTGCTAC-1      232   39.25550  1019.5997           6 Spink8_cells
    
    空间数据分析

    这里我们开始切入纳入空间位置之后的分析,其中之一就是,对低分辨率的技术,利用signatures 矩阵看空间中细胞类型的富集情况。可以说是与单细胞数据的联合分析,也可以说是把spot看作是局部bulk数据,对spot的反卷积(deconvolution),用以推断每个spot的细胞类型组成。

    Updating immune cell deconvolution for the spatial genomics era,ppt资料

    如果有signatures 可以用之计算每个空间位置基因标记富集分数。Giotto采用三种富集方法进行富集分析:

    • PAGE (Parametric Analysis of Gene Set Enrichment)

    在这种方法中,一组已知的细胞型特异性标记基因作为输入。目的是评估这些基因是否在每个点(spot)比其他点表达得更高。具体地说,对于每个点,我们定义一个与一组标记基因相对应的富集分数,如下所示。首先,对于整个基因组中的每个基因,我们利用该位点的表达值相对于所有位点的平均表达值来计算该基因的表达折叠变化。

    • Enrichment analysis based on rank of gene expression

    在这种方法中,不需要已知的标记基因列表。相反,将使用外部单细胞RNAseq数据集以及每个细胞的单元类型注释作为输入。

    • 利用超几何分布的富集分析

    这种方法也需要一组已知的细胞类型特异性标记基因作为输入,但它通过简单地使用超几何检验来评估富集程度。基于标记基因注释和基因表达值二值化,将所有基因划分为4个不重叠的类别,来构建列联表。后者是由每个位点前5%的表达基因决定的。根据列联表计算出一个p值。这里的富集分数被定义为-log10(p-value)。

    ## cell type signatures ##
    ## combination of all marker genes identified 
    sign_matrix_path = system.file("extdata", "sig_matrix.txt", package = 'Giotto')
    brain_sc_markers = data.table::fread(sign_matrix_path) # file don't exist in data folder
    sig_matrix = as.matrix(brain_sc_markers[,-1]); rownames(sig_matrix) = brain_sc_markers$Event
    sig_matrix[1:4,1:4]
           Astro_ependymal Cerebellum Cholinergic_monoaminergic Cortex_hippocampus_thalamus
    Cartpt                0          0                         1                           0
    Gm15469               0          0                         1                           0
    Pmch                  0          0                         1                           0
    Ly6a                  0          0                         0                           0 
    

    由于我们草率地找了个signatures 所以富集可能会不准,但是作为演示是可以的。

    mini_visium = runSpatialEnrich(mini_visium,
                                   sign_matrix = sig_matrix,
                                   enrich_method = 'PAGE') #default = 'PAGE'
    
    ## heatmap of enrichment versus annotation (e.g. clustering result)
    cell_types = colnames(sig_matrix)
    plotMetaDataCellsHeatmap(gobject = mini_visium,
                             metadata_cols = 'leiden_clus',
                             value_cols = cell_types,
                             spat_enr_names = 'PAGE',
                             x_text_size = 8, y_text_size = 8)
    

    查看富集分数

    enrichment_results = mini_visium@spatial_enrichment$PAGE
    head(enrichment_results)
                  cell_ID Astro_ependymal Cerebellum Cortex_hippocampus_thalamus Cortex_hippocampus di_mesencephalon_1 di_mesencephalon_2  Hindbrain
    1: AAAGGGATGTAGCAAG-1      -1.2800440 -0.5864989                 0.917972068        -0.05840974         -2.0124908         2.44315201  0.6100475
    2: AAATGGCATGTCTTGT-1      -1.1898333 -0.8364881                 1.146730319         2.94314936         -0.3084605        -0.07627204  0.2909754
    3: AAATGGTCAATGTGCC-1      -0.3672426 -0.7726506                -0.961627156        -3.13654976          2.3252251         0.04925334  1.0702687
    4: AAATTAACGGGTAGCT-1       0.2906555  0.6332150                 0.004221154        -4.66096648          2.3323610         1.60580233  1.5530766
    5: AACAACTGGTAGTTGC-1       2.1619438 -1.4829558                -2.718809976        -5.05188952          0.2268543        -1.41909263 -0.4450371
    6: AACAGGAAATCGAATA-1      -0.4275308 -0.2356467                 1.354997024         4.45432092         -1.2874840        -0.79001109  1.4877626
       Olfactory_bulb Oligo_dendrocyte Peptidergic   Vascular
    1:     -2.0100819       -0.4722108   1.0807732 -0.7906277
    2:      0.1148558       -1.8963290  -1.3098433  1.0049731
    3:     -0.1948654        0.4321902   1.2949379  2.3258985
    4:     -1.4450167        0.8517501   0.4966094  0.0358209
    5:     -0.5760702        2.3737223   1.1574013  0.4736639
    6:     -0.6069400       -1.3063133  -0.5210430 -0.8693017
    

    可视化结果

    enrich_cell_types = colnames(enrichment_results)
    enrich_cell_types = enrich_cell_types[enrich_cell_types != 'cell_ID']
    
    enrich_cell_types
    ## spatplot
    ?spatCellPlot
    spatCellPlot(gobject = mini_visium, spat_enr_names = 'PAGE',
                 cell_annotation_values = enrich_cell_types,show_image=T,show_legend=F,
                 cow_n_col = 3,coord_fix_ratio = NULL, point_size = 1)
    

    创建空间网格,看局部空间的异质性。

    mini_visium <- createSpatialGrid(gobject = mini_visium,
                                     sdimx_stepsize = 300,
                                     sdimy_stepsize = 300,
                                     minimum_padding = 50)
    showGrids(mini_visium)
    ?spatPlot
    spatPlot(gobject = mini_visium, show_grid = T, point_size = 1.5,
             show_image=T,group_by="cell_types")
    

    提取网格和相关数据点

    annotated_grid = annotateSpatialGrid(mini_visium)
    annotated_grid
        sdimx sdimy            cell_ID gr_x_loc gr_y_loc gr_loc
      1:  5477 -4125 AAAGGGATGTAGCAAG-1   gr_x_9   gr_y_5  gr_57
      2:  5959 -2808 AAATGGCATGTCTTGT-1  gr_x_11   gr_y_9 gr_107
      3:  4720 -5202 AAATGGTCAATGTGCC-1   gr_x_6   gr_y_2  gr_18
      4:  5202 -5322 AAATTAACGGGTAGCT-1   gr_x_8   gr_y_1   gr_8
      5:  4101 -4604 AACAACTGGTAGTTGC-1   gr_x_4   gr_y_4  gr_40
     ---                                                        
    618:  4996 -5442 TTGTAATCCGTACTCG-1   gr_x_7   gr_y_1   gr_7
    619:  6303 -2688 TTGTATCACACAGAAT-1  gr_x_12  gr_y_10 gr_120
    620:  5202 -3885 TTGTCGTTCAGTTACC-1   gr_x_8   gr_y_6  gr_68
    621:  5340 -3406 TTGTGGCCCTGACAGT-1   gr_x_8   gr_y_7  gr_80
    622:  5615 -4125 TTGTTCAGTGTGCTAC-1   gr_x_9   gr_y_5  gr_57
    
    annotated_grid_metadata = annotateSpatialGrid(mini_visium,
                                                  cluster_columns = c('leiden_clus', 'cell_types', 'nr_genes'))
    
    annotated_grid_metadata
    
                   cell_ID sdimx sdimy gr_x_loc gr_y_loc gr_loc leiden_clus   cell_types nr_genes
      1: AAAGGGATGTAGCAAG-1  5477 -4125   gr_x_9   gr_y_5  gr_57           3    Ddn_cells      226
      2: AAATGGCATGTCTTGT-1  5959 -2808  gr_x_11   gr_y_9 gr_107           4   Ngef_cells      283
      3: AAATGGTCAATGTGCC-1  4720 -5202   gr_x_6   gr_y_2  gr_18           2 Tcf7l2_cells      247
      4: AAATTAACGGGTAGCT-1  5202 -5322   gr_x_8   gr_y_1   gr_8           2 Tcf7l2_cells      221
      5: AACAACTGGTAGTTGC-1  4101 -4604   gr_x_4   gr_y_4  gr_40           2 Tcf7l2_cells      306
     ---                                                                                          
    618: TTGTAATCCGTACTCG-1  4996 -5442   gr_x_7   gr_y_1   gr_7           2 Tcf7l2_cells      313
    619: TTGTATCACACAGAAT-1  6303 -2688  gr_x_12  gr_y_10 gr_120           4   Ngef_cells      240
    620: TTGTCGTTCAGTTACC-1  5202 -3885   gr_x_8   gr_y_6  gr_68           3    Ddn_cells      219
    621: TTGTGGCCCTGACAGT-1  5340 -3406   gr_x_8   gr_y_7  gr_80           1   Chgb_cells      202
    622: TTGTTCAGTGTGCTAC-1  5615 -4125   gr_x_9   gr_y_5  gr_57           6 Spink8_cells      232
    

    创建空间网络,我们可以用Delaunay", "kNN" 两种方法。

    mini_visium = createSpatialNetwork(gobject = mini_visium, minimum_k = 2, maximum_distance_delaunay = 400)
    mini_visium = createSpatialNetwork(gobject = mini_visium, minimum_k = 2, method = 'kNN', k = 10)
    showNetworks(mini_visium)
    The following images are available:  Delaunay_network kNN_network 
    [1] "Delaunay_network" "kNN_network"     
    

    查看其中一个网络的信息,看到一个有from to 文件,这就可以用igraph来构建网络了啊。

    mini_visium@spatial_network$Delaunay_network$networkDT
                        from                 to sdimx_begin sdimy_begin sdimx_end sdimy_end distance      weight
       1: AAAGGGATGTAGCAAG-1 TCAAACAACCGCGTCG-1        5477       -4125      5340     -4125 137.0000 0.007299270
       2: AAAGGGATGTAGCAAG-1 ACGATCATACATAGAG-1        5477       -4125      5546     -4244 137.5573 0.007269700
       3: AAAGGGATGTAGCAAG-1 TTGTTCAGTGTGCTAC-1        5477       -4125      5615     -4125 138.0000 0.007246377
       4: AAAGGGATGTAGCAAG-1 ATCGACTCTTTCCGTT-1        5477       -4125      5408     -4005 138.4233 0.007224219
       5: AAAGGGATGTAGCAAG-1 GTAAGCGGGCAGTCAG-1        5477       -4125      5546     -4005 138.4233 0.007224219
      ---                                                                                                       
    1787: TTCAAGCCGAGCTGAG-1 TTGTATCACACAGAAT-1        6372       -2808      6303     -2688 138.4233 0.007224219
    1788: TTCGACGGGAAGGGCG-1 TTCGCACTCGCGTGCT-1        4239       -4125      4308     -4245 138.4233 0.007224219
    1789: TTCTTAGTGGCTCAGA-1 TTGTGGCCCTGACAGT-1        5408       -3287      5340     -3406 137.0584 0.007296161
    1790: TTCTTGTAACCTAATG-1 TTGGCTCGCATGAGAC-1        3620       -4005      3757     -4005 137.0000 0.007299270
    1791: TTGCACGGAGCAGCAC-1 TTGTCGTTCAGTTACC-1        5271       -3766      5202     -3885 137.5573 0.007269700
    

    、我们还是直接用Giotto来绘制吧,懒,哼。

    p1 <- spatPlot(gobject = mini_visium, show_network = T,
             network_color = 'blue', spatial_network_name = 'Delaunay_network',
             point_size = 2.5, cell_color = 'leiden_clus') + ggtitle("Delaunay_network")
    
    p2 <- spatPlot(gobject = mini_visium, show_network = T,
             network_color = 'blue', spatial_network_name = 'kNN_network',
             point_size = 2.5, cell_color = 'leiden_clus') + ggtitle("kNN_network")
    
    p1+ p2
    

    Giotto目前有五种不同的方法来识别空间连续表达模式(spatially coherent expression pattern)。三个先前发表的方法SpatialDE 、Trendsceek 和SPARK 可以与SpatialDE、Trendsceek和SPARK函数一起运行。Giotto介绍有两种新方法("kmeans", "rank")是基于对空间网络中相邻细胞二值化表达数据的统计。

    首先,对于每个基因,表达值采用kmeans聚类(k = 2)或简单的rank阈值设定(默认为30%)进行二值化,这是两种方法唯一的区别。接下来,根据相邻细胞间的二值化表达式值计算列联表,并将其用作Fisher精确检验的输入,以获得概率比估计值和p值。在这种情况下,如果一个基因通常在近端或邻近的细胞中被发现高表达,则被认为是空间基因。除了每个基因的胜率和p值外,还计算并提供了基因的平均表达量、高表达细胞数和hub细胞数。一个hub细胞被认为是一个高表达感兴趣的基因的细胞,它有多个高表达该基因的邻近细胞。用户可以利用这些特征进一步对具有不同特征的空间基因进行排序和探索。由函数 BinSpect (Binary Spatial extract)实现。

    ?binSpect
    km_spatialgenes = binSpect(mini_visium)
    km_spatialgenes
    
           genes       p.value   estimate   adj.p.value        score  av_expr high_expr
      1:      Ddn 1.249839e-282 19.1956483 7.386550e-280 1.246001e+04 6.031918       426
      2:    Shox2 6.120417e-223 24.2751032 7.234333e-221 1.242072e+04 3.984479       118
      3:     Hpca 8.836558e-253 13.1227650 2.611203e-250 7.616126e+03 6.964470       378
      4:     Zic1 5.861216e-226 12.2212691 8.659947e-224 6.338144e+03 4.284136       211
      5:     Mobp 6.841076e-234 10.7129452 1.347692e-231 5.751587e+03 7.660916       318
     ---                                                                                
    587:    Plod1  8.569300e-01  1.0134093  8.627694e-01 1.564695e-01 3.052250       229
    588:  Tmem119  8.710065e-01  0.9854772  8.754504e-01 1.361002e-01 3.126011       180
    589:      Fn1  8.757291e-01  0.9871367  8.787028e-01 1.309915e-01 3.300124       193
    590: Ndufa4l2  9.561515e-01  0.9885798  9.577720e-01 4.432688e-02 3.200996       115
    591:     Vsir  1.000000e+00  0.9989055  1.000000e+00 0.000000e+00 3.037544       177
    

    可视化基因

    spatGenePlot(mini_visium, expression_values = 'scaled',
                 genes = km_spatialgenes[1:4]$genes,
                 point_shape = 'border', point_border_stroke = 0.1,
                 show_network = F, network_color = 'lightgrey', point_size = 2.5,
                 cow_n_col = 2)
    
    rank_spatialgenes = binSpect(mini_visium, bin_method = 'rank')
    spatGenePlot(mini_visium, expression_values = 'scaled',
                 genes = rank_spatialgenes[1:4]$genes,
                 point_shape = 'border', point_border_stroke = 0.1,
                 show_network = F, network_color = 'lightgrey', point_size = 2.5,
                 cow_n_col = 2)
    
    

    计算空间相关分数(calculate spatial correlation scores)。地理学第一定律:Everything is related to everything else, but near things are more related to each other. (任何事物都是与其他事物相关的,只不过相近的事物关联更紧密)。这一定律告诉我们,空间位置对对象间的相关性是有决定作用的。了识别空间共表达基因(Spatial co-expression patterns)的稳健模式,detectSpatialCorGenes和clusterSpatialCorGenes 可以用于识别的单个空间基因。第一个函数通过网格平均法或k近邻法在空间上平滑基因表达,然后计算基因到基因的相关性(默认=皮尔逊)分数。

    ext_spatial_genes = km_spatialgenes[1:100]$genes
    ext_spatial_genes
    
           genes       p.value   estimate   adj.p.value        score  av_expr high_expr
      1:      Ddn 1.249839e-282 19.1956483 7.386550e-280 1.246001e+04 6.031918       426
      2:    Shox2 6.120417e-223 24.2751032 7.234333e-221 1.242072e+04 3.984479       118
      3:     Hpca 8.836558e-253 13.1227650 2.611203e-250 7.616126e+03 6.964470       378
      4:     Zic1 5.861216e-226 12.2212691 8.659947e-224 6.338144e+03 4.284136       211
      5:     Mobp 6.841076e-234 10.7129452 1.347692e-231 5.751587e+03 7.660916       318
     ---                                                                                
    587:    Plod1  8.569300e-01  1.0134093  8.627694e-01 1.564695e-01 3.052250       229
    588:  Tmem119  8.710065e-01  0.9854772  8.754504e-01 1.361002e-01 3.126011       180
    589:      Fn1  8.757291e-01  0.9871367  8.787028e-01 1.309915e-01 3.300124       193
    590: Ndufa4l2  9.561515e-01  0.9885798  9.577720e-01 4.432688e-02 3.200996       115
    591:     Vsir  1.000000e+00  0.9989055  1.000000e+00 0.000000e+00 3.037544       177
    # 检测空间相关的基因
    spat_cor_netw_DT = detectSpatialCorGenes(mini_visium,
                                             method = 'network', spatial_network_name = 'Delaunay_network',
                                             subset_genes = ext_spatial_genes)
    
    spat_cor_netw_DT$cor_DT
                 gene_ID      variable   spat_cor   expr_cor    cordiff spatrank exprrank rankdiff
        1: 2010300C02Rik 2010300C02Rik  1.0000000  1.0000000  0.0000000        1        1        0
        2: 2010300C02Rik         Icam5  0.8697253  0.5558625  0.3138629        2        5       -3
        3: 2010300C02Rik         Itpka  0.8631855  0.6118914  0.2512941        3        2        1
        4: 2010300C02Rik       Ppp1r1a  0.8290775  0.5070328  0.3220448        4        8       -4
        5: 2010300C02Rik           Ddn  0.8159786  0.5619251  0.2540535        5        4        1
       ---                                                                                        
     9996:          Zic1         Itpka -0.6782534 -0.4374862 -0.2407671       96       94        2
     9997:          Zic1           Vxn -0.7007696 -0.4780233 -0.2227463       97       96        1
     9998:          Zic1           Sst -0.7252043 -0.3920385 -0.3331659       98       84       14
     9999:          Zic1         Rprml -0.7293975 -0.5113883 -0.2180092       99      100       -1
    10000:          Zic1          Dkk3 -0.7297394 -0.4979835 -0.2317558      100       99        1
    

    相关基因的聚类与可视化。

    # 2. cluster correlation scores
    spat_cor_netw_DT = clusterSpatialCorGenes(spat_cor_netw_DT, name = 'spat_netw_clus', k = 8)
    heatmSpatialCorGenes(mini_visium, spatCorObject = spat_cor_netw_DT, use_clus_name = 'spat_netw_clus')
    

    可视化和过滤空间相关的基因

    top_netw_spat_cluster = showSpatialCorGenes(spat_cor_netw_DT, use_clus_name = 'spat_netw_clus',
                                                selected_clusters = 6, show_top_genes = 1)
    top_netw_spat_cluster
       gene_ID variable spat_cor expr_cor cordiff spatrank exprrank rankdiff clus
     1:  Amotl1   Amotl1        1        1       0        1        1        0    6
     2:    Grm4     Grm4        1        1       0        1        1        0    6
     3:    Lef1     Lef1        1        1       0        1        1        0    6
     4:    Nexn     Nexn        1        1       0        1        1        0    6
     5:   Ntng1    Ntng1        1        1       0        1        1        0    6
     6:    Patj     Patj        1        1       0        1        1        0    6
     7: Plekhg1  Plekhg1        1        1       0        1        1        0    6
     8:   Prkcd    Prkcd        1        1       0        1        1        0    6
     9:   Prkch    Prkch        1        1       0        1        1        0    6
    10:   Rab37    Rab37        1        1       0        1        1        0    6
    11:   Rgs16    Rgs16        1        1       0        1        1        0    6
    12:   Shox2    Shox2        1        1       0        1        1        0    6
    13: Slc17a6  Slc17a6        1        1       0        1        1        0    6
    14:  Synpo2   Synpo2        1        1       0        1        1        0    6
    15:  Tcf7l2   Tcf7l2        1        1       0        1        1        0    6
    16:   Tnnt1    Tnnt1        1        1       0        1        1        0    6
    17:    Vav3     Vav3        1        1       0        1        1        0    6
    18:   Vipr2    Vipr2        1        1       0        1        1        0    6
    19:   Zfhx3    Zfhx3        1        1       0        1        1        0    6
    20:    Zic1     Zic1        1        1       0        1        1        0    6
    
    cluster_genes_DT = showSpatialCorGenes(spat_cor_netw_DT, use_clus_name = 'spat_netw_clus', show_top_genes = 1)
    cluster_genes = cluster_genes_DT$clus; names(cluster_genes) = cluster_genes_DT$gene_ID
    cluster_genes
    mini_visium = createMetagenes(mini_visium, gene_clusters = cluster_genes, name = 'cluster_metagene')
    spatCellPlot(mini_visium,
                 spat_enr_names = 'cluster_metagene',
                 cell_annotation_values = netw_ranks$clusters,
                 point_size = 1.5, cow_n_col = 3,show_image=T,show_legend=F)
    

    如前所述,空间域(Spatial domain detection)用隐马尔可夫随机场(HMRF)模型识别 。简而言之,HMRF是一个基于图形的模型,它将每个细胞的状态推断为细胞的内在状态(从细胞自身的基因表达载体推断)和细胞的外在状态(基于细胞周边的状态分布)的联合概率。在我们的例子中状态的概念是空间域。邻域图定义了邻居细胞的影响程度,以及定义细胞间相互作用强度的参数beta。HMRF将每个cell分配给k个空间域中的一个(k由用户定义)。

    hmrf_folder = paste0(temp_dir,'/','11_HMRF/')
    if(!file.exists(hmrf_folder)) dir.create(hmrf_folder, recursive = T)
    
    ?doHMRF
    # perform hmrf
    my_spatial_genes = km_spatialgenes[1:100]$genes
    HMRF_spatial_genes = doHMRF(gobject = mini_visium,
                                expression_values = 'scaled',
                                spatial_genes = my_spatial_genes,
                                spatial_network_name = 'Delaunay_network',
                                k = 8,
                                betas = c(28,2,2),
                                output_folder = paste0(hmrf_folder, '/', 'Spatial_genes_brain/SG_top100_k8_scaled'))
    

    提示:

    expression_matrix.txt already exists at this location, will be overwritten 
    
     spatial_genes.txt already exists at this location, will be overwritten 
    
     spatial_network.txt already exists at this location, will be overwritten 
    
     spatial_cell_locations.txt already exists at this location, will be overwritten 
    [1] "D:\\Program Files (x86)\\anconda\\python.exe E:/software/R/R-4.0.2/library/Giotto/python/reader2.py -l \"./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_cell_locations.txt\" -g \"./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_genes.txt\" -n \"./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_network.txt\" -e \"./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/expression_matrix.txt\" -o \"./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/result.spatial.zscore\" -a test -k 8 -b 28 2 2 -t 1e-10 -z none -s 100 -i 100"
    

    可见内置的是python实现的,我们用system函数在R中运行它。

    system("D:\\Program Files (x86)\\anconda\\python.exe E:/software/R/R-4.0.2/library/Giotto/python/reader2.py -l \"
           ./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_cell_locations.txt\" -g \"
           ./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_genes.txt\" -n \"
           ./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_network.txt\" -e \"
           ./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/expression_matrix.txt\" -o \"
           ./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/result.spatial.zscore\" 
           -a test -k 8 -b 30 2 2 -t 1e-10 -z none -s 100 -i 100")
    
     127
    
    # check and select hmrf
    myviewHMRFresults2D <- edit(viewHMRFresults2D)
    environment(myviewHMRFresults2D) <- environment(viewHMRFresults2D)
    debug(myviewHMRFresults2D)
    ?viewHMRFresults2D
    viewHMRFresults2D(gobject = mini_visium,
                      HMRFoutput = HMRF_spatial_genes,
                      k = 8, betas_to_view = 2,
                      point_size = 2)
    
    
    for(i in seq(28, 30, by = 2)) {
        myviewHMRFresults2D(gobject = mini_visium,
                          HMRFoutput = HMRF_spatial_genes,
                          k = 8, betas_to_view = i,
                          point_size = 2)
    }
    
    
    myaddHMRF <- edit(addHMRF)
    environment(myaddHMRF) <- environment(addHMRF)
    mini_visium = myaddHMRF(gobject = mini_visium,
                          HMRFoutput = HMRF_spatial_genes,
                          k = 8, betas_to_add = c(28),
                          hmrf_name = 'HMRF')
    
    mini_visium@cell_metadata
    giotto_colors = getDistinctColors(8)
    giotto_colors
    names(giotto_colors) = 1:8
    head(mini_visium@cell_metadata)
    spatPlot(gobject = mini_visium, cell_color = 'HMRF_k8_b.28',
             point_size = 3, coord_fix_ratio = 1, cell_color_code = giotto_colors)
    

    识别细胞与细胞的相互作用改变了基因( cell-to-cell Interaction Changed Genes,ICG),即由于接近其他细胞类型而差异表达的基因。为了无偏性地识别与特定细胞型相互作用相关的所有潜在基因表达变化,Giotto实施了4项差异基因分析来识别这种相互作用改变的基因(ICG),包括t检验、limma检验、Wilcoxon秩和检验和空间排列检验。对于每一种细胞类型,我们将注释的细胞分成两个互补的亚组,其中一个亚组包含邻近另一种特定细胞类型的细胞的子集。这些组间差异表达的基因是通过上述每一种统计检验来鉴别的。若要调整多个假设检验,则通过重新随机化细胞类型中的细胞来创建背景空分布。这种分析可以用函数findInteractionChangedGenes或findICG实现。

    gene_metadata = fDataDT(mini_visium)
    high_expressed_genes = gene_metadata[mean_expr_det > 4]$gene_ID
    # high_expressed_genes
    ## identify genes that are associated with proximity to other cell types
    ?findICG
    ICGscoresHighGenes =  findICG(gobject = mini_visium,
                                  selected_genes = high_expressed_genes,
                                  spatial_network_name = 'Delaunay_network',
                                  cluster_column = 'cell_types',
                                  diff_test = 'permutation',
                                  adjust_method = 'fdr',
                                  nr_permutations = 500,
                                  do_parallel = T, cores = 2)
    
    colnames(ICGscoresHighGenes$CPGscores)
     [1] "genes"         "sel"           "other"         "log2fc"        "diff"          "p.value"       "p.adj"         "perm_sel"      "perm_other"   
    [10] "perm_log2fc"   "perm_diff"     "cell_type"     "int_cell_type" "nr_select"     "int_nr_select" "nr_other"      "int_nr_other"  "unif_int"     
    [19] "spec_int"      "type_int"  
    

    可视化细胞接近基因得分

    map(c( "cell_barplot", "cell-cell",
          "dotplot"),function(x){
              plotCellProximityGenes(mini_visium, cpgObject = ICGscoresHighGenes, method = x)
          }) %>%  cowplot::plot_grid(plotlist = .,ncol=3)
    

    过滤 Interaction Changed Gene scores

    ICGscoresFilt = filterICG(ICGscoresHighGenes,
                              min_cells = 2, min_int_cells = 2, min_fdr = 0.1,
                              min_spat_diff = 0.1, min_log2_fc = 0.1, min_zscore = 1)
    

    过滤并可视化ICG基因

    ## filter genes
    ?filterICG
    ICGscoresFilt = filterICG(ICGscoresHighGenes,
                              min_cells = 2, min_int_cells = 2, min_fdr = 0.1,
                              min_spat_diff = 0.1, min_log2_fc = 0.1, min_zscore = 1)
    
    ICGscoresFilt$CPGscores$other
    ## visualize subset of interaction changed genes (ICGs)
    # random subset
    ICG_genes = topgenes_mast = mast_markers[, head(.SD, 1), by = 'cluster']$genes # c('Cpne2', 'Scg3', 'Cmtm3', 'Cplx1', 'Lingo1')
    ICG_genes_types = clusters_cell_types
    names(ICG_genes) = ICG_genes_types
    
    ?plotICG
    plotICG(gobject = mini_visium,
            cpgObject = ICGscoresHighGenes,
            source_type = 'Tcf7l2_cells',
            source_markers = c("Scg3","Nptx1","Stx1a"),
            ICG_genes = ICG_genes)
    
    

    基因表达在细胞附近发生的变化可以用几种方法来考量,配体受体对(Spatially informed ligand-receptor pairing)是其中的一个。为了研究细胞在微环境中是如何交流的,Giotto可以从现有的数据库中整合已知的配体-受体信息。通过计算这类基因对在两种细胞类型的相邻细胞中空间共表达情况,它可以估计出在两种细胞类型的相互作用,哪些配体-受体对可能更多(或更少)用于通信。这是在spatCellCellcom函数中实现的,更具体地说,对于每对配体-受体,计算出每对细胞类型的细胞-细胞通信评分S。

    我们读入Giotto自带的配体受体对数据,在您自己的研究中可以根据需要准备或收集目的配受体。

    LR_data = data.table::fread(system.file("extdata", "mouse_ligand_receptors.txt", package = 'Giotto'))
    LR_data
    
    
         mouseLigand mouseReceptor
       1:         A2m          Lrp1
       2:       Aanat        Mtnr1a
       3:       Aanat        Mtnr1b
       4:      Adam12         Itga9
       5:      Adam12         Itgb1
      ---                          
    1063:       Wnt7b          Lrp5
    1064:        Xcl1          Xcr1
    1065:        Xcl2          Xcr1
    1066:        Yars         Cxcr1
    1067:         Zp3         Mertk
    

    先通过基因表达数据对细胞-细胞间的交流进行评分,这里没有用到空间数据,该方法用于模拟基于scRNAseq的CCI分析。

    # 基因匹配
    LR_data[, ligand_det := ifelse(mouseLigand %in% mini_visium@gene_ID, T, F)]
    LR_data[, receptor_det := ifelse(mouseReceptor %in% mini_visium@gene_ID, T, F)]
    LR_data_det = LR_data[ligand_det == T & receptor_det == T]
    select_ligands = LR_data_det$mouseLigand
    select_receptors = LR_data_det$mouseReceptor
    
    
    ## get statistical significance of gene pair expression changes based on expression ##
    expr_only_scores = exprCellCellcom(gobject = mini_visium,
                                       cluster_column = 'cell_types',
                                       random_iter = 500,
                                       gene_set_1 = select_ligands,
                                       gene_set_2 = select_receptors)
    

    用空间数据对细胞通讯评分。

    spatial_all_scores = spatCellCellcom(mini_visium,
                                         spatial_network_name = 'Delaunay_network',
                                         cluster_column = 'cell_types',
                                         random_iter = 500,
                                         gene_set_1 = select_ligands,
                                         gene_set_2 = select_receptors,
                                         adjust_method = 'fdr',
                                         do_parallel = T,
                                         cores = 4,
                                         verbose = 'none')
    
    spatial_all_scores[1:2,]
         LR_comb lig_cell_type lig_expr ligand rec_cell_type rec_expr receptor  LR_expr lig_nr rec_nr rand_expr    av_diff     log2fc pvalue
    1: Npy-Npy2r   Prdm8_cells 4.163227    Npy  Spink8_cells 2.721694    Npy2r 6.884920      8      5  5.202572  1.6823484  0.3975514  0.052
    2: Npy-Npy2r    Ngef_cells 3.345273    Npy   Prdm8_cells 2.803618    Npy2r 6.148892      5      3  6.866103 -0.7172109 -0.1567514  0.520
                    LR_cell_comb p.adj          PI
    1: Prdm8_cells--Spink8_cells 0.104  0.39077973
    2:   Ngef_cells--Prdm8_cells 0.688 -0.02545824
    
    ## select top LR ##
    selected_spat = spatial_all_scores[p.adj <= 0.5 & abs(log2fc) > 0.1 & lig_nr >= 2 & rec_nr >= 2]
    selected_spat
    data.table::setorder(selected_spat, -PI)
    
    top_LR_ints = unique(selected_spat[order(-abs(PI))]$LR_comb)[1:33]
    top_LR_cell_ints = unique(selected_spat[order(-abs(PI))]$LR_cell_comb)[1:33]
    ?plotCCcomHeatmap
    plotCCcomHeatmap(gobject = mini_visium,
                     comScores = spatial_all_scores,
                     selected_LR = top_LR_ints,
                     selected_cell_LR = top_LR_cell_ints,
                     show = 'LR_expr')
    
    plotCCcomDotplot(gobject = mini_visium,
                     comScores = spatial_all_scores,
                     selected_LR = top_LR_ints,
                     selected_cell_LR = top_LR_cell_ints,
                     cluster_on = 'PI')
    

    结合空间数据的细胞通讯。

    ?combCCcom
    comb_comm = combCCcom(spatialCC = spatial_all_scores,
                          exprCC = expr_only_scores)
    
    for top  1  expression ranks, you recover  0 % of the highest spatial rank 
    for top  10  expression ranks, you recover  29.55 % of the highest spatial rank 
    for top  20  expression ranks, you recover  57.95 % of the highest spatial rank 
    
    comb_comm[1:2]
         LR_comb                 LR_cell_comb lig_cell_type rec_cell_type ligand receptor lig_expr_spat rec_expr_spat LR_expr_spat lig_nr_spat rec_nr_spat
    1: Npy-Npy2r Ccdc153_cells--Ccdc153_cells Ccdc153_cells Ccdc153_cells    Npy    Npy2r      3.764760     0.9497822     4.714542          11          11
    2: Npy-Npy2r    Ccdc153_cells--Chgb_cells Ccdc153_cells    Chgb_cells    Npy    Npy2r      3.472081     0.4239559     3.896037          17          11
       rand_expr_spat av_diff_spat log2fc_spat pvalue_spat p.adj_spat    PI_spat lig_expr  rec_expr  LR_expr lig_nr rec_nr rand_expr   av_diff    log2fc
    1:       4.429262   0.28527998  0.08812256       0.102      0.204 0.06083716 3.552652 0.8706337 4.423286    106    106  3.334204 1.0890821 0.3973956
    2:       3.861855   0.03418194  0.01239384       0.844      0.844 0.00091290 3.552652 0.2954424 3.848095    106    119  3.343469 0.5046257 0.1972939
       pvalue p.adj         PI LR_expr_rnk LR_spat_rnk exprPI_rnk spatPI_rnk
    1:  0.168 0.336 0.18823066          12           9         19          8
    2:  0.404 0.808 0.01826718          19          22         25         21
    

    从张大表中我们可以提取很多可以可视化的信息。

    # top differential activity levels for ligand receptor pairs
    plotRankSpatvsExpr(gobject = mini_visium,
                       comb_comm,
                       expr_rnk_column = 'exprPI_rnk',
                       spat_rnk_column = 'spatPI_rnk',
                       midpoint = 10)
    
    ## predict maximum differential activity
    plotRecovery(gobject = mini_visium,
                 comb_comm,
                 expr_rnk_column = 'exprPI_rnk',
                 spat_rnk_column = 'spatPI_rnk',
                 ground_truth = 'spatial')
    percentage explained =  0.6022727
    

    我们自己画画二者之间的关系。

    p1 <-  ggplot(comb_comm,aes(LR_spat_rnk ,LR_expr_rnk))+ geom_point() + theme_bw()
    
    p2 <-  ggplot(comb_comm,aes(exprPI_rnk  ,spatPI_rnk))+ geom_point()+ theme_bw()
    p1+p2
    

    并用我们的igraph绘制网络图。

    comb_commdf$SOURCE        <- str_split(comb_commdf$LR_cell_comb,"--",simplify = T)[,1] 
    comb_commdf$TARGET       <- str_split(comb_commdf$LR_cell_comb,"--",simplify = T)[,2] 
    
    comb_commdf
    library(igraph)
    ?graph_from_data_frame
    colnames(graph_from_data_frame)
    ?recode
    comb_commdf <- comb_commdf %>% relocate(c("SOURCE",            "TARGET","LR_expr_rnk", "LR_spat_rnk", "exprPI_rnk", "spatPI_rnk"),.before ="LR_comb")
    
    net<- graph_from_data_frame(comb_commdf)
    
    
    plot(net)
    allcolour=c("#DC143C","#0000FF","#20B2AA","#FFA500","#9370DB",
                "#98FB98","#F08080","#1E90FF","#7CFC00","#FFFF00",
                "#808000","#FF00FF","#FA8072","#7B68EE","#9400D3",
                "#800080","#A0522D","#D2B48C","#D2691E","#87CEEB",
                "#40E0D0","#5F9EA0","#FF1493",
                "#FFE4B5","#8A2BE2","#228B22","#E9967A","#4682B4",
                "#32CD32","#F0E68C","#FFFFE0","#EE82EE","#FF6347",
                "#6A5ACD","#9932CC","#8B008B","#8B4513","#DEB887")
    
    
    karate_groups <- cluster_optimal(net)
    coords <- layout_in_circle(net, order =
                                 order(membership(karate_groups)))  # 设置网络布局
    E(net)$width  <- E(net)$LR_expr_rnk /E(net)$LR_spat_rnk  # 边点权重(粗细)
    
    
    net2 <- net  # 复制一份备用
    
    for (i in 1: length(unique(comb_commdf$SOURCE)) ){
      E(net)[map(unique(comb_commdf$SOURCE),function(x) {
        get.edge.ids(net,vp = c(unique(comb_commdf$SOURCE)[i],x))
      })%>% unlist()]$color <- allcolour[i]
    }  # 这波操作谁有更好的解决方案? 
    
    plot(net, edge.arrow.size=.1, 
         edge.curved=0.2,
         vertex.color=allcolour,
         vertex.frame.color="#555555",
         vertex.label.color="black",
         layout = coords,
         vertex.label.cex=.7) 
    

    可见,模型建好之后,可视化的工作就是提取相关的信息就可以了。作为本次演示的结尾,我们看看如果有目标cell、spot如何画出它的边界,毕竟还是想以空间结尾。

    nb_annot = findNetworkNeighbors(mini_visium,
                                    spatial_network_name = 'Delaunay_network',
                                    source_cell_ids = unique(mini_visium@spatial_network$kNN_network$networkDT$to)[sample(500,30)],
                                    )
    
    mini_visium = addCellMetadata(mini_visium, new_metadata = nb_annot, by_column = T, column_cell_ID = 'cell_ID')
    ?spatPlot
    spatPlot(mini_visium, point_shape = 'voronoi', cell_color ='nb_cells',show_image=T,
             cell_color_code = c(source = 'blue', target = 'red', both = 'yellow', others = 'lightgrey' , neighbor='green'))
    
    @Article{,
      author = {Ruben Dries and Qian Zhu and Rui Dong and Chee-Huat Linus Eng and Huipeng Li and Kan Liu and Yuntian Fu and Tianxiao Zhao and Arpan Sarkar and Rani E George and Nico Pierson and Long Cai and Guo-Cheng Yuan},
      title = {Giotto, a toolbox for integrative analysis and visualization of spatial expression data},
      journal = {bioRxiv},
      year = {2020},
      doi = {10.1101/701680},
      url = {https://doi.org/10.1101/701680},
    }
    

    Updating immune cell deconvolution for the spatial genomics era
    https://www.drieslab.com/
    https://rubd.github.io/Giotto/

    相关文章

      网友评论

        本文标题:Giotto|| 空间表达数据分析工具箱

        本文链接:https://www.haomeiwen.com/subject/aubugktx.html