MISTy：空间转录组细胞互作分析方法

作者: Hayley笔记 | 来源:发表于2022-08-16 16:30 被阅读0次

1. 简介

MISTy主要有四个功能：

算法：Multi-view framework to dissect effects related to CCI (剖析与细胞-细胞互作相关影响的多视角框架)

优点：1. 不需要细胞类型注释；2. 利用完整的表达谱。

缺点：提取的互作不能直接视为因果关系。

⚠️MISTy不仅可以做细胞-细胞之间的dependency，基因和基因之间，甚至细胞和通路之间的dependency都可以做。

2. 数据准备

# MISTy
library(mistyR)
library(future)

# data manipulation
library(dplyr)
library(purrr)
library(distances)

# plotting
library(ggplot2)

plan(multisession)

以下演示使用了合成的标准演示数据集synthetic。该数据集是一个包含 10 个 tibbles 的列表，每个 tibbles 代表从四种单元格类型的随机布局和 100×100 网格上的空白空间生成的数据。

该数据是通过模拟一个专注于signaling events的二维cellular automata model生成的。该模型模拟了 11 种分子的产生、扩散、降解和相互作用。请注意，数据集仅包含非空白空间的模拟测量值。

data("synthetic")

ggplot(synthetic[[1]], aes(x = col, y = row, color = type)) +
  geom_point(shape = 15, size = 0.7) +
  scale_color_manual(values = c("#e9eed3", "#dcc38d", "#c9e2ad", "#a6bab6")) +
  theme_void()

四种颜色的格子代表四种细胞类型

str(synthetic[[1]], give.attr = FALSE)
#> spec_tbl_df [4,205 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#>  $ row  : num [1:4205] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ col  : num [1:4205] 100 11 13 14 15 20 23 24 26 32 ...
#>  $ ECM  : num [1:4205] 0.0385 0.0327 0.1444 0.387 0.1635 ...
#>  $ ligA : num [1:4205] 0.834 0.119 0.525 0.269 0.195 ...
#>  $ ligB : num [1:4205] 0.0157 0.0104 0.014 0.0367 0.1176 ...
#>  $ ligC : num [1:4205] 0.236 0.804 0.334 0.502 0.232 ...
#>  $ ligD : num [1:4205] 1.183 0.101 0.434 0.241 0.203 ...
#>  $ protE: num [1:4205] 1.18 0 1.67 0 0 ...
#>  $ protF: num [1:4205] 2.547 0.386 1.614 0.913 0.162 ...
#>  $ prodA: num [1:4205] 0.382 0 0.472 0 0 ...
#>  $ prodB: num [1:4205] 0 0 0 0 0.16 ...
#>  $ prodC: num [1:4205] 0 0.536 0 0.418 0 ...
#>  $ prodD: num [1:4205] 0.588 0 0.379 0 0 ...
#>  $ type : chr [1:4205] "CT1" "CT2" "CT1" "CT2" ...

3. View composition

mistyR 工作流程总是从定义一个内部视图 (create_initial_view()) 开始，包含了在每个感兴趣的细胞中检测作为建模目标的marker的测量值。对于synthetic数据集中的第一个样本，我们选择除了所有可用细胞的配体(ligA-D)之外的所有marker。

expr <- synthetic[[1]] %>% select(-c(row, col, type, starts_with("lig")))
# head(expr)
##  A tibble: 6 × 7
#      ECM protE protF prodA prodB prodC prodD
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0.0385  1.18 2.55  0.382 0     0     0.588
# 2 0.0327  0    0.386 0     0     0.536 0    
# 3 0.144   1.67 1.61  0.472 0     0     0.379
# 4 0.387   0    0.913 0     0     0.418 0    
# 5 0.163   0    0.162 0     0.160 0     0    
# 6 0.203   0    0.603 0     0     0.555 0 

misty.intra <- create_initial_view(expr)

summary(misty.intra)
#>                Length Class  Mode     
#> intraview      2      -none- list     
#> misty.uniqueid 1      -none- character
summary(misty.intra$intraview)
#>        Length Class       Mode     
#> abbrev 1      -none-      character
#> data   7      spec_tbl_df list

从内在视图（intraview）的角度，mistyR 将模拟每个marker的表达作为细胞内其他marker的表达的函数。我们有兴趣探索来自互补的不同空间背景的marker表达，即，可区分并有助于解释marker的整体表达。

mistyR 包含两个默认helper functions，用于计算和添加考虑到数据spatial context的视图：add_juxtaview() 和 add_paraview()。juxtaview表示局部空间视图，并捕获每个单元的直接邻域内的内部视图中可用的所有marker的表达。 paraview 捕获边界组织结构内视图中所有可用marker的表达，其中importance of the influence与两个细胞之间距离的倒数成正比。要在视图合成中添加一个 paraview，我们首先需要来自内部视图的每个cell的位置信息。使用这些信息，我们可以创建一个重要半径为 10 的 paraview 并将其添加到视图组合中。

pos <- synthetic[[1]] %>% select(row, col)
pos
##  A tibble: 4,205 × 2
#      row   col
#    <dbl> <dbl>
#  1     1   100
#  2     1    11
#  3     1    13
#  4     1    14
#  5     1    15
#  6     1    20
#  7     1    23
#  8     1    24
#  9     1    26
# 10     1    32
##  … with 4,195 more rows

misty.views <- misty.intra %>% add_paraview(pos, l = 10)
#> 
#> Generating paraview

summary(misty.views)
#>                Length Class  Mode     
#> intraview      2      -none- list     
#> misty.uniqueid 1      -none- character
#> paraview.10    2      -none- list

View(misty.views)

注意 misty.views[["paraview.10"]][["data"]] 和 misty.views[["intraview"]][["data"]] 的不同

当样本中有大量细胞时，juxtaview和paraview的计算可能是计算密集型的。因此需要设置future::plan() 并行运行。计算paraview所需的计算时间也可以通过approximation显着减少。有关详细信息，请参阅此函数的文档 (help(add_paraview))。

此外，也可以从外部资源（data.frame、tibble）创建其他相关和自定义视图（create_view()）并将其添加到视图组合中。数据应按内部视图中的顺序(每个细胞一行)。例如，我们可以创建一个视图来捕获每个cell的 10 个最近邻居的平均表达式。

# find the 10 nearest neighbors
neighbors <- nearest_neighbor_search(distances(as.matrix(pos)), k = 11)[-1, ]
dim(neighbors)
# [1]   10 4205
neighbors[,1:6]
#         1    2    3    4   5    6
#  [1,]  46  488    4  490   4  494
#  [2,] 489    3  490    5 490  493
#  [3,] 533  935    2    3 938  943
#  [4,] 934   38  936  937   3  495
#  [5,] 985  933    5  938 939  944
#  [6,] 532  936  937  936 937  492
#  [7,] 984  979  935  939 491  945
#  [8,]  45    4  938    2 940  941
#  [9,] 983  490 1420  491 936 1421
# [10,]  44 1420  488 1420 492    7

# calculate the mean expression of the nearest neighbors for all markers
# for each cell in expr
nnexpr <- seq_len(nrow(expr)) %>%
  map_dfr(~ expr %>%
    slice(neighbors[, .x]) %>%
    colMeans())

nn.view <- create_view("nearest", nnexpr, "nn")

nn.view
#> $nearest
#> $nearest$abbrev
#> [1] "nn"
#> 
#> $nearest$data
#> # A tibble: 4,205 × 7
#>      ECM protE protF  prodA  prodB  prodC  prodD
#>    <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#>  1 0.169 0.337 1.07  0.120  0.0138 0.146  0.165 
#>  2 0.346 0.676 0.549 0.0969 0.0140 0.190  0.0766
#>  3 0.219 0.304 0.495 0.0496 0.0288 0.236  0.0387
#>  4 0.238 0.607 0.651 0.132  0.0288 0.0954 0.122 
#>  5 0.313 0.688 0.835 0.166  0.0297 0.0837 0.173 
#>  6 0.527 0.743 0.616 0.0722 0.0184 0.135  0.0964
#>  7 0.278 0.399 0.501 0.0413 0.0632 0.160  0.0604
#>  8 0.266 0.537 0.624 0.0738 0.0463 0.154  0.117 
#>  9 0.356 0.564 0.565 0.0696 0.0415 0.208  0.106 
#> 10 0.625 0.863 0.458 0.0350 0.0823 0.230  0.0576
#> # … with 4,195 more rows

可以将创建的视图（add_views()）一个一个地添加到现有的视图组合中，或者以列表的形式添加。创建和添加自定义视图到合成的其他示例。

extended.views <- misty.views %>% add_views(nn.view)

summary(extended.views)
#>                Length Class  Mode     
#> intraview      2      -none- list     
#> misty.uniqueid 1      -none- character
#> paraview.10    2      -none- list     
#> nearest        2      -none- list

也可以通过向 remove_views() 提供一个或多个视图名称来从组合中删除视图。但是这个方法无法删除 intraview 和 misty.uniqueid。

extended.views %>%
  remove_views("nearest") %>%
  summary()
#>                Length Class  Mode     
#> intraview      2      -none- list     
#> misty.uniqueid 1      -none- character
#> paraview.10    2      -none- list

extended.views %>%
  remove_views("intraview") %>%
  summary()
#>                Length Class  Mode     
#> intraview      2      -none- list     
#> misty.uniqueid 1      -none- character
#> paraview.10    2      -none- list     
#> nearest        2      -none- list

4. Model training

视图组合创建好之后，就可以使用 run_misty() 函数进行model training。默认情况下，针对每个视图的内部视图中可用的每个marker单独训练模型。模型训练的结果将存储在名为“results”的文件夹中。

misty.views %>% run_misty()

# Training models
#  Progress: ───────────────────────────────────────── 100%
# [1] "/tmp/results"

我们用于synthetic数据集中第一个样本的工作流程可以轻松扩展以应用于所有 10 个样本，以完全重现已发表文献中的结果。每个样本的结果将存储在“results”文件夹的子文件夹中。

result.folders <- synthetic %>% imap_chr(function(sample, name) {
  sample.expr <- sample %>% select(-c(row, col, type, starts_with("lig")))
  sample.pos <- sample %>% select(row, col)
  
  create_initial_view(sample.expr) %>% add_paraview(sample.pos, l = 10) %>%
    run_misty(results.folder = paste0("results", .Platform$file.sep, name))
})
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models
#> 
#> Generating paraview
#> 
#> Training models

result.folders
#>                                                                 synthetic1 
#>  "/tmp/results/synthetic1" 
#>                                                                synthetic10 
#> "/tmp/results/synthetic10" 
#>                                                                 synthetic2 
#>  "/tmp/results/synthetic2" 
#>                                                                 synthetic3 
#>  "/tmp/results/synthetic3" 
#>                                                                 synthetic4 
#>  "/tmp/results/synthetic4" 
#>                                                                 synthetic5 
#>  "/tmp/results/synthetic5" 
#>                                                                 synthetic6 
#>  "/tmp/results/synthetic6" 
#>                                                                 synthetic7 
#>  "/tmp/results/synthetic7" 
#>                                                                 synthetic8 
#>  "/tmp/results/synthetic8" 
#>                                                                 synthetic9 
#>  "/tmp/results/synthetic9"

需要注意的是，默认情况下，mistyR 会缓存计算出的视图和经过训练的模型，这样在工作流重复运行的情况下，它们将被检索而不是重新计算，从而节省了大量的计算时间。但是，生成的缓存文件的大小可能很大。因此，可以使用缓存文件的函数，例如 run_misty()中名为 cached 的参数，可以设置为 FALSE。此外，函数 clear_cache() 提供了删除缓存文件的方法。

5. Result processing

对于每个分析的样本，原始 mistyR 结果存储在输出文件夹中的多个文本文件中。通过提供包含 run_misty() 生成的结果的文件夹的路径，可以使用函数 collect_results() 收集、聚合和转换来自一个或多个样本的结果到 R 对象。

misty.results <- collect_results(result.folders)
#> 
#> Collecting improvements
#> 
#> Collecting contributions
#> 
#> Collecting importances
#> 
#> Aggregating

summary(misty.results)
#>                        Length Class  Mode
#> improvements           4      tbl_df list
#> contributions          4      tbl_df list
#> importances            5      tbl_df list
#> improvements.stats     5      tbl_df list
#> contributions.stats    6      tbl_df list
#> importances.aggregated 5      tbl_df list

See help(collect_results) for more information on the structure of misty.results.

6. 绘图

MISTy 可以用于解释如下三个general questions。每个问题都可以通过查看相应的plot来回答。

1. How much can the broader spatial context explain the expression of markers (in contrast to the intraview)?

这可以在使用多视图模型的 R2 增益（绝对百分比）（或减少 RMSE 的相对百分比）中观察到（与单个视图内模型相比）。

misty.results %>%
  plot_improvement_stats("gain.R2") %>%
  plot_improvement_stats("gain.RMSE")

我们可以通过基于交叉验证改进的assigned p-value进一步检查解释的方差增益的重要性。

misty.results$improvements %>%
  filter(measure == "p.R2") %>%
  group_by(target) %>% 
  summarize(mean.p = mean(value)) %>%
  arrange(mean.p)

In general, the significant gain in R2 can be interpreted as the following: “We can better explain the expression of marker X, when we consider additional views, other than the intrinsic view.”

2. How much do different view components contribute to explaining the expression?

misty.results %>% plot_view_contributions()

正如预期的那样，对标记表达预测的大部分贡献来自视图内。然而，对于我们观察到方差显著改善的marker，我们还可以观察到 paraview 的比例估计贡献。

3. What are the specific relations that can explain the contributions?

为了解释contributions，我们可以将来自每个视图的marker的重要性分别可视化为所有marker表达的预测因子。

First, the intraview importances.

misty.results %>% plot_interaction_heatmap(view = "intra", cutoff = 0.8)

这些importances与同一单元格中marker之间的关系有关。由于我们在建模过程中没有以任何方式使用有关细胞类型的信息，因此我们在热图中看到的重要相互作用可能来自任何细胞类型。

Second, the paraview importances.

misty.results %>% plot_interaction_heatmap(view = "para.10", cutoff = 0.5)

这些importances与细胞中的marker和更广泛结构中的marker之间的关系有关（由我们的参数 l 控制）。

我们可以观察到 paraview 中的一些交互可能是多余的，即它们在 intraview 中也很重要。为了只关注来自 paraview 的交互，我们可以绘制这些结果之间的对比。

misty.results %>% plot_contrast_heatmap("intra", "para.10", cutoff = 0.5)

此外，由于两个视图中的预测变量和目标marker相同，我们可以从intraview的estimated interaction pairs中提取interaction communities并绘图。

misty.results %>% plot_interaction_communities("intra")

and the paraview.

misty.results %>% plot_interaction_communities("para.10", cutoff = 0.5)

在解释结果和图时，重要的是要注意重要性中捕获的关系不是假设或解释为线性或随意的。此外，不应孤立地解释单个预测变量 - marker对的估计重要性，而应在其他预测变量的上下文中进行解释，因为训练 MISTy 模型是多变量预测任务。

MISTy：空间转录组细胞互作分析方法

1. 简介

2. 数据准备

3. View composition

4. Model training

5. Result processing

6. 绘图

1. How much can the broader spatial context explain the expression of markers (in contrast to the intraview)?

2. How much do different view components contribute to explaining the expression?

3. What are the specific relations that can explain the contributions?

更多例子

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读