使用cellassign包进行单细胞类型注释分析（二）：Assi

作者: Davey1220 | 来源:发表于2020-05-23 16:00 被阅读0次

使用cellassign包进行单细胞类型注释分析（二）：Assi
单细胞转录组细胞类型注释分析合集
使用cellassign包进行单细胞类型注释分析（一）：Cons
使用CHETAH包进行单细胞类型注释分析
使用SingleR包进行单细胞类型注释分析
使用Garnett包进行单细胞类型分类注释分析（二）：Class
使用Celaref包进行单细胞类型注释分析（二）：Using t
使用Celaref包进行单细胞类型注释分析（一）：Celaref
使用Celaref包进行单细胞类型注释分析（三）：Example
SingleR如何使用自定义的参考集

image.png

Basic usage

这里，我们使用cellassign包自带的示例数据简单演示其基本用法。
首先，加载相关的R包和示例数据：

# 加载相关R包
library(SingleCellExperiment)
library(cellassign)

# 加载示例数据
# We use an example SingleCellExperiment consisting of 200 genes and 500 cells:
data(example_sce)

# 查看sce对象
print(example_sce)
#> class: SingleCellExperiment 
#> dim: 200 500 
#> metadata(20): params params ... params params
#> assays(6): BatchCellMeans BaseCellMeans ... TrueCounts counts
#> rownames(200): Gene1 Gene2 ... Gene199 Gene200
#> rowData names(6): Gene BaseGeneMean ... DEFacGroup1 DEFacGroup2
#> colnames(500): Cell1 Cell2 ... Cell499 Cell500
#> colData names(5): Cell Batch Group ExpLibSize EM_group
#> reducedDimNames(0):
#> spikeNames(0):

# The true cell types are annotated for convenience in the Group slot of the SingleCellExperiment:
# 查看细胞分群信息
print(head(example_sce$Group))
#> [1] "Group1" "Group2" "Group2" "Group1" "Group1" "Group1"

# Also provided is an example gene-by-cell-type binary matrix, whose entries are 1 if a gene is a marker for a given cell type and 0 otherwise:
# 加载示例marker基因二进制矩阵
data(example_marker_mat)
# 查看marker基因信息
print(example_marker_mat)
#>        Group1 Group2
#> Gene1       1      0
#> Gene2       0      1
#> Gene3       1      0
#> Gene4       1      0
#> Gene5       1      0
#> Gene6       0      1
#> Gene7       0      1
#> Gene8       0      1
#> Gene9       0      1
#> Gene10      1      0

接下来，我们计算每个细胞的size factors，它们存储在sizeFactors(example_sce)中。我们建议使用scran包中的computeSumFactors函数对它们进行计算。

s <- sizeFactors(example_sce)

最后，我们调用cellassign()函数进行细胞类型注释，使用基因表达矩阵、marker基因和size factors作为输入。

fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),], 
                  marker_gene_info = example_marker_mat, 
                  s = s, 
                  learning_rate = 1e-2, 
                  shrinkage = TRUE,
                  verbose = FALSE)

运行完cellassign函数后，会返回一个cellassign对象：

# 查看cellassign对象
print(fit)
#> A cellassign fit for 500 cells, 10 genes, 2 cell types with 0 covariates
#>             To access cell types, call celltypes(x)
#>             To access cell type probabilities, call cellprobs(x)

我们可以使用celltypes()函数访问查看细胞类型的最大似然估计（MLE）：

print(head(celltypes(fit)))
#> [1] "Group1" "Group2" "Group2" "Group1" "Group1" "Group1"

使用mleparams()函数查看所有的MLE参数信息：

print(str(mleparams(fit)))
#> List of 9
#>  $ delta  : num [1:10, 1:2] 2.32 0 2.5 2.74 2.82 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:10] "Gene1" "Gene2" "Gene3" "Gene4" ...
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ beta   : num [1:10, 1] 0.487 -0.261 -1.013 1.328 1.537 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:10] "Gene1" "Gene2" "Gene3" "Gene4" ...
#>   .. ..$ : NULL
#>  $ phi    : num [1:500, 1:10, 1:2] 1.81 1.81 1.8 1.8 1.8 ...
#>  $ gamma  : num [1:500, 1:2] 1.00 4.50e-147 8.21e-45 1.00 1.00 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ mu     : num [1:500, 1:10, 1:2] 22.6 80.8 11.5 15.5 15.8 ...
#>  $ a      : num [1:10(1d)] 0.989 1.053 1.125 1.209 1.307 ...
#>  $ theta  : num [1:2(1d)] 0.472 0.528
#>   ..- attr(*, "dimnames")=List of 1
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ ld_mean: num 1
#>  $ ld_var : num 0.785
#> NULL

我们还可以使用cellprobs()函数获取每种细胞类型分配的概率，该函数返回每个细胞和细胞类型的概率矩阵：

pheatmap::pheatmap(cellprobs(fit))

image.png

Finally, since this is simulated data we can check the concordance with the true group values:

print(table(example_sce$Group, celltypes(fit)))
#>         
#>          Group1 Group2
#>   Group1    236      0
#>   Group2      0    264

Example set of markers for tumour microenvironment

cellassign包提供了一组示例marker标记，用于区分人类肿瘤微环境中的常见细胞类型。cellassign的工作流程通常是迭代的，包括确保所有的marker基因均在你的scRNA-seq表达数据中表达，并从输入marker基因矩阵中删除似乎不存在的细胞类型。

该组示例marker标记基因可用于以下细胞类型：

B cells
T cells
Cytotoxic T cells
Monocyte/Macrophage
Epithelial cells
Myofibroblasts
Vascular smooth muscle cells
Endothelial cells

加载示例marker标记基因

data(example_TME_markers)

# Note that this is a list of two marker lists:
names(example_TME_markers)
#> [1] "symbol"  "ensembl"

# Where symbol contains gene symbols:
# 查看marker标记基因信息
lapply(head(example_TME_markers$symbol, n = 4), head, n = 4)
#> $`B cells`
#> [1] "VIM"   "MS4A1" "CD79A" "PTPRC"
#> 
#> $`T cells`
#> [1] "VIM"  "CD2"  "CD3D" "CD3E"
#> 
#> $`Cytotoxic T cells`
#> [1] "VIM"  "CD2"  "CD3D" "CD3E"
#> 
#> $`Monocyte/Macrophage`
#> [1] "VIM"    "CD14"   "FCGR3A" "CD33"

# and ensembl contains the equivalent ensembl gene ids:
lapply(head(example_TME_markers$ensembl, n = 4), head, n = 4)
#> $`B cells`
#>               VIM             MS4A1             CD79A             PTPRC 
#> "ENSG00000026025" "ENSG00000156738" "ENSG00000105369" "ENSG00000081237" 
#> 
#> $`T cells`
#>               VIM               CD2              CD3D              CD3E 
#> "ENSG00000026025" "ENSG00000116824" "ENSG00000167286" "ENSG00000198851" 
#> 
#> $`Cytotoxic T cells`
#>               VIM               CD2              CD3D              CD3E 
#> "ENSG00000026025" "ENSG00000116824" "ENSG00000167286" "ENSG00000198851" 
#> 
#> $`Monocyte/Macrophage`
#>               VIM              CD14            FCGR3A              CD33 
#> "ENSG00000026025" "ENSG00000170458" "ENSG00000203747" "ENSG00000105383"

To use these with cellassign we can turn them into the binary marker by cell type matrix:

# 使用marker_list_to_mat函数转换为二进制的marker基因矩阵
marker_mat <- marker_list_to_mat(example_TME_markers$ensembl)

# 查看marker基因矩阵
marker_mat[1:3, 1:3]
#>                 B cells T cells Cytotoxic T cells
#> ENSG00000010610       0       1                 0
#> ENSG00000026025       1       1                 1
#> ENSG00000039068       0       0                 0

Important: the single cell experiment or input gene expression matrix should be subset accordingly to match the rows of the marker input matrix, e.g. if sce is a SingleCellExperiment with ensembl IDs as rownames then call

sce_marker <- sce[intersect(rownames(marker_mat), rownames(sce)),]

请注意，单细胞scRNA-seq数据的基因表达矩阵中的行应与marker标记基因输入矩阵中行的顺序相同。

Advanced usage

Options for a cellassign() call

我们可以调整一下参数来改变cellassign()函数的结果：

min_delta: the minimum log-fold change in expression above which a gene must be over-expressed in the cells of which it is a marker compared to all others
X: a covariate matrix, see section below
shrinkage: whether to impose a hierarchical prior on the values of delta (cell type specific increase in expression of marker genes)

Constructing a marker gene matrix

在这里，我们演示了一种构建二进制marker基因矩阵（binary marker gene matrix）的方法，该marker基因矩阵提供了我们对细胞类型的先验知识。

对于两种类型的细胞（Group1 and Group2），我们事先知道几个良好的marker基因，例如：

image.png

要在cellassign中使用它，我们必须将其变成一个命名列表，其中名称是细胞类型，而条目是每种细胞类型的marker基因（不一定是互斥的）：

marker_gene_list <- list(
  Group1 = c("Gene186", "Gene269", "Gene526", "Gene536", "Gene994"),
  Group2 = c("Gene205", "Gene575", "Gene754", "Gene773", "Gene949")
)

print(str(marker_gene_list))
#> List of 2
#>  $ Group1: chr [1:5] "Gene186" "Gene269" "Gene526" "Gene536" ...
#>  $ Group2: chr [1:5] "Gene205" "Gene575" "Gene754" "Gene773" ...
#> NULL

然后，我们可以直接使用marker_list_to_mat()函数将其直接提供给cellassign或将其转换为二进制marker基因矩阵：

print(marker_list_to_mat(marker_gene_list))
#>         Group1 Group2 other
#> Gene186      1      0     0
#> Gene205      0      1     0
#> Gene269      1      0     0
#> Gene526      1      0     0
#> Gene536      1      0     0
#> Gene575      0      1     0
#> Gene754      0      1     0
#> Gene773      0      1     0
#> Gene949      0      1     0
#> Gene994      1      0     0

对于不属于任何一种细胞类型的细胞，它会自动将其归为“other”组。当然，我们也可以通过设置include_other = FALSE将其排除。

Adding covariates

我们还可以将批次、样本或患者特异性效应的协变量添加到cellassign模型中。例如，如果我们有两个协变量x1和x2：

N <- ncol(example_sce)
x1 <- rnorm(N)
x2 <- rnorm(N)

We can construct a design matrix using the model.matrix function in R:

X <- model.matrix(~ 0 + x1 + x2)

Note we explicitly set no intercept by passing in 0 in the beginning. We can then perform an equivalent cell assignment passing this in also:

fit <- cellassign(exprs_obj = example_sce, 
                  marker_gene_info = example_marker_mat, 
                  X = X,
                  s = s, 
                  learning_rate = 1e-2, 
                  shrinkage = TRUE,
                  verbose = FALSE)

参考来源：https://irrationone.github.io/cellassign/articles/introduction-to-cellassign.html

使用cellassign包进行单细胞类型注释分析（二）：Assi
Basic usage 这里，我们使用cellassign包自带的示例数据简单演示其基本用法。首先，加载相关的R包...
单细胞转录组细胞类型注释分析合集
SingleR 使用SingleR包进行单细胞类型注释分析 Celaref 使用Celaref包进行单细胞类型注释...
使用cellassign包进行单细胞类型注释分析（一）：Cons
cellassign包简介 cellassign基于marker基因的信息自动将单细胞RNA-seq数据分配注释到...
使用CHETAH包进行单细胞类型注释分析
CHETAH包简介 CHETAH（CHaracterization of cEll Types Aided by ...
使用SingleR包进行单细胞类型注释分析
SingleR包简介 SingleR是一个用于对单细胞RNA-seq测序（scRNA-seq）数据进行细胞类型自动...
使用Garnett包进行单细胞类型分类注释分析（二）：Class
image.png Classifying your cells Load your data 构建好细胞分类器之...
使用Celaref包进行单细胞类型注释分析（二）：Using t
Prepare data Celaref包以SummarizedExperiment对象的数据作为输入，我们可以使...
使用Celaref包进行单细胞类型注释分析（一）：Celaref
Celaref包简介单细胞RNA-seq测序（scRNA-seq）技术可以解析组织或样品内的细胞异质性，并查看特...
使用Celaref包进行单细胞类型注释分析（三）：Example
PBMCs - 10X vs Microarray Reference 血液中的PBMCs是一种易于获取的异质细胞...
SingleR如何使用自定义的参考集
在我之前的帖子单细胞分析实录(7): 差异表达分析/细胞类型注释里面，我已经介绍了如何使用SingleR给单细胞数...