美文网首页单细胞分析单细胞转录组分析
使用cellassign包进行单细胞类型注释分析(二):Assi

使用cellassign包进行单细胞类型注释分析(二):Assi

作者: Davey1220 | 来源:发表于2020-05-23 16:00 被阅读0次
image.png

Basic usage

这里,我们使用cellassign包自带的示例数据简单演示其基本用法。
首先,加载相关的R包和示例数据:

# 加载相关R包
library(SingleCellExperiment)
library(cellassign)

# 加载示例数据
# We use an example SingleCellExperiment consisting of 200 genes and 500 cells:
data(example_sce)

# 查看sce对象
print(example_sce)
#> class: SingleCellExperiment 
#> dim: 200 500 
#> metadata(20): params params ... params params
#> assays(6): BatchCellMeans BaseCellMeans ... TrueCounts counts
#> rownames(200): Gene1 Gene2 ... Gene199 Gene200
#> rowData names(6): Gene BaseGeneMean ... DEFacGroup1 DEFacGroup2
#> colnames(500): Cell1 Cell2 ... Cell499 Cell500
#> colData names(5): Cell Batch Group ExpLibSize EM_group
#> reducedDimNames(0):
#> spikeNames(0):

# The true cell types are annotated for convenience in the Group slot of the SingleCellExperiment:
# 查看细胞分群信息
print(head(example_sce$Group))
#> [1] "Group1" "Group2" "Group2" "Group1" "Group1" "Group1"

# Also provided is an example gene-by-cell-type binary matrix, whose entries are 1 if a gene is a marker for a given cell type and 0 otherwise:
# 加载示例marker基因二进制矩阵
data(example_marker_mat)
# 查看marker基因信息
print(example_marker_mat)
#>        Group1 Group2
#> Gene1       1      0
#> Gene2       0      1
#> Gene3       1      0
#> Gene4       1      0
#> Gene5       1      0
#> Gene6       0      1
#> Gene7       0      1
#> Gene8       0      1
#> Gene9       0      1
#> Gene10      1      0

接下来,我们计算每个细胞的size factors,它们存储在sizeFactors(example_sce)中。我们建议使用scran包中的computeSumFactors函数对它们进行计算。

s <- sizeFactors(example_sce)

最后,我们调用cellassign()函数进行细胞类型注释,使用基因表达矩阵、marker基因和size factors作为输入。

fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),], 
                  marker_gene_info = example_marker_mat, 
                  s = s, 
                  learning_rate = 1e-2, 
                  shrinkage = TRUE,
                  verbose = FALSE)

运行完cellassign函数后,会返回一个cellassign对象:

# 查看cellassign对象
print(fit)
#> A cellassign fit for 500 cells, 10 genes, 2 cell types with 0 covariates
#>             To access cell types, call celltypes(x)
#>             To access cell type probabilities, call cellprobs(x)

我们可以使用celltypes()函数访问查看细胞类型的最大似然估计(MLE):

print(head(celltypes(fit)))
#> [1] "Group1" "Group2" "Group2" "Group1" "Group1" "Group1"

使用mleparams()函数查看所有的MLE参数信息:

print(str(mleparams(fit)))
#> List of 9
#>  $ delta  : num [1:10, 1:2] 2.32 0 2.5 2.74 2.82 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:10] "Gene1" "Gene2" "Gene3" "Gene4" ...
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ beta   : num [1:10, 1] 0.487 -0.261 -1.013 1.328 1.537 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:10] "Gene1" "Gene2" "Gene3" "Gene4" ...
#>   .. ..$ : NULL
#>  $ phi    : num [1:500, 1:10, 1:2] 1.81 1.81 1.8 1.8 1.8 ...
#>  $ gamma  : num [1:500, 1:2] 1.00 4.50e-147 8.21e-45 1.00 1.00 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ mu     : num [1:500, 1:10, 1:2] 22.6 80.8 11.5 15.5 15.8 ...
#>  $ a      : num [1:10(1d)] 0.989 1.053 1.125 1.209 1.307 ...
#>  $ theta  : num [1:2(1d)] 0.472 0.528
#>   ..- attr(*, "dimnames")=List of 1
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ ld_mean: num 1
#>  $ ld_var : num 0.785
#> NULL

我们还可以使用cellprobs()函数获取每种细胞类型分配的概率,该函数返回每个细胞和细胞类型的概率矩阵:

pheatmap::pheatmap(cellprobs(fit))
image.png

Finally, since this is simulated data we can check the concordance with the true group values:

print(table(example_sce$Group, celltypes(fit)))
#>         
#>          Group1 Group2
#>   Group1    236      0
#>   Group2      0    264

Example set of markers for tumour microenvironment

cellassign包提供了一组示例marker标记,用于区分人类肿瘤微环境中的常见细胞类型。cellassign的工作流程通常是迭代的,包括确保所有的marker基因均在你的scRNA-seq表达数据中表达,并从输入marker基因矩阵中删除似乎不存在的细胞类型。

该组示例marker标记基因可用于以下细胞类型:

  • B cells
  • T cells
  • Cytotoxic T cells
  • Monocyte/Macrophage
  • Epithelial cells
  • Myofibroblasts
  • Vascular smooth muscle cells
  • Endothelial cells

加载示例marker标记基因

data(example_TME_markers)

# Note that this is a list of two marker lists:
names(example_TME_markers)
#> [1] "symbol"  "ensembl"

# Where symbol contains gene symbols:
# 查看marker标记基因信息
lapply(head(example_TME_markers$symbol, n = 4), head, n = 4)
#> $`B cells`
#> [1] "VIM"   "MS4A1" "CD79A" "PTPRC"
#> 
#> $`T cells`
#> [1] "VIM"  "CD2"  "CD3D" "CD3E"
#> 
#> $`Cytotoxic T cells`
#> [1] "VIM"  "CD2"  "CD3D" "CD3E"
#> 
#> $`Monocyte/Macrophage`
#> [1] "VIM"    "CD14"   "FCGR3A" "CD33"

# and ensembl contains the equivalent ensembl gene ids:
lapply(head(example_TME_markers$ensembl, n = 4), head, n = 4)
#> $`B cells`
#>               VIM             MS4A1             CD79A             PTPRC 
#> "ENSG00000026025" "ENSG00000156738" "ENSG00000105369" "ENSG00000081237" 
#> 
#> $`T cells`
#>               VIM               CD2              CD3D              CD3E 
#> "ENSG00000026025" "ENSG00000116824" "ENSG00000167286" "ENSG00000198851" 
#> 
#> $`Cytotoxic T cells`
#>               VIM               CD2              CD3D              CD3E 
#> "ENSG00000026025" "ENSG00000116824" "ENSG00000167286" "ENSG00000198851" 
#> 
#> $`Monocyte/Macrophage`
#>               VIM              CD14            FCGR3A              CD33 
#> "ENSG00000026025" "ENSG00000170458" "ENSG00000203747" "ENSG00000105383"

To use these with cellassign we can turn them into the binary marker by cell type matrix:

# 使用marker_list_to_mat函数转换为二进制的marker基因矩阵
marker_mat <- marker_list_to_mat(example_TME_markers$ensembl)

# 查看marker基因矩阵
marker_mat[1:3, 1:3]
#>                 B cells T cells Cytotoxic T cells
#> ENSG00000010610       0       1                 0
#> ENSG00000026025       1       1                 1
#> ENSG00000039068       0       0                 0

Important: the single cell experiment or input gene expression matrix should be subset accordingly to match the rows of the marker input matrix, e.g. if sce is a SingleCellExperiment with ensembl IDs as rownames then call

sce_marker <- sce[intersect(rownames(marker_mat), rownames(sce)),]

请注意,单细胞scRNA-seq数据的基因表达矩阵中的行应与marker标记基因输入矩阵中行的顺序相同。

Advanced usage

Options for a cellassign() call

我们可以调整一下参数来改变cellassign()函数的结果:

  • min_delta: the minimum log-fold change in expression above which a gene must be over-expressed in the cells of which it is a marker compared to all others
  • X: a covariate matrix, see section below
  • shrinkage: whether to impose a hierarchical prior on the values of delta (cell type specific increase in expression of marker genes)

Constructing a marker gene matrix

在这里,我们演示了一种构建二进制marker基因矩阵(binary marker gene matrix)的方法,该marker基因矩阵提供了我们对细胞类型的先验知识。

对于两种类型的细胞(Group1 and Group2),我们事先知道几个良好的marker基因,例如:


image.png

要在cellassign中使用它,我们必须将其变成一个命名列表,其中名称是细胞类型,而条目是每种细胞类型的marker基因(不一定是互斥的):

marker_gene_list <- list(
  Group1 = c("Gene186", "Gene269", "Gene526", "Gene536", "Gene994"),
  Group2 = c("Gene205", "Gene575", "Gene754", "Gene773", "Gene949")
)

print(str(marker_gene_list))
#> List of 2
#>  $ Group1: chr [1:5] "Gene186" "Gene269" "Gene526" "Gene536" ...
#>  $ Group2: chr [1:5] "Gene205" "Gene575" "Gene754" "Gene773" ...
#> NULL

然后,我们可以直接使用marker_list_to_mat()函数将其直接提供给cellassign或将其转换为二进制marker基因矩阵:

print(marker_list_to_mat(marker_gene_list))
#>         Group1 Group2 other
#> Gene186      1      0     0
#> Gene205      0      1     0
#> Gene269      1      0     0
#> Gene526      1      0     0
#> Gene536      1      0     0
#> Gene575      0      1     0
#> Gene754      0      1     0
#> Gene773      0      1     0
#> Gene949      0      1     0
#> Gene994      1      0     0

对于不属于任何一种细胞类型的细胞,它会自动将其归为“other”组。当然,我们也可以通过设置include_other = FALSE将其排除。

Adding covariates

我们还可以将批次、样本或患者特异性效应的协变量添加到cellassign模型中。例如,如果我们有两个协变量x1和x2:

N <- ncol(example_sce)
x1 <- rnorm(N)
x2 <- rnorm(N)

We can construct a design matrix using the model.matrix function in R:

X <- model.matrix(~ 0 + x1 + x2)

Note we explicitly set no intercept by passing in 0 in the beginning. We can then perform an equivalent cell assignment passing this in also:

fit <- cellassign(exprs_obj = example_sce, 
                  marker_gene_info = example_marker_mat, 
                  X = X,
                  s = s, 
                  learning_rate = 1e-2, 
                  shrinkage = TRUE,
                  verbose = FALSE)

参考来源:https://irrationone.github.io/cellassign/articles/introduction-to-cellassign.html

相关文章

网友评论

    本文标题:使用cellassign包进行单细胞类型注释分析(二):Assi

    本文链接:https://www.haomeiwen.com/subject/twekahtx.html