Basic usage
这里,我们使用cellassign包自带的示例数据简单演示其基本用法。
首先,加载相关的R包和示例数据:
# 加载相关R包
library(SingleCellExperiment)
library(cellassign)
# 加载示例数据
# We use an example SingleCellExperiment consisting of 200 genes and 500 cells:
data(example_sce)
# 查看sce对象
print(example_sce)
#> class: SingleCellExperiment
#> dim: 200 500
#> metadata(20): params params ... params params
#> assays(6): BatchCellMeans BaseCellMeans ... TrueCounts counts
#> rownames(200): Gene1 Gene2 ... Gene199 Gene200
#> rowData names(6): Gene BaseGeneMean ... DEFacGroup1 DEFacGroup2
#> colnames(500): Cell1 Cell2 ... Cell499 Cell500
#> colData names(5): Cell Batch Group ExpLibSize EM_group
#> reducedDimNames(0):
#> spikeNames(0):
# The true cell types are annotated for convenience in the Group slot of the SingleCellExperiment:
# 查看细胞分群信息
print(head(example_sce$Group))
#> [1] "Group1" "Group2" "Group2" "Group1" "Group1" "Group1"
# Also provided is an example gene-by-cell-type binary matrix, whose entries are 1 if a gene is a marker for a given cell type and 0 otherwise:
# 加载示例marker基因二进制矩阵
data(example_marker_mat)
# 查看marker基因信息
print(example_marker_mat)
#> Group1 Group2
#> Gene1 1 0
#> Gene2 0 1
#> Gene3 1 0
#> Gene4 1 0
#> Gene5 1 0
#> Gene6 0 1
#> Gene7 0 1
#> Gene8 0 1
#> Gene9 0 1
#> Gene10 1 0
接下来,我们计算每个细胞的size factors,它们存储在sizeFactors(example_sce)中。我们建议使用scran
包中的computeSumFactors
函数对它们进行计算。
s <- sizeFactors(example_sce)
最后,我们调用cellassign()
函数进行细胞类型注释,使用基因表达矩阵、marker基因和size factors作为输入。
fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),],
marker_gene_info = example_marker_mat,
s = s,
learning_rate = 1e-2,
shrinkage = TRUE,
verbose = FALSE)
运行完cellassign函数后,会返回一个cellassign对象:
# 查看cellassign对象
print(fit)
#> A cellassign fit for 500 cells, 10 genes, 2 cell types with 0 covariates
#> To access cell types, call celltypes(x)
#> To access cell type probabilities, call cellprobs(x)
我们可以使用celltypes()
函数访问查看细胞类型的最大似然估计(MLE):
print(head(celltypes(fit)))
#> [1] "Group1" "Group2" "Group2" "Group1" "Group1" "Group1"
使用mleparams()
函数查看所有的MLE参数信息:
print(str(mleparams(fit)))
#> List of 9
#> $ delta : num [1:10, 1:2] 2.32 0 2.5 2.74 2.82 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:10] "Gene1" "Gene2" "Gene3" "Gene4" ...
#> .. ..$ : chr [1:2] "Group1" "Group2"
#> $ beta : num [1:10, 1] 0.487 -0.261 -1.013 1.328 1.537 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:10] "Gene1" "Gene2" "Gene3" "Gene4" ...
#> .. ..$ : NULL
#> $ phi : num [1:500, 1:10, 1:2] 1.81 1.81 1.8 1.8 1.8 ...
#> $ gamma : num [1:500, 1:2] 1.00 4.50e-147 8.21e-45 1.00 1.00 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:2] "Group1" "Group2"
#> $ mu : num [1:500, 1:10, 1:2] 22.6 80.8 11.5 15.5 15.8 ...
#> $ a : num [1:10(1d)] 0.989 1.053 1.125 1.209 1.307 ...
#> $ theta : num [1:2(1d)] 0.472 0.528
#> ..- attr(*, "dimnames")=List of 1
#> .. ..$ : chr [1:2] "Group1" "Group2"
#> $ ld_mean: num 1
#> $ ld_var : num 0.785
#> NULL
我们还可以使用cellprobs()
函数获取每种细胞类型分配的概率,该函数返回每个细胞和细胞类型的概率矩阵:
pheatmap::pheatmap(cellprobs(fit))
image.png
Finally, since this is simulated data we can check the concordance with the true group values:
print(table(example_sce$Group, celltypes(fit)))
#>
#> Group1 Group2
#> Group1 236 0
#> Group2 0 264
Example set of markers for tumour microenvironment
cellassign
包提供了一组示例marker标记,用于区分人类肿瘤微环境中的常见细胞类型。cellassign的工作流程通常是迭代的,包括确保所有的marker基因均在你的scRNA-seq表达数据中表达,并从输入marker基因矩阵中删除似乎不存在的细胞类型。
该组示例marker标记基因可用于以下细胞类型:
- B cells
- T cells
- Cytotoxic T cells
- Monocyte/Macrophage
- Epithelial cells
- Myofibroblasts
- Vascular smooth muscle cells
- Endothelial cells
加载示例marker标记基因
data(example_TME_markers)
# Note that this is a list of two marker lists:
names(example_TME_markers)
#> [1] "symbol" "ensembl"
# Where symbol contains gene symbols:
# 查看marker标记基因信息
lapply(head(example_TME_markers$symbol, n = 4), head, n = 4)
#> $`B cells`
#> [1] "VIM" "MS4A1" "CD79A" "PTPRC"
#>
#> $`T cells`
#> [1] "VIM" "CD2" "CD3D" "CD3E"
#>
#> $`Cytotoxic T cells`
#> [1] "VIM" "CD2" "CD3D" "CD3E"
#>
#> $`Monocyte/Macrophage`
#> [1] "VIM" "CD14" "FCGR3A" "CD33"
# and ensembl contains the equivalent ensembl gene ids:
lapply(head(example_TME_markers$ensembl, n = 4), head, n = 4)
#> $`B cells`
#> VIM MS4A1 CD79A PTPRC
#> "ENSG00000026025" "ENSG00000156738" "ENSG00000105369" "ENSG00000081237"
#>
#> $`T cells`
#> VIM CD2 CD3D CD3E
#> "ENSG00000026025" "ENSG00000116824" "ENSG00000167286" "ENSG00000198851"
#>
#> $`Cytotoxic T cells`
#> VIM CD2 CD3D CD3E
#> "ENSG00000026025" "ENSG00000116824" "ENSG00000167286" "ENSG00000198851"
#>
#> $`Monocyte/Macrophage`
#> VIM CD14 FCGR3A CD33
#> "ENSG00000026025" "ENSG00000170458" "ENSG00000203747" "ENSG00000105383"
To use these with cellassign we can turn them into the binary marker by cell type matrix
:
# 使用marker_list_to_mat函数转换为二进制的marker基因矩阵
marker_mat <- marker_list_to_mat(example_TME_markers$ensembl)
# 查看marker基因矩阵
marker_mat[1:3, 1:3]
#> B cells T cells Cytotoxic T cells
#> ENSG00000010610 0 1 0
#> ENSG00000026025 1 1 1
#> ENSG00000039068 0 0 0
Important: the single cell experiment or input gene expression matrix
should be subset accordingly to match the rows of the marker input matrix
, e.g. if sce is a SingleCellExperiment with ensembl IDs as rownames then call
sce_marker <- sce[intersect(rownames(marker_mat), rownames(sce)),]
请注意,单细胞scRNA-seq数据的基因表达矩阵中的行应与marker标记基因输入矩阵中行的顺序相同。
Advanced usage
Options for a cellassign() call
我们可以调整一下参数来改变cellassign()函数的结果:
-
min_delta
: the minimum log-fold change in expression above which a gene must be over-expressed in the cells of which it is a marker compared to all others -
X
: a covariate matrix, see section below -
shrinkage
: whether to impose a hierarchical prior on the values of delta (cell type specific increase in expression of marker genes)
Constructing a marker gene matrix
在这里,我们演示了一种构建二进制marker基因矩阵(binary marker gene matrix)的方法,该marker基因矩阵提供了我们对细胞类型的先验知识。
对于两种类型的细胞(Group1 and Group2),我们事先知道几个良好的marker基因,例如:
image.png
要在cellassign中使用它,我们必须将其变成一个命名列表,其中名称是细胞类型,而条目是每种细胞类型的marker基因(不一定是互斥的):
marker_gene_list <- list(
Group1 = c("Gene186", "Gene269", "Gene526", "Gene536", "Gene994"),
Group2 = c("Gene205", "Gene575", "Gene754", "Gene773", "Gene949")
)
print(str(marker_gene_list))
#> List of 2
#> $ Group1: chr [1:5] "Gene186" "Gene269" "Gene526" "Gene536" ...
#> $ Group2: chr [1:5] "Gene205" "Gene575" "Gene754" "Gene773" ...
#> NULL
然后,我们可以直接使用marker_list_to_mat()
函数将其直接提供给cellassign或将其转换为二进制marker基因矩阵:
print(marker_list_to_mat(marker_gene_list))
#> Group1 Group2 other
#> Gene186 1 0 0
#> Gene205 0 1 0
#> Gene269 1 0 0
#> Gene526 1 0 0
#> Gene536 1 0 0
#> Gene575 0 1 0
#> Gene754 0 1 0
#> Gene773 0 1 0
#> Gene949 0 1 0
#> Gene994 1 0 0
对于不属于任何一种细胞类型的细胞,它会自动将其归为“other”组。当然,我们也可以通过设置include_other = FALSE
将其排除。
Adding covariates
我们还可以将批次、样本或患者特异性效应的协变量添加到cellassign模型中。例如,如果我们有两个协变量x1和x2:
N <- ncol(example_sce)
x1 <- rnorm(N)
x2 <- rnorm(N)
We can construct a design matrix
using the model.matrix
function in R:
X <- model.matrix(~ 0 + x1 + x2)
Note we explicitly set no intercept by passing in 0 in the beginning. We can then perform an equivalent cell assignment passing this in also:
fit <- cellassign(exprs_obj = example_sce,
marker_gene_info = example_marker_mat,
X = X,
s = s,
learning_rate = 1e-2,
shrinkage = TRUE,
verbose = FALSE)
参考来源:https://irrationone.github.io/cellassign/articles/introduction-to-cellassign.html
网友评论