Overview of CellMix
CellMix is an R package released in 2013 that integrates 7 gene expression deconvolution algorithms, 8 marker gene lists, and 11 public datasets. It facilitates the estimation of cell type proportions and/or cell-specific differential expression in gene expression experiments.
Background and objectives
Gene expression deconvolution is naturally expressed as a matrix decomposition problem. It uses global gene expression data and can be supervised or unsupervised; supervised deconvolution additionally requires known signatures or marker genes. The questions it addresses are:
- cell proportions: what is the proportion of each cell type?
- are there differences in proportion that are associated with some disease status/covariate?
- cell-specific expression: what is the expression profile of each cell type?
- which genes are differentially expressed in each cell type between groups of samples?
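The matrix decomposition view mentioned above can be illustrated with a small base R sketch. It does not use CellMix; all object names (W, H, X, H_hat) are chosen for illustration, and plain least squares with clipping stands in for the more careful algorithms the package provides. The mixed expression matrix X is modelled as signatures W times proportions H, and H is recovered when W is known.

```r
# Toy illustration of deconvolution as matrix decomposition: X ~ W %*% H
# (base R only; object names are illustrative, not CellMix API)
set.seed(1)
n_genes <- 50; n_types <- 3; n_samples <- 10

# W: cell-specific expression signatures (genes x cell types)
W <- matrix(rexp(n_genes * n_types, rate = 0.1), n_genes, n_types)

# H: cell type proportions (cell types x samples), each column sums to 1
H <- matrix(runif(n_types * n_samples), n_types, n_samples)
H <- sweep(H, 2, colSums(H), "/")

# X: observed mixed expression with a little measurement noise
X <- W %*% H + matrix(rnorm(n_genes * n_samples, sd = 0.01), n_genes, n_samples)

# Supervised case: W is known, so H is estimated by least squares
H_hat <- qr.solve(W, X)                         # least-squares fit per sample
H_hat <- pmax(H_hat, 0)                         # crude clipping to non-negative values
H_hat <- sweep(H_hat, 2, colSums(H_hat), "/")   # rescale columns to sum to 1

# largest absolute error on the estimated proportions
max(abs(H_hat - H))
```

In the unsupervised case both W and H are unknown, which is why marker genes or other prior knowledge are needed to make the factors identifiable.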
Install CellMix package
source("http://bioconductor.org/biocLite.R")
biocLite("CellMix", siteRepos = "http://web.cbio.uct.ac.za/~renaud/CRAN")
Estimating cell proportions from known signatures
Blood samples
- known cell-specific expression signatures generated in independent studies (Abbas et al. 2009; Gong et al. 2011).
- the tutorial uses the dataset GSE20300
# load data (normally requires an internet connection to GEO)
acr <- ExpressionMix("GSE20300", verbose = 2)
# estimate proportions using signatures from Abbas et al. (2009)
res <- gedBlood(acr, verbose = TRUE)
# proportions are stored in the coefficient matrix
coef(res)[1:3, 1:4]
# cell type names
basisnames(res)
# basis signatures (with converted IDs)
basis(res)[1:5, 1:3]
# aggregate into CBC
cbc <- asCBC(res)
dim(cbc)
# plot against actual CBC
profplot(acr, cbc)
# plot cell proportion differences between groups
boxplotBy(res, acr$Status, main = "Cell proportions vs Transplant status")
Building/filtering basis signatures
The goal is to select genes based on their cell type specificity, and to build a basis
signature matrix that provides the “maximum” deconvolution power.
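As a rough illustration of selecting genes by cell type specificity, here is a base R sketch on simulated pure-sample profiles. The specificity score (highest expression divided by second-highest expression across cell types), the cutoff of 5 genes per type, and all object names are assumptions made for this example, not the criteria CellMix applies.

```r
# Toy sketch: pick cell-type-specific genes from pure-sample profiles (base R only)
set.seed(2)
pure <- matrix(rexp(100 * 3, rate = 0.1), 100, 3,
               dimnames = list(paste0("gene", 1:100), c("A", "B", "C")))

# Specificity score (assumed for illustration): highest expression divided
# by the second-highest expression across cell types
top_two <- t(apply(pure, 1, function(v) sort(v, decreasing = TRUE)[1:2]))
score <- top_two[, 1] / top_two[, 2]

# Cell type in which each gene is most expressed
best_type <- colnames(pure)[apply(pure, 1, which.max)]

# Keep the 5 most specific genes for each cell type
keep <- unlist(lapply(split(names(score), best_type), function(g) {
  head(g[order(score[g], decreasing = TRUE)], 5)
}), use.names = FALSE)

# Reduced basis signature matrix restricted to the selected genes
basis_sel <- pure[keep, ]
dim(basis_sel)
```

Restricting the basis to genes with a high score keeps the signature columns well separated, which is what makes the subsequent regression on mixed samples better conditioned.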
From marker genes only
When no pure sample expression profiles are available, deconvolution and estimation
of cell type proportions can still be performed using sets of marker genes.
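# (mix holds the mixed-sample expression data and sel the selected marker genes)
# check if data is in log scale
# compute mean expression profiles within each cell type (data converted to linear scale)
p <- ged(expb(mix, 2), sel, "meanProfile")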
# plot against known proportions (p is by default not scaled)
profplot(mix, p, scale = TRUE, main = "meanProfile - Linear scale")
# compute mean expression profiles within each cell type (data kept in log scale)
lp <- ged(mix, sel, "meanProfile")
# plot against known proportions (lp is by default not scaled)
profplot(mix, lp, scale = TRUE, main = "meanProfile - Log scale")
# compute proportions using the DSA method
pdsa <- ged(mix[sel], sel, "DSA", verbose = TRUE)
profplot(mix, pdsa, main = "DSA - Linear scale")
pdsa <- ged(mix[sel], sel, "DSA", log = FALSE)
profplot(mix, pdsa, main = "DSA - Log scale")
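The reasoning behind the marker-based mean profiles used above can be sketched in base R on simulated data. This is not the CellMix implementation; the object names and the idealized assumption that each marker is expressed in exactly one cell type are made up for the illustration. Under that assumption, the mean expression of a marker set in each sample is proportional to the proportion of the corresponding cell type, which is why such profiles need rescaling before they are compared with known proportions.

```r
# Toy sketch of marker-based proportion estimates (base R only; not the CellMix code)
set.seed(3)
n_types <- 3; n_samples <- 8; per_type <- 5

# True proportions (cell types x samples), each column sums to 1
H <- matrix(runif(n_types * n_samples), n_types, n_samples)
H <- sweep(H, 2, colSums(H), "/")

# Marker genes: expressed in one cell type only (idealized assumption)
type_of <- rep(seq_len(n_types), each = per_type)
W <- matrix(0, n_types * per_type, n_types)
W[cbind(seq_along(type_of), type_of)] <- runif(length(type_of), 50, 100)

# Mixed expression in linear scale
X <- W %*% H

# Mean expression of each marker set in each sample (cell types x samples)
P <- rowsum(X, type_of) / per_type

# Correlation between each mean profile and the matching true proportions
diag(cor(t(P), t(H)))
```

The mean profiles recover the proportions only up to a cell-type-specific scaling factor, so they track how a cell type varies across samples rather than its absolute abundance.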
Estimating differential cell-specific expression
- From measured proportions: csSAM
- From proportion priors: DSection
Complete deconvolution using marker genes
A priori: enforce marker expression patterns
# generate random data with 5 markers per cell type
x <- rmix(3, 200, 20, markers = 5)
m <- getMarkers(x)
# deconvolve using KL-divergence metric
kl <- ged(x, m, "ssKL", log = FALSE, rng = 1234, nrun = 10)
# plot against known proportions
profplot(x, kl)
# check consistency of most expressing cell types in known basis signatures
basismarkermap(basis(x), kl)
# correlation with known signatures
basiscor(x, kl)
A posteriori: assign signatures to cell types
# deconvolve using the deconf algorithm
dec <- ged(x, m, "deconf", rng = 1234, nrun = 10)
# plot against known proportions
profplot(x, dec)
# check consistency of most expressing cell types in known signatures
basismarkermap(basis(x), dec)
# correlation with known signatures
basiscor(x, dec)