1. Compare the proportion of a cell type between two sample groups with a boxplot

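This is the most intuitive approach. Its drawback is that when there are only a few single-cell samples and they are highly heterogeneous, the difference rarely reaches statistical significance.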

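library(ggpubr)
# Toy data: T-cell fraction per sample, four samples per group
data <- data.frame(Cancer = c(0.5, 0.6, 0.8, 0.2),
                   Normal = c(0.2, 0.3, 0.7, 0.4),
                   Celltype = "T cells")
# Long format: one row per sample, "variable" holds the group (Cancer / Normal)
mydata <- reshape2::melt(data, id.vars = c("Celltype"))
# Boxplot with jittered points; compare the two groups within the cell type
ggboxplot(mydata, x = "Celltype", y = "value",
          color = "variable", palette = "jama",
          add = "jitter") + stat_compare_means(aes(group = variable))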
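To put a number on the sample-size limitation noted above, here is a minimal sketch that runs the Wilcoxon rank-sum test directly on the same toy values (as far as I know, this is the test stat_compare_means() applies by default to two groups). The names cancer, normal, res and min_p are only illustrative.

```r
# Per-sample T-cell fractions (same toy values as in the boxplot example)
cancer <- c(0.5, 0.6, 0.8, 0.2)
normal <- c(0.2, 0.3, 0.7, 0.4)

# Wilcoxon rank-sum test; the tied value 0.2 makes R fall back to a
# normal approximation and emit a warning, which is silenced here
res <- suppressWarnings(wilcox.test(cancer, normal))
res$p.value

# Smallest two-sided p-value an exact rank-sum test can reach
# with 4 vs 4 samples and no ties: 2 / choose(8, 4)
min_p <- 2 / choose(length(cancer) + length(normal), length(cancer))
min_p
```

With four samples per group, an exact two-sided rank-sum test cannot go below 2/70 (about 0.029) even when the groups separate perfectly, so moderate differences like the one here rarely reach significance.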

2. Ro/e ratio (observed / expected)

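Many papers use this metric. My understanding is that it is the observed count divided by the expected count, both taken from a chi-squared test on a 2x2 contingency table.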

Cell_type	Cancer	Normal
Tcell	80	200
Bcell	100	120
Tam	200	100

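Take the data above as an example: there are three cell types, with the cell counts in Cancer and Normal shown in the table. To compute Ro/e for one cell type, first collapse the data into a 2x2 table of that cell type versus all others. For T cells: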

Cell_type	Cancer	Normal
Tcell	80	200
Others	300	220
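The collapsing step can be sketched in R as follows. This is only an illustration: the helper make_2x2() and the object names counts and tab are hypothetical, not from any package.

```r
# Full cell-type x group count table from the example above
counts <- matrix(c(80, 100, 200,    # Cancer column
                   200, 120, 100),  # Normal column
                 ncol = 2,
                 dimnames = list(c("Tcell", "Bcell", "Tam"),
                                 c("Cancer", "Normal")))

# Hypothetical helper: keep one cell type, sum all other rows into "Others"
make_2x2 <- function(counts, celltype) {
  target <- counts[celltype, ]
  others <- colSums(counts[setdiff(rownames(counts), celltype), , drop = FALSE])
  tab <- rbind(target, others)
  rownames(tab) <- c(celltype, "Others")
  tab
}

tab <- make_2x2(counts, "Tcell")
tab
```

The resulting table holds the same four numbers, in the same order, as the hand-built matrix(c(80,300,200,220), ncol = 2) used in this note.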

## Chi-squared test on the 2x2 table, then observed / expected
## rows: Tcell, Others; columns: Cancer, Normal
x <- chisq.test(matrix(c(80,300,200,220),ncol = 2))
Roe <- x$observed / x$expected
Roe

##           [,1]      [,2]
## [1,] 0.6015038 1.3605442
## [2,] 1.2145749 0.8058608

# p-value
paste0("P-value = ",x$p.value)

## [1] "P-value = 6.55065992002061e-15"

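In the first row (T cells), Ro/e is about 1.36 (> 1) in the Normal column and about 0.60 (< 1) in the Cancer column, so T cells make up a relatively larger share of cells in Normal than in Cancer.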
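The expected count in each cell depends only on the row total, the column total and the grand total, so Ro/e can also be computed by hand for every cell type at once. The sketch below is my own variation on the full three-row table, not the 2x2 procedure described above, and it gives the same T-cell values. The names counts, expected and roe_all are illustrative.

```r
# Full cell-type x group count table from the example above
counts <- matrix(c(80, 100, 200,    # Cancer column
                   200, 120, 100),  # Normal column
                 ncol = 2,
                 dimnames = list(c("Tcell", "Bcell", "Tam"),
                                 c("Cancer", "Normal")))

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Ro/e = observed / expected, one value per cell type and group
roe_all <- counts / expected
round(roe_all, 3)
```

For T cells in Cancer, for instance, the expected count is 280 * 380 / 800 = 133, so Ro/e = 80 / 133, about 0.60.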

3. Odds ratio (OR)

This conveys essentially the same information as Ro/e; the difference is that the OR is estimated with Fisher's exact test.

### OR of T cells in the Cancer group
fisher.test(matrix(c(80,300,200,220),ncol = 2))

## 
##  Fisher's Exact Test for Count Data
## 
## data:  matrix(c(80, 300, 200, 220), ncol = 2)
## p-value = 2.184e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.2117305 0.4052156
## sample estimates:
## odds ratio 
##  0.2938061

### OR of T cells in the Normal group (columns swapped)
fisher.test(matrix(c(200,220,80,300),ncol = 2))

## 
##  Fisher's Exact Test for Count Data
## 
## data:  matrix(c(200, 220, 80, 300), ncol = 2)
## p-value = 2.184e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  2.467822 4.722986
## sample estimates:
## odds ratio 
##   3.403605
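As a cross-check on the two results above, the sketch below computes the plain cross-product odds ratio by hand and compares it with the fisher.test() estimate. The two differ slightly because fisher.test() reports the conditional maximum likelihood estimate rather than the sample odds ratio. The variable names are illustrative.

```r
# 2x2 table: T cells vs. all other cells, Cancer vs. Normal
tab <- matrix(c(80, 300, 200, 220), ncol = 2,
              dimnames = list(c("Tcell", "Others"), c("Cancer", "Normal")))

# Sample (cross-product) odds ratio: (a * d) / (b * c)
or_sample <- (tab["Tcell", "Cancer"] * tab["Others", "Normal"]) /
             (tab["Tcell", "Normal"] * tab["Others", "Cancer"])

# Conditional MLE reported by fisher.test()
or_fisher <- unname(fisher.test(tab)$estimate)

# Swapping the two columns inverts the odds ratio
or_swapped <- unname(fisher.test(tab[, c("Normal", "Cancer")])$estimate)

c(sample = or_sample, fisher = or_fisher,
  swapped = or_swapped, inverse = 1 / or_fisher)
```

The Cancer and Normal results are therefore two views of the same test: the p-value is identical and one OR is the reciprocal of the other.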

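Closing note: I wrote this down purely to record the process so that I don't forget it later. It reflects only my own limited understanding.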