2020-06-07 The distribution of UMI counts in single-cell sequencing

Author: 王子威PtaYoth | Published: 2020-06-07 20:39


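When monocle2 processes a single-cell expression matrix, it requires you to specify the distribution family of the data.
The options include the negative binomial, the Poisson (not recommended), and the log-Gaussian distribution.
Later I also came across the zero-inflated negative binomial (ZINB) model, which accounts for the large number of zeros in UMI experiments.
For details, see this article: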
https://zhuanlan.zhihu.com/p/95299303
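
One way to see how these families differ is to compare the probability of observing a zero that each one implies for the same mean. The following is a minimal sketch, assuming the mean-dispersion parameterization of the negative binomial (variance = mu + phi * mu^2) and a ZINB with an extra point mass `pi` at zero. The function names are illustrative and are not part of monocle2 or any other package.

```python
import math


def poisson_zero_prob(mu: float) -> float:
    """P(Y = 0) for a Poisson distribution with mean mu."""
    return math.exp(-mu)


def nb_zero_prob(mu: float, phi: float) -> float:
    """P(Y = 0) for a negative binomial with mean mu and dispersion phi.

    Assumed parameterization: Var(Y) = mu + phi * mu**2, with phi > 0.
    """
    return (1.0 + phi * mu) ** (-1.0 / phi)


def zinb_zero_prob(mu: float, phi: float, pi: float) -> float:
    """P(Y = 0) for a zero-inflated NB: extra zero mass pi plus NB zeros."""
    return pi + (1.0 - pi) * nb_zero_prob(mu, phi)


if __name__ == "__main__":
    mu = 2.0
    print("Poisson:", poisson_zero_prob(mu))
    print("NB (phi=0.5):", nb_zero_prob(mu, 0.5))
    print("ZINB (phi=0.5, pi=0.2):", zinb_zero_prob(mu, 0.5, 0.2))
```

With the same mean, overdispersion alone already raises the share of zeros above the Poisson value, so a large number of zeros does not by itself imply zero inflation.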

Later I came across this Genome Biology paper and read its abstract:

Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.

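The authors argue that UMI counts show no zero inflation, and that current normalization (such as log of counts per million) combined with feature selection by highly variable genes (HVGs) introduces false variability into dimension reduction. They instead use GLM-PCA to reduce the dimension of non-normally distributed data, and select features by deviance.
Algorithms keep changing and I no longer want to follow them too closely. I have tried various fancy methods before, but the overall results always pointed in the same direction.
I read this paper because I wanted to look at the distribution of data from so-called UMI count experiments.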
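
The abstract's feature selection by deviance can be illustrated with a small sketch. It assumes the per-gene binomial deviance, with a constant gene proportion across cells as the null model, as an approximation to the multinomial model. The function name `binomial_deviance` and the toy matrix are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def binomial_deviance(counts: np.ndarray) -> np.ndarray:
    """Per-gene binomial deviance for a cells x genes UMI count matrix.

    Null model (assumption): each gene takes the same proportion of the
    total UMIs in every cell. Genes that deviate strongly from this get
    a large deviance and are candidates for feature selection.
    """
    y = np.asarray(counts, dtype=float)
    n = y.sum(axis=1, keepdims=True)             # total UMIs per cell
    pi = y.sum(axis=0, keepdims=True) / n.sum()  # pooled proportion per gene
    mu = n * pi                                  # expected count under the null
    rest = n - y                                 # UMIs assigned to other genes
    with np.errstate(divide="ignore", invalid="ignore"):
        # Terms with y == 0 or rest == 0 contribute zero (limit of x*log(x)).
        term1 = np.where(y > 0, y * np.log(y / mu), 0.0)
        term2 = np.where(rest > 0, rest * np.log(rest / (n - mu)), 0.0)
    return 2.0 * (term1 + term2).sum(axis=0)


if __name__ == "__main__":
    # Toy data: gene 0 is constant, genes 1 and 2 are each specific to half the cells.
    toy = np.array([[10, 10, 0],
                    [10, 10, 0],
                    [10, 0, 10],
                    [10, 0, 10]])
    dev = binomial_deviance(toy)
    ranking = np.argsort(dev)[::-1]  # most informative genes first
    print(dev, ranking)
```

Unlike HVG selection on log-normalized values, this ranking works directly on the raw counts, so no pseudocount or log transform is involved.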

In 2018, a Genome Biology paper proposed fitting UMI count data with a negative binomial model with independent dispersions:

Read counting and unique molecular identifier (UMI) counting are the principal gene expression quantification schemes used in single-cell RNA-sequencing (scRNA-seq) analysis. By using multiple scRNA-seq datasets, we reveal distinct distribution differences between these schemes and conclude that the negative binomial model is a good approximation for UMI counts, even in heterogeneous populations. We further propose a novel differential expression analysis algorithm based on a negative binomial model with independent dispersions in each group (NBID). Our results show that this properly controls the FDR and achieves better power for UMI counts when compared to other recently developed packages for scRNA-seq analysis.
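
The idea of letting each group have its own dispersion can be sketched with a simple method-of-moments estimate. This is only an illustration under strong assumptions: it ignores per-cell size factors and uses moments rather than a likelihood fit, so it is not the NBID algorithm. The function names and toy data are hypothetical.

```python
import numpy as np


def moment_dispersion(counts) -> float:
    """Method-of-moments NB dispersion phi, assuming Var = mu + phi * mu**2.

    Returns 0.0 when the sample is not overdispersed relative to Poisson.
    """
    y = np.asarray(counts, dtype=float)
    mean = y.mean()
    if mean == 0:
        return 0.0
    var = y.var(ddof=1)  # unbiased sample variance
    return max((var - mean) / mean**2, 0.0)


def per_group_dispersion(counts, groups) -> dict:
    """Estimate one dispersion per group for a single gene."""
    counts = np.asarray(counts)
    groups = np.asarray(groups)
    return {g.item(): moment_dispersion(counts[groups == g])
            for g in np.unique(groups)}


if __name__ == "__main__":
    # Toy data for one gene: group A is overdispersed, group B is not.
    umi = [0, 0, 4, 4, 2, 2, 2, 2]
    label = ["A", "A", "A", "A", "B", "B", "B", "B"]
    print(per_group_dispersion(umi, label))
```

A shared dispersion would average over both groups here, which is the situation that independent per-group dispersions are meant to avoid.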