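Because the p-value threshold is an arbitrary convention, even a very small p-value only indicates a low false-positive probability; it does not guarantee that the result is true. With a single test at a threshold of 0.05, the chance of a false positive is 5%. With 10,000 tests in which no real difference exists, the expected number of false positives is 5% * 10000 = 500, that is, about 500 spurious "significant difference" conclusions. So even a seemingly small per-test error rate such as 0.05 is amplified by the total number of tests.
This is where multiple-testing correction comes in: it reduces the number of false positives among our test results.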
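The arithmetic above can be checked in a few lines of Python. This is only a minimal sketch: it assumes every test is independent and that no real difference exists in any of them, and the variable names are illustrative rather than taken from any library.

```python
alpha = 0.05      # per-test significance threshold
n_tests = 10_000  # number of tests, all assumed to have no real difference

# Expected number of false positives when every null hypothesis is true.
expected_false_positives = alpha * n_tests

# Probability of at least one false positive, assuming independent tests.
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(expected_false_positives)  # about 500
print(prob_at_least_one)         # practically 1
```

Under these assumptions, getting at least one false positive is essentially certain, which is why the raw p-values need to be adjusted.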
R
> p.adjust(p, method = p.adjust.methods, n = length(p))
> p.adjust
function (p, method = p.adjust.methods, n = length(p)){
method <- match.arg(method)
if (method == "fdr")
method <- "BH"
nm <- names(p)
p <- as.numeric(p)
……
BH = {
i <- lp:1L
o <- order(p, decreasing = TRUE)
ro <- order(o)
pmin(1, cummin(n/i * p[o]))[ro]
}
……
p0
}
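- We pass a vector of p-values, the correction method (BH), and the total number of p-values (length(p)) to the p.adjust function.
- The p-values are sorted from largest to smallest, and an FDR value is computed for each one with the formula below.
Formula: p * (n/i), where p is the p-value of a given test, n is the number of tests, and i is the rank of that p-value in ascending order (the smallest p-value has i = 1 and the largest has i = n, so in the descending sort the first element has i = n and the last has i = 1).
- The computed FDR values are assigned to the sorted p-values. If the FDR value computed for a p-value is larger than the FDR value of the preceding p-value in the sorted order, the computed value is discarded and the preceding value is used instead, which is why runs of identical FDR values can appear; otherwise the computed value is kept. The values are also capped at 1.
- The FDR values are put back into the original order of the p-values and returned.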
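The steps above can be sketched in plain Python. This is an illustrative re-implementation that follows the BH branch of the R source shown earlier, not the R code itself; the function name bh_adjust and the sample p-values are made up for the example.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, following the steps listed above."""
    n = len(pvals)
    # Indices of the p-values, sorted from largest to smallest p-value.
    order = sorted(range(n), key=lambda k: pvals[k], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    for pos, idx in enumerate(order):
        rank = n - pos  # ascending rank: largest p has rank n, smallest has rank 1
        candidate = pvals[idx] * n / rank
        # Never exceed the value assigned to the preceding (larger) p-value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min  # write back at the original position
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
adjusted = bh_adjust(pvals)
print(adjusted)  # approximately [0.02, 0.04, 0.04, 0.02]
```

Here 0.03 and 0.04 end up with the same adjusted value, and so do 0.005 and 0.01, which is the "runs of identical FDR values" effect described in the list.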
Python
Signature:
multi.multipletests(
pvals,
alpha=0.05,
method='hs',
is_sorted=False,
returnsorted=False,
)
Docstring:
Test results and p-value correction for multiple tests
Parameters
----------
pvals : array_like, 1-d
uncorrected p-values. Must be 1-dimensional.
alpha : float
FWER, family-wise error rate, e.g. 0.1
method : str
Method used for testing and adjustment of pvalues. Can be either the
full name or initial letters. Available methods are:
- `bonferroni` : one-step correction
- `sidak` : one-step correction
- `holm-sidak` : step down method using Sidak adjustments
- `holm` : step-down method using Bonferroni adjustments
- `simes-hochberg` : step-up method (independent)
- `hommel` : closed method based on Simes tests (non-negative)
- `fdr_bh` : Benjamini/Hochberg (non-negative)
- `fdr_by` : Benjamini/Yekutieli (negative)
- `fdr_tsbh` : two stage fdr correction (non-negative)
- `fdr_tsbky` : two stage fdr correction (non-negative)
is_sorted : bool
If False (default), the p_values will be sorted, but the corrected
pvalues are in the original order. If True, then it assumed that the
pvalues are already sorted in ascending order.
returnsorted : bool
not tested, return sorted p-values instead of original sequence
Returns
-------
reject : ndarray, boolean
true for hypothesis that can be rejected for given alpha
pvals_corrected : ndarray
p-values corrected for multiple tests
alphacSidak : float
corrected alpha for Sidak method
alphacBonf : float
corrected alpha for Bonferroni method
### If pvalue_array contains NA values, remove them first; R's p.adjust, by contrast, seems to have a built-in step that skips NA values automatically
import statsmodels.stats.multitest as multi
import numpy as np
multi.multipletests(pvalue_array, alpha=0.05, method="fdr_bh", is_sorted=False)
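As a minimal sketch of the NA note above, one way to do this is to mask out the NaN entries before calling multipletests and then write the adjusted values back to their original positions. The contents of pvalue_array and the names mask and adjusted are made up for illustration.

```python
import numpy as np
import statsmodels.stats.multitest as multi

# Hypothetical input with one missing value.
pvalue_array = np.array([0.01, np.nan, 0.04, 0.03, 0.005])

# Keep only the entries that are real numbers.
mask = ~np.isnan(pvalue_array)

# multipletests returns (reject, pvals_corrected, alphacSidak, alphacBonf).
reject, pvals_corrected, _, _ = multi.multipletests(
    pvalue_array[mask], alpha=0.05, method="fdr_bh", is_sorted=False
)

# Put the adjusted values back in place; missing entries stay NaN.
adjusted = np.full(pvalue_array.shape, np.nan)
adjusted[mask] = pvals_corrected
print(adjusted)
```

Keeping the output the same length as the input makes it easy to attach the adjusted values back to whatever table the p-values came from.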