Basic Theorems of Statistics

Author: shudaxu | Published: 2021-01-27 22:08

Concepts:

Population mean: $E[X]=u$
Population variance: $E[(X-u)^2] = Var(X) = \sigma^2$
Sample mean: $\overline{X'}=\frac{1}{n}\sum_{i=1}^{n}X_i'$
Sample variance: $S^2 = \frac{\sum_{i=1}^{n}(X_i'-\overline{X'})^2}{n-1}$
Estimator notation:
By convention, p denotes the true value and $\hat{p}$ denotes an estimate of p.
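Unbiasedness (unbiased):
When a sample statistic is used to estimate a population parameter, the estimator is unbiased if its mean (mathematical expectation) equals the true value of the unknown parameter.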
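Consistency:
An estimator is consistent if it converges in probability to the true parameter value as the sample size grows. For example, the MLE of the variance (which divides by n) has bias $-\sigma^2/n$. This bias shrinks toward 0 as the sample size n increases, and (given a finite fourth moment) so does the estimator's variance, so the MLE is consistent even though it is biased.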
Efficiency:
An efficient estimator is one that attains the minimum variance among unbiased estimators.
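Empirical:
"Empirical" means computed from the sample; "true" means computed from the population. For example, in empirical risk minimization (ERM) a common approach is MLE, while structural risk minimization (SRM) corresponds to something like MLE with regularization, i.e. MAP.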
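The definitions of bias and consistency above can be checked numerically. The following is a minimal sketch, assuming normally distributed data; the function name `average_variance_estimates` and its parameters are made up for this illustration. It averages the divide-by-n (MLE) and divide-by-(n-1) variance estimates over many simulated samples and compares the observed bias of the MLE with $-\sigma^2/n$.

```python
import random

def average_variance_estimates(n, trials=10000, sigma=2.0, seed=0):
    """Return the average divide-by-n and divide-by-(n-1) variance estimates."""
    rng = random.Random(seed)
    total_mle = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        squared_dev = sum((x - mean) ** 2 for x in sample)
        total_mle += squared_dev / n             # MLE under a normal model
        total_unbiased += squared_dev / (n - 1)  # sample variance S^2
    return total_mle / trials, total_unbiased / trials

sigma = 2.0
for n in (5, 50):
    avg_mle, avg_unbiased = average_variance_estimates(n, sigma=sigma)
    print(f"n={n}: MLE bias ~ {avg_mle - sigma**2:.3f} "
          f"(theory {-sigma**2 / n:.3f}), "
          f"S^2 bias ~ {avg_unbiased - sigma**2:.3f}")
```

The MLE's bias should track $-\sigma^2/n$ and shrink as n grows, while the bias of $S^2$ should stay near 0 at every n, up to simulation noise.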

Properties:

Scaling an expectation: $E(kX)=kE(X)$
Scaling a variance: $Var(kX)=k^2Var(X)$
Variance of a sum: $Var(X_1+X_2)=Var(X_1) + Var(X_2) + 2Cov(X_1,X_2)$
Variance decomposition (law of total variance): $Var(X) = E[Var(X|Y)] + Var(E[X|Y])$
Variance: $\sigma^2 = E[(X-u)^2] = E(X^2) - u^2$
Covariance: $Cov(X,Y)=E[XY]-E(X)E(Y)$
If X and Y are uncorrelated, then $E[XY]=E(X)E(Y)$, i.e. $Cov(X,Y)=0$. Likewise, for uncorrelated $X_1$ and $X_2$: $Var(X_1-X_2)=Var(X_1+X_2)=Var(X_1)+Var(X_2)$
Covariance properties:
$Cov(aX,bY)=abCov(X,Y)$
$Cov(\sum X_i,\sum Y_j) = \sum_i \sum_j Cov(X_i,Y_j)$
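The sample mean is an unbiased estimator of the population mean.
The sample variance is an unbiased estimator of the population variance (with denominator $n-1$).
For independent, identically distributed samples, the variance of the sample mean is $\sigma^2/n$. The mean estimated from 10 samples and the mean estimated from 100 samples are therefore both unbiased, but the estimate from 100 samples has a smaller variance and so is more precise:
$Var(\overline X) = Var(\frac{\sum_i X_i}{n}) = \frac{\sum_i Var(X_i)}{n^2}$
$=\frac {nVar(X)}{n^2}=\frac {Var(X)}{n}$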
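Two of the properties above are easy to confirm by simulation: the variance of a sum, including its $2Cov(X_1,X_2)$ cross term, and the $\sigma^2/n$ variance of the sample mean. This is a rough sketch using only the Python standard library. The helper names (`var`, `cov`, `variance_of_sample_mean`) and the way the correlated pair is built are assumptions made for illustration.

```python
import random

def var(xs):
    """Sample variance with the n-1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cov(xs, ys):
    """Sample covariance with the n-1 denominator."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance_of_sample_mean(n, trials=4000, sigma=1.0, seed=1):
    """Draw `trials` i.i.d. samples of size n; return the variance of their means."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    return var(means)

# Two correlated variables that share a common component z.
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(10000)]
x1 = [zi + rng.gauss(0.0, 1.0) for zi in z]
x2 = [zi + rng.gauss(0.0, 1.0) for zi in z]

lhs = var([a + b for a, b in zip(x1, x2)])
rhs = var(x1) + var(x2) + 2 * cov(x1, x2)
print(lhs, rhs)  # the two agree up to floating-point rounding

for n in (10, 100):
    print(n, variance_of_sample_mean(n), 1.0 / n)  # simulated vs sigma^2 / n
```

The sum identity holds exactly for sample moments, so `lhs` and `rhs` differ only by rounding. The variance of the sample mean matches $\sigma^2/n$ only approximately, because it is itself estimated from a finite number of trials.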

Selected corollaries and theorems:

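1. CLT: the sample mean converges in distribution to a normal distribution.
The sample mean is itself a random variable: repeat the sampling many times and each sample yields one value of it. For large n these values are approximately normally distributed with mean u (the population mean) and variance $\sigma^2/n$. More precisely, for i.i.d. samples with finite variance, $\sqrt{n}(\overline X-u)/\sigma$ converges in distribution to $N(0,1)$.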
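A quick way to see the CLT at work is to start from a clearly non-normal population. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1) and a sample size of 40; both choices, and the function names, are arbitrary illustrations. It standardizes many sample means and measures how many fall within $\pm 1.96$, which would be about 95% for a standard normal.

```python
import math
import random

def standardized_means(n, trials=10000, seed=0):
    """Standardized means of `trials` samples of size n from Exponential(1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    result = []
    for _ in range(trials):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        result.append(math.sqrt(n) * (sample_mean - mu) / sigma)
    return result

def fraction_within(zs, bound=1.96):
    """Share of values with |z| < bound; about 0.95 for a standard normal."""
    return sum(abs(z) < bound for z in zs) / len(zs)

print(fraction_within(standardized_means(n=40)))
```

The printed share should land close to 0.95 even though the underlying exponential distribution is strongly skewed.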
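2. Zero covariance:
$Cov(X_1,X_2)=0$ means the two variables are uncorrelated (no linear relationship), but it does not mean they are independent.
(Zero correlation does not imply independence, just as correlation does not imply causation.)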
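A small exact counterexample makes the gap between "uncorrelated" and "independent" concrete. The distribution below is a standard textbook construction chosen for illustration, not something from the notes: X takes the values -1, 0, 1 with equal probability, and Y = X^2 is fully determined by X.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X**2 depends on X deterministically.
support = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(p * x for x in support)            # E[X]
e_y = sum(p * x ** 2 for x in support)       # E[Y] = E[X^2]
e_xy = sum(p * x * x ** 2 for x in support)  # E[XY] = E[X^3]
cov_xy = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = p        # Y=0 happens exactly when X=0
p_product = p * p  # P(X=0) * P(Y=0), with P(Y=0) = 1/3

print(cov_xy)              # 0
print(p_joint, p_product)  # 1/3 1/9
```

The covariance is exactly 0, yet the joint probability differs from the product of the marginals, so X and Y are dependent.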
3. Law of total expectation:
$E[X]= E[E[X|Y]]$
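The identity can be verified exactly on a small discrete example. The joint distribution below is hypothetical and chosen only for illustration: Y is a fair coin, X is uniform on {1, 2} when Y=0 and uniform on {3, 4, 5} when Y=1. The same table also checks the variance decomposition $Var(X) = E[Var(X|Y)] + Var(E[X|Y])$.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X=x, Y=y), for illustration only.
joint = {
    (1, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
    (3, 1): Fraction(1, 6), (4, 1): Fraction(1, 6), (5, 1): Fraction(1, 6),
}

def marginal_y(joint):
    """P(Y=y) for every y that appears in the joint table."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    return p_y

def conditional_moment(joint, y, power=1):
    """E[X**power | Y=y]."""
    p_y = marginal_y(joint)[y]
    return sum(p * x ** power for (x, yy), p in joint.items() if yy == y) / p_y

p_y = marginal_y(joint)

# E[X] computed directly, and via the tower rule E[E[X|Y]].
e_x = sum(p * x for (x, _), p in joint.items())
tower = sum(p_y[y] * conditional_moment(joint, y) for y in p_y)

# Var(X) computed directly, and via E[Var(X|Y)] + Var(E[X|Y]).
var_x = sum(p * x ** 2 for (x, _), p in joint.items()) - e_x ** 2
mean_of_cond_var = sum(
    p_y[y] * (conditional_moment(joint, y, 2) - conditional_moment(joint, y) ** 2)
    for y in p_y
)
var_of_cond_mean = sum(p_y[y] * conditional_moment(joint, y) ** 2 for y in p_y) - tower ** 2

print(e_x, tower)                                 # 11/4 11/4
print(var_x, mean_of_cond_var + var_of_cond_mean)  # 97/48 97/48
```

Exact fractions are used so that both sides agree exactly, with no floating-point noise.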
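4. Gauss-Markov theorem
Assumptions:
$E[\epsilon]=0$: the errors have zero mean
$Var(\epsilon)=\sigma^2$: homoscedasticity (constant variance)
$Cov(\epsilon_i,\epsilon_j)=0$ for $i \neq j$: the errors are uncorrelated
Then:
The OLS estimator is BLUE (best linear unbiased estimator).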
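What "best" means here can be illustrated by comparing OLS with another estimator that is also linear in y and unbiased. The sketch assumes a simple model $y = a + bx + \epsilon$ with fixed x values 0..9 and i.i.d. normal errors. The competing "endpoint" estimator, which takes the slope through the first and last points, is a hypothetical alternative picked for this example. Both estimators should average to the true slope, but OLS should show the smaller variance.

```python
import random

def simulate_slopes(trials=5000, a=1.0, b=2.0, sigma=1.0, seed=0):
    """Return OLS and endpoint slope estimates over many simulated datasets."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(10)]  # fixed design
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    ols, endpoint = [], []
    for _ in range(trials):
        ys = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
        y_bar = sum(ys) / len(ys)
        # OLS slope: sum (x - x_bar)(y - y_bar) / sum (x - x_bar)^2
        ols.append(sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx)
        # Alternative linear unbiased estimator: slope through the two end points
        endpoint.append((ys[-1] - ys[0]) / (xs[-1] - xs[0]))
    return ols, endpoint

def mean(values):
    return sum(values) / len(values)

def var(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

ols, endpoint = simulate_slopes()
print("mean slope:", mean(ols), mean(endpoint))  # both close to b = 2.0
print("variance:  ", var(ols), var(endpoint))    # OLS variance is smaller
```

Under these assumptions the theoretical variances are $\sigma^2/82.5$ for OLS and $2\sigma^2/81$ for the endpoint estimator, so the simulated gap should be roughly a factor of two.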
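5. Cramer-Rao lower bound (CRLB)
The lower bound on the variance of unbiased estimators is used to judge whether an estimator is efficient: an unbiased estimator whose variance attains the bound is efficient.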
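As a worked illustration of how the bound is used, here is the standard textbook statement. It holds for unbiased estimators under the usual regularity conditions. The symbols $\theta$, $f$ and $I(\theta)$ (the Fisher information) are introduced for this example and do not appear in the notes above.
$Var(\hat\theta) \ge \frac{1}{I(\theta)}$, where $I(\theta) = E[(\frac{\partial}{\partial\theta}\log f(X_1,\dots,X_n;\theta))^2]$
For n i.i.d. samples from $N(u,\sigma^2)$ with known $\sigma^2$, the Fisher information for u is $I(u)=n/\sigma^2$, so the bound is $\sigma^2/n$. The sample mean is unbiased and has variance exactly $\sigma^2/n$, so it attains the bound and is efficient in this sense.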
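6. Resampling:
Bootstrap: sampling with replacement from the original sample. [1]
Subsampling: sampling without replacement, with a resample size smaller than the sample size. [2]
Using the mean of several resamples to estimate the population mean is really an estimate built on an estimate, so it is not necessarily better.
When perturbations from a few outliers strongly affect the estimate, the bootstrap can mitigate this. [3]
That is, this approach can reduce the variance of the estimate, but it may introduce a larger bias. This leads on to ways of handling bootstrap bias [4].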
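The two resampling schemes differ only in how each resample is drawn, which a short sketch can make concrete. The code below uses the Python standard library; the function names, the simulated data and the choice of the mean as the statistic are assumptions for illustration. It shows one common use of the bootstrap, estimating the standard error of the sample mean, rather than the averaging-over-resamples idea described in [3].

```python
import math
import random

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Means of resamples drawn WITH replacement, each as large as the sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)]

def subsample_means(sample, size, n_resamples=2000, seed=0):
    """Means of resamples drawn WITHOUT replacement, with size < len(sample)."""
    rng = random.Random(seed)
    return [sum(rng.sample(sample, size)) / size for _ in range(n_resamples)]

def std(values):
    """Standard deviation with the n-1 denominator."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Simulated data standing in for an observed sample.
rng = random.Random(42)
data = [rng.gauss(10.0, 3.0) for _ in range(200)]

boot_se = std(bootstrap_means(data))            # bootstrap standard error of the mean
analytic_se = std(data) / math.sqrt(len(data))  # textbook s / sqrt(n)
print(boot_se, analytic_se)

sub_means = subsample_means(data, size=50)
print(std(sub_means))  # spread of means at the smaller subsample size
```

For a well-behaved statistic like the mean of light-tailed data, the bootstrap standard error should be close to $s/\sqrt{n}$. The caveat quoted in [1] concerns heavy-tailed distributions, where this agreement may not hold.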

[1]: On its convergence:
"Unless one is reasonably sure that the underlying distribution is not [heavy tailed], one should hesitate to use the naive bootstrap".
[2]:subsampling leads to valid inference whereas bootstrapping does not
[3]:The basic idea is that if your estimator is very sensitive to perturbations in the data (i.e., the estimator has high variance and low bias), then you can average over lots of bootstrap samples to reduce the amount of overfitting particular examples.
[4]:
[5]: Estimating prediction error with CV or bootstrapping: https://stats.stackexchange.com/questions/18348/differences-between-cross-validation-and-bootstrapping-to-estimate-the-predictio
