均数差

作者: 兀o | 来源:发表于2019-01-06 15:23 被阅读0次

均数差
医学统计学第八章（ t 检验）
方差与标准差
标准差与标准误区别
测量学中的几种误差
推荐系统评测指标2-预测准确度
均数加减标准差如何编辑出来？
meta分析
PH525x series - Rank tests
R做截断柱状图并加显著性统计

均数差置信区间
问题：
1. 对于10,000次迭代，自展法（bootstrap）会对你的样本数据进行抽样，计算喝咖啡和不喝咖啡的人的平均身高的差异。使用你的抽样分布建立一个99％的置信区间。根据你的区间开始回答下面的第一个测试题目。

2. 对于10,000次迭代，自展法会对样本数据进行抽样，计算21岁以上和21岁以下的平均身高的差异。使用你的抽样分布构建一个99％的置信区间。根据你的区间来完成回答下面的第一个测试题目。

3. 对于10,000次迭代，自展法会对你的样本数据进行抽样，计算出21岁以下个体的喝咖啡的人的平均身高和不喝咖啡的人的平均身高之间的差异。使用你的抽样分布，建立一个95％的置信区间。根据你的区间来回答下面的第二个测试题目。

4. 对于10,000次迭代，自展法会对你的样本数据进行抽样，计算出21岁以上个体的喝咖啡的人的平均身高和不喝咖啡的人的平均身高之间的差异。使用你的抽样分布，建立一个95％的置信区间。根据你的区间来回答下面的第二个测试题目以及下列问题。

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
np.random.seed(42)

full_data = pd.read_csv('coffee_dataset.csv')
sample_data = full_data.sample(200)
sample_data.head()

For 10,000 iterations, bootstrap sample your sample data, compute the difference in the average heights for coffee and non-coffee drinkers. Build a 99% confidence interval using your sampling distribution. Use your interval to start answering the first quiz question below.

diffs = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
    nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
    diffs.append(coff_mean - nocoff_mean)
 
np.percentile(diffs, 0.5), np.percentile(diffs, 99.5) 
# statistical evidence coffee drinkers are on average taller

plt.hist(diffs)

For 10,000 iterations, bootstrap sample your sample data, compute the difference in the average heights for those older than 21 and those younger than 21. Build a 99% confidence interval using your sampling distribution. Use your interval to finish answering the first quiz question below.

diffs_age = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    under21_mean = bootsamp[bootsamp['age'] == '<21']['height'].mean()
    over21_mean = bootsamp[bootsamp['age'] != '<21']['height'].mean()
    diffs_age.append(over21_mean - under21_mean)

np.percentile(diffs_age, 0.5), np.percentile(diffs_age, 99.5)
# statistical evidence that over21 are on average taller

# diffs_coff_under211=[]
for _ in range(10000):
    bootsamp=sample_data.sample(200,replace=True)
    under21_coff_mean=bootsamp[bootsamp['age']]

For 10,000 iterations bootstrap your sample data, compute the difference in the average height for coffee drinkers and the average height non-coffee drinkers for individuals under 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to start answering question 2 below.

diffs_coff_under21 = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    under21_coff_mean = bootsamp.query("age == '<21' and drinks_coffee == True")['height'].mean()
    under21_nocoff_mean = bootsamp.query("age == '<21' and drinks_coffee == False")['height'].mean()
    diffs_coff_under21.append(under21_nocoff_mean - under21_coff_mean)

np.percentile(diffs_coff_under21, 2.5), np.percentile(diffs_coff_under21, 97.5)
# For the under21 group, we have evidence that the non-coffee drinkers are on average taller

For 10,000 iterations bootstrap your sample data, compute the difference in the average height for coffee drinkers and the average height non-coffee drinkers for individuals under 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to finish answering the second quiz question below. As well as the following questions.

diffs_coff_over21 = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    over21_coff_mean = bootsamp.query("age != '<21' and drinks_coffee == True")['height'].mean()
    over21_nocoff_mean = bootsamp.query("age != '<21' and drinks_coffee == False")['height'].mean()
    diffs_coff_over21.append(over21_nocoff_mean - over21_coff_mean)

np.percentile(diffs_coff_over21, 2.5), np.percentile(diffs_coff_over21, 97.5)
# For the over21 group, we have evidence that on average the non-coffee drinkers are taller