Difference in Means

Author: 兀o | Published 2019-01-06 15:23

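Confidence intervals for a difference in means
Problems:
1. For 10,000 iterations, bootstrap your sample data and compute the difference in average height between coffee drinkers and non-coffee drinkers. Build a 99% confidence interval from your sampling distribution. Use your interval to start answering the first quiz question below.

2. For 10,000 iterations, bootstrap your sample data and compute the difference in average height between people over 21 and people under 21. Build a 99% confidence interval from your sampling distribution. Use your interval to finish answering the first quiz question below.

3. For 10,000 iterations, bootstrap your sample data and compute, for individuals under 21, the difference between the average height of coffee drinkers and the average height of non-coffee drinkers. Build a 95% confidence interval from your sampling distribution. Use your interval to start answering the second quiz question below.

4. For 10,000 iterations, bootstrap your sample data and compute, for individuals over 21, the difference between the average height of coffee drinkers and the average height of non-coffee drinkers. Build a 95% confidence interval from your sampling distribution. Use your interval to finish answering the second quiz question below, as well as the questions that follow.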

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
np.random.seed(42)

# Load the full dataset, then draw a simple random sample of 200 rows to work with
full_data = pd.read_csv('coffee_dataset.csv')
sample_data = full_data.sample(200)
sample_data.head()
  1. For 10,000 iterations, bootstrap your sample data and compute the difference in the average heights of coffee drinkers and non-coffee drinkers. Build a 99% confidence interval using your sampling distribution. Use your interval to start answering the first quiz question below.
diffs = []
for _ in range(10000):
    # Resample 200 rows with replacement from the original sample
    bootsamp = sample_data.sample(200, replace = True)
    coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
    nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
    diffs.append(coff_mean - nocoff_mean)

# 99% interval: the 0.5th and 99.5th percentiles of the bootstrap differences
np.percentile(diffs, 0.5), np.percentile(diffs, 99.5)
# Statistical evidence that coffee drinkers are, on average, taller
plt.hist(diffs)
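
The loop above can be wrapped in a small helper that turns a confidence level into its two percentile bounds. This is only a sketch, not the author's code: the function name `bootstrap_mean_diff_ci`, its parameters, and the synthetic `demo` frame are assumptions made for illustration. Only the column names `drinks_coffee` and `height` come from the code above.

```python
import numpy as np
import pandas as pd


def bootstrap_mean_diff_ci(df, group_col, value_col, level=0.99, n_iter=10_000, seed=0):
    """Percentile bootstrap interval for mean(value | group) - mean(value | not group)."""
    rng = np.random.default_rng(seed)
    values = df[value_col].to_numpy(dtype=float)
    in_group = df[group_col].to_numpy(dtype=bool)
    n = len(df)
    diffs = []
    for _ in range(n_iter):
        # Draw n row positions with replacement, i.e. one bootstrap resample
        idx = rng.integers(0, n, size=n)
        v, g = values[idx], in_group[idx]
        if g.all() or not g.any():
            continue  # skip the rare resample in which one group is empty
        diffs.append(v[g].mean() - v[~g].mean())
    # A 99% interval leaves 0.5% in each tail: percentiles 0.5 and 99.5
    tail = (1 - level) / 2 * 100
    lower, upper = np.percentile(diffs, [tail, 100 - tail])
    return lower, upper


# Synthetic stand-in for sample_data (hypothetical values, not the coffee dataset)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'drinks_coffee': rng.random(200) < 0.6,
    'height': rng.normal(67, 3, size=200),
})
lower, upper = bootstrap_mean_diff_ci(demo, 'drinks_coffee', 'height', level=0.99)
observed = (demo.loc[demo['drinks_coffee'], 'height'].mean()
            - demo.loc[~demo['drinks_coffee'], 'height'].mean())
print(lower, observed, upper)
```

Passing `level=0.95` would give the 2.5th and 97.5th percentiles used for the subgroup questions.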
  2. For 10,000 iterations, bootstrap your sample data and compute the difference in the average heights of those older than 21 and those younger than 21. Build a 99% confidence interval using your sampling distribution. Use your interval to finish answering the first quiz question below.
diffs_age = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    under21_mean = bootsamp[bootsamp['age'] == '<21']['height'].mean()
    over21_mean = bootsamp[bootsamp['age'] != '<21']['height'].mean()
    diffs_age.append(over21_mean - under21_mean)

# 99% interval for (over-21 mean) minus (under-21 mean)
np.percentile(diffs_age, 0.5), np.percentile(diffs_age, 99.5)
# Statistical evidence that those over 21 are, on average, taller

  3. For 10,000 iterations, bootstrap your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals under 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to start answering the second quiz question below.
diffs_coff_under21 = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    under21_coff_mean = bootsamp.query("age == '<21' and drinks_coffee == True")['height'].mean()
    under21_nocoff_mean = bootsamp.query("age == '<21' and drinks_coffee == False")['height'].mean()
    # Note the order: non-coffee mean minus coffee mean
    diffs_coff_under21.append(under21_nocoff_mean - under21_coff_mean)

np.percentile(diffs_coff_under21, 2.5), np.percentile(diffs_coff_under21, 97.5)
# For the under21 group, we have evidence that the non-coffee drinkers are on average taller
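
One caveat when bootstrapping subgroups: if a resample happens to contain no rows for one subgroup, that subgroup's mean is NaN, and a single NaN makes `np.percentile` return NaN for both bounds. Whether this happens with the coffee dataset is not shown here, so the sketch below uses a hypothetical list, `diffs_demo`, to illustrate the behavior and the `np.nanpercentile` alternative.

```python
import numpy as np
import pandas as pd

# The mean of an empty selection is NaN, which is what an empty subgroup would produce
empty_mean = pd.Series([], dtype=float).mean()

# Hypothetical bootstrap differences; the NaN stands for one resample with an empty subgroup
diffs_demo = [1.2, 0.8, np.nan, 1.0, 1.5]

plain = np.percentile(diffs_demo, [2.5, 97.5])      # NaN propagates into both bounds
robust = np.nanpercentile(diffs_demo, [2.5, 97.5])  # NaN entries are ignored
print(plain, robust)
```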
  4. For 10,000 iterations, bootstrap your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals over 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to finish answering the second quiz question below, as well as the following questions.
diffs_coff_over21 = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    over21_coff_mean = bootsamp.query("age != '<21' and drinks_coffee == True")['height'].mean()
    over21_nocoff_mean = bootsamp.query("age != '<21' and drinks_coffee == False")['height'].mean()
    # Note the order: non-coffee mean minus coffee mean
    diffs_coff_over21.append(over21_nocoff_mean - over21_coff_mean)

np.percentile(diffs_coff_over21, 2.5), np.percentile(diffs_coff_over21, 97.5)
# For the over21 group, we have evidence that on average the non-coffee drinkers are taller
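
Taken together, the comments above describe a reversal: coffee drinkers look taller overall, yet non-coffee drinkers look taller within each age group. This pattern is known as Simpson's paradox, and it can arise when age is related to both height and coffee drinking. The toy data below is hand-picked to reproduce the effect and is purely hypothetical: the values, the `'>=21'` label, and the helper `coffee_gap` are assumptions, not part of the coffee dataset.

```python
import pandas as pd

# Hypothetical toy data: heights chosen by hand so the reversal is easy to see
toy = pd.DataFrame({
    'age': ['<21'] * 4 + ['>=21'] * 4,
    'drinks_coffee': [True, False, False, False, True, True, True, False],
    'height': [60.0, 62.0, 62.0, 62.0, 68.0, 68.0, 68.0, 70.0],
})


def coffee_gap(df):
    """Mean height of coffee drinkers minus mean height of non-coffee drinkers."""
    coffee = df.loc[df['drinks_coffee'], 'height'].mean()
    no_coffee = df.loc[~df['drinks_coffee'], 'height'].mean()
    return coffee - no_coffee


overall_gap = coffee_gap(toy)                       # 66.0 - 64.0
under21_gap = coffee_gap(toy[toy['age'] == '<21'])  # 60.0 - 62.0
over21_gap = coffee_gap(toy[toy['age'] != '<21'])   # 68.0 - 70.0
print(overall_gap, under21_gap, over21_gap)  # prints: 2.0 -2.0 -2.0
```

In the toy data most coffee drinkers are in the taller, older group, which is enough to flip the sign of the overall difference.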

    Article link: https://www.haomeiwen.com/subject/bxusrqtx.html