Building Confidence Intervals

Author: 兀o | Published 2018-12-22 12:57

Confidence interval for a population mean

Questions:

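What proportion of the sample drinks coffee? What proportion does not?
Among the coffee drinkers, what is the average height? Among the non-coffee drinkers, what is the average height?
Simulate 200 "new" individuals from your original sample of 200. In this bootstrap sample (drawn with replacement), what proportion drinks coffee, and what proportion does not?
Now simulate 10,000 bootstrap samples and take the mean height of the non-coffee drinkers in each one. Each bootstrap sample should be drawn from the original sample of 200 data points. Plot the distribution and pull out the values needed for a 95% confidence interval. What do you notice about the sampling distribution of the mean in this example?
Did your interval capture the actual average height of non-coffee drinkers in the population? Compare the population mean with the two bounds of your 95% confidence interval.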

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

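np.random.seed(42)  # fix NumPy's global RNG so DataFrame.sample below is reproducible

coffee_full = pd.read_csv('coffee_dataset.csv')  # the full dataset, treated here as the population
coffee_red = coffee_full.sample(200)  # a random sample of 200 rows: in the real world, this is the only data you would actually have
coffee_red.head()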
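The notebook assumes a file coffee_dataset.csv with at least the columns drinks_coffee (boolean) and height. If that file is not at hand, a minimal sketch like the following builds a hypothetical stand-in population so the later cells can run. The population size, the share of coffee drinkers, and the height distributions are invented for illustration and are not the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for coffee_dataset.csv: invented numbers, not the real data.
rng = np.random.default_rng(42)
n_pop = 3000
drinks = rng.random(n_pop) < 0.6  # assume roughly 60% of people drink coffee
# Assume the two groups differ slightly in mean height (heights assumed to be in inches).
height = np.where(drinks,
                  rng.normal(68.5, 3.5, n_pop),
                  rng.normal(66.5, 3.5, n_pop))
coffee_full = pd.DataFrame({'drinks_coffee': drinks, 'height': height})

# Same step as the notebook: keep a simple random sample of 200 rows.
coffee_red = coffee_full.sample(200, random_state=42)
print(coffee_red.shape)
print(coffee_red['drinks_coffee'].dtype)
```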

What is the proportion of coffee drinkers in the sample? What is the proportion of individuals that don't drink coffee?

coffee_red['drinks_coffee'].mean() # Drink Coffee
1 - coffee_red['drinks_coffee'].mean() # Don't Drink Coffee
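Because drinks_coffee is boolean, its mean is the share of True values, and 1 minus that mean is the share of non-drinkers. A small sketch on a hypothetical five-row DataFrame shows that value_counts(normalize=True) returns both proportions at once. The boolean column is mapped to text labels first so the result can be looked up by name.

```python
import pandas as pd

# Hypothetical toy sample: 3 coffee drinkers, 2 non-drinkers.
sample = pd.DataFrame({'drinks_coffee': [True, False, True, True, False],
                       'height': [70.1, 65.2, 68.0, 69.3, 64.8]})

p_drink = sample['drinks_coffee'].mean()  # mean of a boolean column = share of True
labels = sample['drinks_coffee'].map({True: 'drinks coffee', False: 'no coffee'})
props = labels.value_counts(normalize=True)  # counts divided by the total

print(p_drink)                                   # 0.6
print(props['drinks coffee'], props['no coffee'])  # 0.6 0.4
```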

Of the individuals who do not drink coffee, what is the average height?

coffee_red[coffee_red['drinks_coffee'] == False]['height'].mean()
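The questions also ask for the average height of the coffee drinkers. A groupby gives both group means in one call. The sketch below uses a hypothetical toy sample and checks that the non-drinker mean matches the boolean-mask form used in the notebook.

```python
import pandas as pd

# Hypothetical toy sample (heights are made up).
sample = pd.DataFrame({'drinks_coffee': [True, False, True, True, False],
                       'height': [70.0, 65.0, 68.0, 69.0, 64.0]})

# Mean height per group; groupby sorts the keys, so False (non-drinkers) comes first.
group_means = sample.groupby('drinks_coffee')['height'].mean()

# Same quantity as the notebook's mask, written with ~ and .loc.
non_drinker_mean = sample.loc[~sample['drinks_coffee'], 'height'].mean()

print(group_means)
print(non_drinker_mean)  # 64.5
```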

Simulate 200 "new" individuals from your original sample of 200. What is the proportion of coffee drinkers in your bootstrap sample? What about individuals who don't drink coffee?

bootsamp = coffee_red.sample(200, replace = True)  # one bootstrap sample: 200 draws with replacement
bootsamp['drinks_coffee'].mean()  # proportion of coffee drinkers; 1 minus this gives the non-drinkers
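Because a bootstrap sample is drawn with replacement, some of the original 200 rows appear more than once and others do not appear at all. The sketch below uses a hypothetical sample where only row identity matters. It counts how many distinct original rows one resample contains and compares the average over many resamples with the standard result 1 - (1 - 1/n)^n, which is about 0.633 for n = 200.

```python
import numpy as np
import pandas as pd

n = 200
# Hypothetical original sample: only the identity of each row matters here.
original = pd.DataFrame({'row_id': np.arange(n)})

# One bootstrap sample: same size as the original, drawn with replacement.
bootsamp = original.sample(n, replace=True, random_state=1)
distinct_share = bootsamp['row_id'].nunique() / n

# Average over many resamples vs. the theoretical value 1 - (1 - 1/n)^n.
shares = [original.sample(n, replace=True, random_state=seed)['row_id'].nunique() / n
          for seed in range(500)]
expected = 1 - (1 - 1 / n) ** n
print(round(distinct_share, 3), round(float(np.mean(shares)), 3), round(expected, 3))
```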

Now simulate your bootstrap sample 10,000 times and take the mean height of the non-coffee drinkers in each sample. Plot the distribution, and pull the values necessary for a 95% confidence interval. What do you notice about the sampling distribution of the mean in this example?

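boot_means = []
for _ in range(10000):
    # resample the whole sample of 200, then average the non-drinkers that this resample happens to contain
    bootsamp = coffee_red.sample(200, replace = True)
    boot_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
    boot_means.append(boot_mean)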

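plt.hist(boot_means); # the bootstrap distribution of the mean looks approximately normal (bell-shaped)

np.percentile(boot_means, 2.5), np.percentile(boot_means, 97.5)  # lower and upper bounds of the 95% interval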
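The loop above calls DataFrame.sample 10,000 times, which is slow. A vectorized NumPy sketch draws all resamples at once with a single integer index array. It runs on hypothetical data; the 60/40 split and the height distributions are invented. Because the bootstrap distribution looks roughly normal, the sketch also compares the percentile interval with a normal-approximation interval (mean ± 1.96 × the standard deviation of the bootstrap means). The two should nearly agree.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical sample of 200 people: assumed ~40% non-drinkers, heights in inches.
n = 200
drinks = rng.random(n) < 0.6
height = np.where(drinks, rng.normal(68.5, 3.5, n), rng.normal(66.5, 3.5, n))

n_boot = 10_000
# Each row of idx holds the row positions for one bootstrap resample.
idx = rng.integers(0, n, size=(n_boot, n))
boot_heights = height[idx]        # shape (n_boot, n)
boot_nondrinker = ~drinks[idx]    # True where the resampled person is a non-drinker
# Mean height of the non-drinkers in each resample (their count varies by resample).
boot_means = (boot_heights * boot_nondrinker).sum(axis=1) / boot_nondrinker.sum(axis=1)

# Percentile interval, as in the notebook cell above.
lower, upper = np.percentile(boot_means, [2.5, 97.5])

# Normal-approximation interval built from the same bootstrap distribution.
center, se = boot_means.mean(), boot_means.std(ddof=1)
normal_lower, normal_upper = center - 1.96 * se, center + 1.96 * se
print(lower, upper)
print(normal_lower, normal_upper)
```

A resample with no non-drinkers would make the division produce NaN. With roughly 80 non-drinkers among 200 rows, that is practically impossible, but a production version might guard against it.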

Did your interval capture the actual average height of non-coffee drinkers in the population? Compare the average in the population with the two bounds of your 95% confidence interval.

coffee_full[coffee_full['drinks_coffee'] == False]['height'].mean()
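Whether one particular interval captures the population mean is a yes-or-no outcome. The "95%" describes how often the procedure succeeds over repeated samples. The coverage sketch below uses a hypothetical normal population of non-drinker heights whose true mean is known. It repeats the whole workflow many times: draw 200 people, bootstrap, and take the 2.5th and 97.5th percentiles. It then counts how often the interval contains the true mean, and that share should come out close to 0.95. The population parameters, trial counts, and seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population of non-drinker heights with a known mean.
population = rng.normal(66.5, 3.5, size=50_000)
true_mean = population.mean()

n, n_boot, n_trials = 200, 2_000, 300
hits = 0
for _ in range(n_trials):
    # Step 1: the single sample we would actually get.
    sample = rng.choice(population, size=n, replace=False)
    # Step 2: bootstrap means of that sample (vectorized resampling).
    boot_means = sample[rng.integers(0, n, size=(n_boot, n))].mean(axis=1)
    # Step 3: 95% percentile interval, then check whether it covers the true mean.
    lower, upper = np.percentile(boot_means, [2.5, 97.5])
    hits += int(lower <= true_mean <= upper)

coverage = hits / n_trials
print(coverage)
```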

Collection: 数据蛙数据分析每周作业 (DataFrog weekly data analysis assignments)

Article link: https://www.haomeiwen.com/subject/fgsikqtx.html
