Building Confidence Intervals

Author: 兀o | Published 2018-12-22 12:57

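    Confidence interval for the population mean

    Questions:

    1. What proportion of the sample drinks coffee? What proportion does not?
    2. Among coffee drinkers, what is the average height? Among those who do not drink coffee, what is the average height?
    3. Simulate 200 "new" individuals from the original sample of 200. In that bootstrap sample (a sample drawn with replacement), what proportion drinks coffee? What proportion does not?
    4. Now simulate the bootstrap sample 10,000 times and take the mean height of the non-coffee drinkers in each one. Each bootstrap sample should be drawn from the first sample of 200 data points. Plot the distribution and pull the values needed for a 95% confidence interval. What do you notice about the sampling distribution of the mean in this example?
    5. Did your interval capture the actual average height of non-coffee drinkers in the population? Look at the population mean and the two bounds given by your 95% confidence interval, then answer the final quiz question below.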
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
    
    np.random.seed(42)
    
    coffee_full = pd.read_csv('coffee_dataset.csv')
    coffee_red = coffee_full.sample(200)  # This is the only data you might actually get in the real world.
    coffee_red.head()
    
    1. What is the proportion of coffee drinkers in the sample? What is the proportion of individuals that don't drink coffee?
    coffee_red['drinks_coffee'].mean() # Drink Coffee
    1 - coffee_red['drinks_coffee'].mean() # Don't Drink Coffee
    
    2. Of the individuals who do not drink coffee, what is the average height?
    coffee_red[coffee_red['drinks_coffee'] == False]['height'].mean()
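
Question 2 in the list at the top also asks for the mean height of coffee drinkers. One way to get both group means (and both proportions) in a single pass is a `groupby`. The sketch below runs on a small synthetic table, since `coffee_dataset.csv` is not included here; the `toy` frame and its values are made up for illustration and only borrow the column names `drinks_coffee` and `height` from the code above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 200-row sample (NOT the real coffee data).
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    'drinks_coffee': rng.random(200) < 0.6,   # roughly 60% coffee drinkers
    'height': rng.normal(67, 3, 200),         # heights in arbitrary units
})

# Share of each group; the two shares sum to 1.
proportions = toy['drinks_coffee'].value_counts(normalize=True)

# Mean height for non-drinkers (False) and drinkers (True) at once.
mean_height = toy.groupby('drinks_coffee')['height'].mean()

print(proportions)
print(mean_height)
```

On the real data, the same two lines should work with `coffee_red` in place of `toy`, assuming `drinks_coffee` is a boolean column.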
    
    3. Simulate 200 "new" individuals from your original sample of 200. What is the proportion of coffee drinkers in your bootstrap sample? How about individuals that don't drink coffee?
    bootsamp = coffee_red.sample(200, replace = True)
    bootsamp['drinks_coffee'].mean() # Drink Coffee and 1 minus gives the don't drink
    
    4. Now simulate your bootstrap sample 10,000 times and take the mean height of the non-coffee drinkers in each sample. Plot the distribution, and pull the values necessary for a 95% confidence interval. What do you notice about the sampling distribution of the mean in this example?
    boot_means = []
    for _ in range(10000):
        bootsamp = coffee_red.sample(200, replace = True)
        boot_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
        boot_means.append(boot_mean)
    
    plt.hist(boot_means); # Looks pretty normal
    
    np.percentile(boot_means, 2.5), np.percentile(boot_means, 97.5)
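
The loop and the two percentiles above make up a percentile bootstrap interval. As a rough, self-contained sketch of the same idea, the function below resamples whole rows with replacement, filters the group afterwards (as the loop does), and reads off the 2.5th and 97.5th percentiles. It uses NumPy index arrays instead of `DataFrame.sample` to avoid a Python loop. The function name `bootstrap_group_mean_ci`, its parameters, and the synthetic data are all assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def bootstrap_group_mean_ci(values, in_group, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of `values` within `in_group`.

    Whole rows are resampled with replacement and the group is filtered
    afterwards, so the group size varies from resample to resample.
    """
    values = np.asarray(values, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    n = len(values)
    rng = np.random.default_rng(seed)

    # Each row of `idx` holds the row positions of one bootstrap sample.
    idx = rng.integers(0, n, size=(n_boot, n))
    group = in_group[idx]
    sums = (values[idx] * group).sum(axis=1)
    counts = group.sum(axis=1)

    # A resample with no group members has no defined mean; drop it.
    keep = counts > 0
    boot_means = sums[keep] / counts[keep]

    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(boot_means, [tail, 100 - tail])
    return lower, upper, boot_means


# Synthetic stand-in for the 200-row sample (NOT the real coffee data).
rng = np.random.default_rng(42)
drinks_coffee = rng.random(200) < 0.6
height = np.where(drinks_coffee, rng.normal(68, 3, 200), rng.normal(66, 3, 200))

lower, upper, boot_means = bootstrap_group_mean_ci(height, ~drinks_coffee)
print(f"95% bootstrap CI for non-drinkers' mean height: ({lower:.2f}, {upper:.2f})")
```

Changing `level` to 0.99 would pull the 0.5th and 99.5th percentiles instead, giving a wider interval from the same resamples.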
    
    5. Did your interval capture the actual average height of non-coffee drinkers in the population? Look at the average in the population and the two bounds provided by your 95% confidence interval, and then answer the final quiz question below.
    coffee_full[coffee_full['drinks_coffee'] == False]['height'].mean() 
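
A single interval either contains the population mean or it does not. The "95%" describes how often the procedure succeeds over many repeated samples. The sketch below illustrates that on a made-up population: it draws 200 different samples of 200 rows, builds a percentile bootstrap interval from each, and counts how often the interval contains the true non-drinker mean. The population, the helper `percentile_ci`, and the trial counts are assumptions chosen to keep the run short, not values from the original exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population standing in for coffee_full (NOT the real data).
pop_size = 3000
pop_drinks = rng.random(pop_size) < 0.6
pop_height = np.where(pop_drinks,
                      rng.normal(68, 3, pop_size),
                      rng.normal(66, 3, pop_size))
true_mean = pop_height[~pop_drinks].mean()

def percentile_ci(height, non_drinker, n_boot=1000, level=0.95):
    """Percentile bootstrap CI for the non-drinkers' mean height."""
    n = len(height)
    idx = rng.integers(0, n, size=(n_boot, n))   # one resample per row
    group = non_drinker[idx]
    counts = group.sum(axis=1)
    keep = counts > 0                            # skip resamples with no non-drinkers
    means = (height[idx] * group).sum(axis=1)[keep] / counts[keep]
    tail = 100 * (1 - level) / 2
    return np.percentile(means, [tail, 100 - tail])

n_trials = 200
hits = 0
for _ in range(n_trials):
    # A fresh sample of 200 individuals, drawn without replacement.
    rows = rng.choice(pop_size, size=200, replace=False)
    lower, upper = percentile_ci(pop_height[rows], ~pop_drinks[rows])
    hits += int(lower <= true_mean <= upper)

coverage = hits / n_trials
print(f"Intervals containing the population mean: {coverage:.1%}")
```

With only 200 trials the observed fraction is itself noisy, so it should land near 95% rather than exactly on it.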
    
