#2.1.10 ★Guided Project: Analyzi

Author: 禮記 | Published: 2017-09-28 19:16

    1. Introducing Thanksgiving Dinner Data

    Instructions

    • Import the pandas package.

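    • Use the pandas.read_csv() function to read the thanksgiving.csv file.

    • Make sure to specify the keyword argument encoding="Latin-1", because the CSV file isn't encoded normally.

    • Assign the result to the variable data.

    • Display the first few rows of data to see what the rows and columns look like.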

    • In a separate notebook cell, display all of the column names to get a sense of what the data consists of.

    import pandas as pd
    data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
    data.head()
    # columns is an attribute, not a method, so it takes no parentheses.
    data.columns
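To see why the encoding argument matters, here is a minimal sketch using a made-up two-row CSV held in memory (the `dish` and `votes` columns are hypothetical, not taken from thanksgiving.csv). Bytes written as Latin-1 are generally not valid UTF-8 once they contain accented characters, so the default read is expected to fail while the explicit one succeeds.

```python
import io

import pandas as pd

# Hypothetical CSV with one non-ASCII character, encoded as Latin-1.
raw = "dish,votes\nSouffl\u00e9,3\nTurkey,5\n".encode("latin-1")

# Reading with the default encoding (UTF-8) is expected to raise here.
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Passing the matching encoding decodes the accented character correctly.
sample = pd.read_csv(io.BytesIO(raw), encoding="Latin-1")

print("default read succeeded:", utf8_ok)
print(sample)
```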
    
    

    3. Using value_counts To Explore Main Dishes

    input
    print(data['What is typically the main dish at your Thanksgiving dinner?'].value_counts())
    
    output
    Turkey                    859
    Other (please specify)     35
    Ham/Pork                   29
    Tofurkey                   20
    Chicken                    12
    Roast beef                 11
    I don't know                5
    Turducken                   3
    Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
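Raw counts are easier to compare as shares. The sketch below uses a small made-up Series (not the survey data) to show two value_counts options that may help here: normalize=True returns fractions of the non-missing answers, and dropna=False keeps missing answers in the tally.

```python
import pandas as pd

# Hypothetical toy answers standing in for the main-dish column.
dishes = pd.Series(["Turkey", "Turkey", "Ham/Pork", "Turkey", None, "Tofurkey"])

counts = dishes.value_counts()                    # missing values are skipped by default
shares = dishes.value_counts(normalize=True)      # fractions of the non-missing answers
with_missing = dishes.value_counts(dropna=False)  # also counts the missing answer

print(counts)
print(shares)
print(with_missing)
```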
    

    4. Figuring Out What Pies People Eat

    input
    apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
    pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
    pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])
    # True only when all three pie columns are null, i.e. the respondent chose none of these pies.
    no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
    print(no_pies.value_counts())
    
    output
    False    876
    True     182
    dtype: int64
    # 182 respondents did not select any of the three pies (apple, pumpkin, or pecan).
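The same "all three are null" check can also be written over several columns at once. This is a sketch on a made-up frame with shortened, hypothetical column names; it compares the chained & version with isnull().all(axis=1), which should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real column names are much longer.
pies = pd.DataFrame({
    "Apple": ["Apple", np.nan, np.nan, "Apple"],
    "Pumpkin": ["Pumpkin", np.nan, "Pumpkin", np.nan],
    "Pecan": [np.nan, np.nan, np.nan, "Pecan"],
})

# Chained version, as in the step above.
manual = pd.isnull(pies["Apple"]) & pd.isnull(pies["Pumpkin"]) & pd.isnull(pies["Pecan"])

# Row-wise version: True where every listed column is null.
no_pies = pies[["Apple", "Pumpkin", "Pecan"]].isnull().all(axis=1)

print(no_pies.value_counts())
```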
    

    5. Converting Age To Numeric

    input
    
    print(data['Age'].value_counts())
    
    
    output
    
    45 - 59    286
    60+        264
    30 - 44    259
    18 - 29    216
    Name: Age, dtype: int64
    
    
    input
    
    def str_to_int(age_str):
        # Use the isnull() function to check if the value is null. If it is, return None.
        if pd.isnull(age_str):
            return None
        # Split the string on the space character and keep the first item, e.g. "18 - 29" -> "18".
        age_str = age_str.split(' ')[0]
        # Replace the + character with an empty string to remove it, e.g. "60+" -> "60".
        age_str = age_str.replace('+', '')
        # Use int() to convert the result to an integer.
        return int(age_str)
    
    # Use the pandas.Series.apply() method to apply the function to each value in the Age column of data.
    data['int_age'] = data['Age'].apply(str_to_int)
    # Call the pandas.Series.describe() method on the int_age column of data, and display the result.
    data['int_age'].describe()
    
    
    output
    
    count    1025.000000
    mean       39.383415
    std        15.398493
    min        18.000000
    25%        30.000000
    50%        45.000000
    75%        60.000000
    max        60.000000
    Name: int_age, dtype: float64
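A quick way to check the conversion is to run it on a handful of hand-written values. The sketch below restates the function so it runs on its own and uses a made-up Series. Each bracket collapses to its lower bound, so statistics such as the mean describe lower bounds rather than actual ages. The None results become missing values in the output, which is typically why describe() reports floats.

```python
import pandas as pd


def str_to_int(age_str):
    # Missing answers stay missing.
    if pd.isnull(age_str):
        return None
    # Keep the first token and drop a trailing "+".
    first = age_str.split(" ")[0]
    return int(first.replace("+", ""))


# Hypothetical sample covering a range, the open-ended bracket, and a missing value.
ages = pd.Series(["18 - 29", "60+", None, "45 - 59"])
int_age = ages.apply(str_to_int)

print(int_age)
print(int_age.dtype)
```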
    

    6. Converting Income To Numeric

    input
    print(data['How much total combined money did all members of your HOUSEHOLD earn last year?'].value_counts())
    
    output
    $25,000 to $49,999      180
    Prefer not to answer    136
    $50,000 to $74,999      135
    $75,000 to $99,999      133
    $100,000 to $124,999    111
    $200,000 and up          80
    $10,000 to $24,999       68
    $0 to $9,999             66
    $125,000 to $149,999     49
    $150,000 to $174,999     40
    $175,000 to $199,999     27
    Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64
    
    input
    def income_to_int(income_str):
        if pd.isnull(income_str):  # Use the isnull() function to check if the value is null. If it is, return None.
            return None
        income_str = income_str.split(' ')[0] # Split the string on the space character, and extract the first item of the resulting list, e.g. "$25,000".
        if income_str == 'Prefer':
            return None
        income_str = income_str.replace('$', '')
        income_str = income_str.replace(',', '')
        return int(income_str)
    
    data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(income_to_int)
    print(data['int_income'].describe())
    
    output
    count       889.000000
    mean      74077.615298
    std       59360.742902
    min           0.000000
    25%       25000.000000
    50%       50000.000000
    75%      100000.000000
    max      200000.000000
    Name: int_income, dtype: float64
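As an alternative to the row-by-row function, the same cleanup can be sketched with vectorized string methods. This is not the guided project's approach, only a possible variant, shown on made-up values: take the first token, strip "$" and ",", then let pd.to_numeric turn anything non-numeric (such as "Prefer") into a missing value.

```python
import pandas as pd

# Hypothetical sample of income answers; dtype=object keeps them as plain Python strings.
income = pd.Series(
    ["$25,000 to $49,999", "Prefer not to answer", None, "$200,000 and up", "$0 to $9,999"],
    dtype=object,
)

# First whitespace-separated token of each answer, e.g. "$25,000" or "Prefer".
first_token = income.str.split(" ").str[0]

# Remove the dollar sign and thousands separators.
digits = first_token.str.replace(r"[$,]", "", regex=True)

# Non-numeric tokens and missing answers become NaN.
int_income = pd.to_numeric(digits, errors="coerce")

print(int_income)
```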
    
    

    7. Correlating Travel Distance And Income

    input
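    # int_income holds the lower bound of each bracket, so the strict comparisons below
    # leave out respondents in the "$150,000 to $174,999" bracket (int_income == 150000).
    print(data[data['int_income'] < 150000]['How far will you travel for Thanksgiving?'].value_counts())
    print('--------------------------------------------------')
    print(data[data['int_income'] > 150000]['How far will you travel for Thanksgiving?'].value_counts())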
    
    output
    Thanksgiving is happening at my home--I won't travel at all                         281
    Thanksgiving is local--it will take place in the town I live in                     203
    Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
    Thanksgiving is out of town and far away--I have to drive several hours or fly       55
    Name: How far will you travel for Thanksgiving?, dtype: int64
    --------------------------------------------------
    Thanksgiving is happening at my home--I won't travel at all                         49
    Thanksgiving is local--it will take place in the town I live in                     25
    Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
    Thanksgiving is out of town and far away--I have to drive several hours or fly      12
    Name: How far will you travel for Thanksgiving?, dtype: int64
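The two groups differ a lot in size (689 answers versus 102), so raw counts are hard to compare directly. In the counts above, 281 of 689 lower-income respondents (about 41%) stay home, against 49 of 102 higher-income respondents (about 48%). A sketch of the same comparison in code, on a made-up frame with a hypothetical short `travel` column, might look like this:

```python
import pandas as pd

# Hypothetical toy data; "travel" stands in for the long travel-distance column.
toy = pd.DataFrame({
    "int_income": [20000, 50000, 75000, 100000, 175000, 200000, 200000],
    "travel": ["home", "local", "home", "far", "home", "far", "home"],
})

# Shares within each income group instead of raw counts.
lower = toy[toy["int_income"] < 150000]["travel"].value_counts(normalize=True)
higher = toy[toy["int_income"] > 150000]["travel"].value_counts(normalize=True)

# Align the two groups side by side; answers absent from a group get a share of 0.
comparison = pd.DataFrame({"under_150k": lower, "over_150k": higher}).fillna(0)

print(comparison)
```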
    
    

    8. Linking Friendship And Age

    input
    data.pivot_table(
        index = "Have you ever tried to meet up with hometown friends on Thanksgiving night?",
        columns = 'Have you ever attended a "Friendsgiving?"',
        values = 'int_age'
    )
    
    output

    [Image not available in the source: pivot table of mean int_age]

    input

    data.pivot_table(
        index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?',
        columns = 'Have you ever attended a "Friendsgiving?"',
        values = 'int_income'
    )
    
    output

    [Image not available in the source: pivot table of mean int_income]
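Since the output images are missing, a small made-up example may help show what these pivot tables contain. With no aggfunc given, pivot_table averages the values column for each index/columns pair. The sketch below uses hypothetical short column names and checks the result against an equivalent groupby.

```python
import pandas as pd

# Hypothetical toy data with shortened column names.
toy = pd.DataFrame({
    "met_friends": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "friendsgiving": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "int_age": [18, 30, 45, 60, 30, 45],
})

# aggfunc defaults to the mean, so each cell is the average int_age of that group.
table = toy.pivot_table(index="met_friends", columns="friendsgiving", values="int_age")

# The same numbers via groupby, reshaped into a grid with unstack().
same = toy.groupby(["met_friends", "friendsgiving"])["int_age"].mean().unstack()

print(table)
```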
