pandas 2

Author: 钊钖 | Published 2017-12-29 16:48

Learn to handle missing data using pandas and a data set on Titanic survival.

Introduction

import pandas as pd
titanic_survival = pd.read_csv("titanic_survival.csv")


Finding the Missing Data

The Pandas library uses NaN, which stands for "not a number", to indicate a missing value.
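
One reason a dedicated null check is needed: NaN never compares equal to anything, including itself, so an ordinary equality test cannot find it. A minimal sketch, assuming a small made-up Series (the name ages and its values are illustrative, not taken from the Titanic data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; None is stored as NaN in a float Series.
ages = pd.Series([29.0, None, 2.0])

# NaN is a float value that never compares equal, not even to itself.
nan_equals_nan = np.nan == np.nan

# An equality test therefore finds no missing values...
found_by_equality = (ages == np.nan).sum()

# ...while pd.isnull() flags the missing entry.
found_by_isnull = pd.isnull(ages).sum()

print(nan_equals_nan, found_by_equality, found_by_isnull)
```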

To see which values are NaN, we can use the pd.isnull() function. It takes a pandas Series and returns a Series of True and False values, the same way that NumPy did when we compared arrays.

sex = titanic_survival["sex"]
sex_is_null = pd.isnull(sex)

We can use this resultant series to select only the rows that have null values.

sex_null_true = sex[sex_is_null]
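
Since titanic_survival.csv is not reproduced here, the following sketch runs the same two steps on a small hypothetical DataFrame (the name sample and its values are assumptions for illustration) so the mask and the selection can be inspected directly:

```python
import pandas as pd

# Hypothetical stand-in for titanic_survival.csv.
sample = pd.DataFrame({
    "sex": ["female", "male", None, "male"],
    "age": [29.0, None, 2.0, None],
})

sex = sample["sex"]
sex_is_null = pd.isnull(sex)      # boolean Series: True where the value is missing
sex_null_true = sex[sex_is_null]  # keeps only the rows where the mask is True

print(sex_is_null.tolist())
print(sex_null_true.index.tolist())
```

The selected Series keeps the original index labels, so it is still possible to tell which rows were missing.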

We'll use this structure to look at the null values for the "age" column.

Instructions

Count how many values in the "age" column are null:

  • Use pd.isnull() on the age variable to create a Series of True and False values.

  • Use the resulting Series to select only the elements in age that are null, and assign the result to age_null_true.

  • Assign the length of age_null_true to age_null_count.

Print age_null_count to see how many null values are in the "age" column.

age = titanic_survival["age"]
print(age.loc[10:20])
age_is_null = pd.isnull(age)
age_null_true = age[age_is_null]
age_null_count = len(age_null_true)
print(age_null_count)
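
As an aside, the same count can probably be obtained without building the intermediate selection, because True counts as 1 when a boolean Series is summed. This is an alternative sketch, not the exercise's required solution, and it uses a made-up age Series:

```python
import pandas as pd

# Hypothetical sample; the real "age" column comes from titanic_survival.csv.
age = pd.Series([29.0, None, 2.0, None, 30.0])

# Approach used in the exercise: select the null rows, then take the length.
count_by_len = len(age[pd.isnull(age)])

# Shortcut: summing the boolean mask counts the True values.
count_by_sum = int(pd.isnull(age).sum())

print(count_by_len, count_by_sum)
```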


Easier Ways to Do Math

Luckily, missing data is so common that many pandas methods automatically filter for it. For example, if we use the Series.mean() method to calculate the mean of a column, missing values will not be included in the calculation.

To calculate the mean age as we did earlier, we can replace all of that code with one line:

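# One-line version: Series.mean() skips missing values.
correct_mean_age = titanic_survival["age"].mean()

# Equivalent longer version, shown for comparison:
age_is_null = pd.isnull(titanic_survival["age"])
good_ages = titanic_survival["age"][~age_is_null]  # ~ inverts the boolean mask
correct_mean_age = sum(good_ages) / len(good_ages)

# The same method works for the "fare" column.
correct_mean_fare = titanic_survival["fare"].mean()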
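
To see the filtering behavior in isolation, here is a small sketch on made-up values (the Series below is an assumption, not Titanic data). It compares the manual calculation with Series.mean(), and uses the skipna parameter of Series.mean() to show what happens when missing values are not skipped:

```python
import pandas as pd

# Hypothetical sample with one missing value.
age = pd.Series([20.0, None, 40.0])

# Manual approach: drop the nulls, then divide the sum by the count.
good_ages = age[pd.notnull(age)]
manual_mean = sum(good_ages) / len(good_ages)

# Series.mean() skips NaN by default (skipna=True).
default_mean = age.mean()

# With skipna=False the NaN propagates and the result is NaN.
strict_mean = age.mean(skipna=False)

print(manual_mean, default_mean, strict_mean)
```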


Calculating Summary Statistics

Let's calculate more summary statistics for the data.

The pclass column indicates the cabin class for each passenger, which was either first class (1), second class (2), or third class (3).

passenger_classes = [1, 2, 3]

You'll use the list passenger_classes, which contains these values, in the following exercise.

Instructions

Use a for loop to iterate over passenger_classes. Within the for loop:

  • Select just the rows in titanic_survival where the pclass value equals the current iterator value (this_class).

    for this_class in passenger_classes:
        pclass_rows = titanic_survival[titanic_survival["pclass"] == this_class]

  • Select just the fare column for the current subset of rows.

        pclass_fares = pclass_rows["fare"]

  • Use the Series.mean() method to calculate the mean of this subset.

        fare_for_class = pclass_fares.mean()

  • Add the mean of the class to the fares_by_class dictionary with this_class as the key.

        fares_by_class[this_class] = fare_for_class

Once the loop completes, the dictionary fares_by_class should have 1, 2, and 3 as keys, with the average fares as the corresponding values.

passenger_classes = [1, 2, 3]
fares_by_class = {}

for this_class in passenger_classes:
    pclass_rows = titanic_survival[titanic_survival["pclass"] == this_class]
    pclass_fares = pclass_rows["fare"]
    fare_for_class = pclass_fares.mean()
    fares_by_class[this_class] = fare_for_class
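
For comparison, pandas can compute the same per-class means with DataFrame.groupby. This is an alternative the exercise does not ask for, sketched on a hypothetical five-row frame (sample and its values are made up) so that both approaches can be checked against each other:

```python
import pandas as pd

# Hypothetical stand-in for titanic_survival; one fare is missing.
sample = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3],
    "fare": [100.0, 50.0, 20.0, 8.0, None],
})

# Loop-based approach from the exercise.
fares_by_class = {}
for this_class in [1, 2, 3]:
    pclass_rows = sample[sample["pclass"] == this_class]
    fares_by_class[this_class] = pclass_rows["fare"].mean()

# groupby splits the rows by pclass and averages each group's fares.
grouped = sample.groupby("pclass")["fare"].mean().to_dict()

print(fares_by_class)
print(grouped)
```

Both versions skip the missing fare in class 3, since mean() ignores NaN in each case.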

Article link: https://www.haomeiwen.com/subject/wyxwgxtx.html