pandas 3

Author: 钊钖 | Published: 2017-12-29 16:48

Making Pivot Tables

Pivot tables provide an easy way to subset by one column and then apply a calculation like a sum or a mean.

Pivot tables first group and then apply a calculation. In the previous screen, we actually made a pivot table manually by grouping by the column "pclass" and then calculating the mean of the "fare" column for each class.
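To make the group-then-calculate idea concrete, here is a minimal sketch of the manual approach. The small toy DataFrame and its values are invented for illustration and only stand in for titanic_survival.

```python
import pandas as pd

# Hypothetical stand-in for titanic_survival (values invented for illustration).
toy = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3],
    "fare": [100.0, 80.0, 30.0, 20.0, 10.0, 6.0],
})

# Step 1: subset the rows for each class. Step 2: apply the calculation.
fares_by_class = {}
for pclass in sorted(toy["pclass"].unique()):
    class_rows = toy[toy["pclass"] == pclass]
    fares_by_class[int(pclass)] = float(class_rows["fare"].mean())

print(fares_by_class)
```

Each pass through the loop handles one group, which is exactly the bookkeeping a pivot table takes care of for us.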

Luckily, we can use the DataFrame.pivot_table() method instead, which simplifies the kind of work we did on the last screen. To produce the same data, we need only one line.

passenger_class_fares = titanic_survival.pivot_table(index="pclass", values="fare", aggfunc=np.mean)

The first argument we pass, index, tells the method which column to group by.

The second argument, values, is the column that we want to apply the calculation to, and aggfunc specifies the calculation we want to perform.

The default for the aggfunc parameter is the mean, so we can omit the parameter whenever a mean is what we want.
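As a quick check of that default, the sketch below builds the same pivot table twice, once with an explicit aggregation and once without. The toy DataFrame is made up for illustration, and the string "mean" is used here in place of np.mean as an assumption about what recent pandas releases prefer.

```python
import pandas as pd

# Hypothetical stand-in for titanic_survival (values invented for illustration).
toy = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3],
    "fare": [100.0, 80.0, 30.0, 20.0, 10.0, 6.0],
})

# Aggregation named explicitly.
explicit = toy.pivot_table(index="pclass", values="fare", aggfunc="mean")

# aggfunc omitted: pivot_table falls back to the mean.
default = toy.pivot_table(index="pclass", values="fare")

print(default)
print(explicit.equals(default))
```

Both calls return a DataFrame indexed by pclass with a single fare column, so the comparison on the last line should report that they match.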

Instructions

Use the DataFrame.pivot_table() method to calculate the mean age for each passenger class ("pclass").
Assign the result to passenger_age.
Display the passenger_age pivot table using the print() function.

import numpy as np

passenger_survival = titanic_survival.pivot_table(index="pclass", values="survived")

passenger_age = titanic_survival.pivot_table(index="pclass", values="age")

print(passenger_age)

If we pass a list of column names to the values parameter instead of a single value, we can perform calculations on multiple columns at once.

We can also specify a custom calculation to be made. For instance, if we pass np.sum to the aggfunc parameter, it will total the values in each column.
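Here is a small sketch of both ideas together: a list passed to values and a sum instead of the default mean. The toy data is invented for illustration, and the string "sum" stands in for np.sum, which gives the same totals here.

```python
import pandas as pd

# Hypothetical data (values invented for illustration).
toy = pd.DataFrame({
    "embarked": ["S", "S", "C", "C", "Q"],
    "fare": [10.0, 20.0, 50.0, 70.0, 8.0],
    "survived": [0, 1, 1, 1, 0],
})

# One row per port, one column per entry in the values list.
totals = toy.pivot_table(
    index="embarked",
    values=["fare", "survived"],
    aggfunc="sum",
)

print(totals)
```

The result has one row for each distinct value in the index column and one totalled column for each name in the values list.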

Instructions

Make a pivot table that calculates the total fares collected ("fare") and total number of survivors ("survived") for each embarkation port ("embarked").
Assign the result to port_stats.
Display port_stats using the print() function.

import numpy as np

port_stats = titanic_survival.pivot_table(index="embarked", values=["fare", "survived"], aggfunc=np.sum)

print(port_stats)

Drop Missing Values

We can use the DataFrame.dropna() method to remove missing values from a pandas DataFrame. By default, the method drops any rows that contain missing values.

The dropna() method takes an axis parameter, which indicates whether you would like to drop rows or columns.

Specifying axis=0 or axis='index' will drop any rows that have null values, while specifying axis=1 or axis='columns' will drop any columns that have null values.
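The sketch below shows the two axis settings side by side, plus the subset parameter, which limits the check to the listed columns. The toy DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age (values invented for illustration).
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [29.0, np.nan, 41.0],
    "sex": ["female", "male", "male"],
})

# axis=0: drop every row that has at least one missing value.
rows_dropped = toy.dropna(axis=0)

# axis=1: drop every column that has at least one missing value.
cols_dropped = toy.dropna(axis=1)

# subset: only look for missing values in the listed columns.
subset_dropped = toy.dropna(axis=0, subset=["sex"])

print(rows_dropped.shape)
print(list(cols_dropped.columns))
print(len(subset_dropped))
```

Note that dropna() returns a new DataFrame each time, so toy itself is left unchanged.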

Instructions

Drop all columns in titanic_survival that have missing values and assign the result to drop_na_columns.
Drop all rows in titanic_survival where the columns "age" or "sex" have missing values and assign the result to new_titanic_survival.

drop_na_columns = titanic_survival.dropna(axis=1)

new_titanic_survival = titanic_survival.dropna(axis=0, subset=["sex", "age"])

Article link: https://www.haomeiwen.com/subject/luxwgxtx.html