1.导入模块
>>> import pandas as pd
2.设置显示列
>>> pd.set_option('display.max_rows', 100,'display.max_columns', 1000,"display.max_colwidth",1000,'display.width',1000)
3.导入表格数据
>>> titanic = pd.read_csv(r"C:\Users\Administrator\Desktop\titanic.csv")
>>> titanic.head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
>>> air_quality = pd.read_csv(r"C:\Users\Administrator\Desktop\air_quality_long.csv", index_col="date.utc", parse_dates=True)
>>> air_quality.head()
city country location parameter value unit
date.utc
2019-06-18 06:00:00+00:00 Antwerpen BE BETR801 pm25 18.0 µg/m³
2019-06-17 08:00:00+00:00 Antwerpen BE BETR801 pm25 6.5 µg/m³
2019-06-17 07:00:00+00:00 Antwerpen BE BETR801 pm25 18.5 µg/m³
2019-06-17 06:00:00+00:00 Antwerpen BE BETR801 pm25 16.0 µg/m³
2019-06-17 05:00:00+00:00 Antwerpen BE BETR801 pm25 7.5 µg/m³
4.对年龄进行升序排序
>>> titanic.sort_values(by="Age").head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
803 804 1 3 Thomas, Master. Assad Alexander male 0.42 0 1 2625 8.5167 NaN C
755 756 1 2 Hamalainen, Master. Viljo male 0.67 1 1 250649 14.5000 NaN S
644 645 1 3 Baclini, Miss. Eugenie female 0.75 2 1 2666 19.2583 NaN C
469 470 1 3 Baclini, Miss. Helene Barbara female 0.75 2 1 2666 19.2583 NaN C
78 79 1 2 Caldwell, Master. Alden Gates male 0.83 0 2 248738 29.0000 NaN S
5.对客舱等级和年龄按降序对数据进行排序
>>> titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
851 852 0 3 Svensson, Mr. Johan male 74.0 0 0 347060 7.7750 NaN S
116 117 0 3 Connors, Mr. Patrick male 70.5 0 0 370369 7.7500 NaN Q
280 281 0 3 Duane, Mr. Frank male 65.0 0 0 336439 7.7500 NaN Q
483 484 1 3 Turkula, Mrs. (Hedwig) female 63.0 0 0 4134 9.5875 NaN S
326 327 0 3 Nysveen, Mr. Johan Hansen male 61.0 0 0 345364 6.2375 NaN S
6.过滤列指定值
>>> no2 = air_quality[air_quality["parameter"] == "no2"]
>>> no2
city country location parameter value unit
date.utc
2019-06-21 00:00:00+00:00 Paris FR FR04014 no2 20.0 µg/m³
2019-06-20 23:00:00+00:00 Paris FR FR04014 no2 21.8 µg/m³
2019-06-20 22:00:00+00:00 Paris FR FR04014 no2 26.5 µg/m³
2019-06-20 21:00:00+00:00 Paris FR FR04014 no2 24.9 µg/m³
2019-06-20 20:00:00+00:00 Paris FR FR04014 no2 21.4 µg/m³
... ... ... ... ... ... ...
2019-04-09 06:00:00+00:00 London GB London Westminster no2 41.0 µg/m³
2019-04-09 05:00:00+00:00 London GB London Westminster no2 41.0 µg/m³
2019-04-09 04:00:00+00:00 London GB London Westminster no2 41.0 µg/m³
2019-04-09 03:00:00+00:00 London GB London Westminster no2 67.0 µg/m³
2019-04-09 02:00:00+00:00 London GB London Westminster no2 67.0 µg/m³
[3447 rows x 6 columns]
7.将不同行的值转为列值显示
>>> no2_subset = no2.sort_index().groupby(["location"]).head(3)
>>> no2_subset
city country location parameter value unit
date.utc
2019-04-09 01:00:00+00:00 Antwerpen BE BETR801 no2 22.5 µg/m³
2019-04-09 01:00:00+00:00 Paris FR FR04014 no2 24.4 µg/m³
2019-04-09 02:00:00+00:00 London GB London Westminster no2 67.0 µg/m³
2019-04-09 02:00:00+00:00 Antwerpen BE BETR801 no2 53.5 µg/m³
2019-04-09 02:00:00+00:00 Paris FR FR04014 no2 27.4 µg/m³
2019-04-09 03:00:00+00:00 Antwerpen BE BETR801 no2 54.5 µg/m³
2019-04-09 03:00:00+00:00 London GB London Westminster no2 67.0 µg/m³
2019-04-09 03:00:00+00:00 Paris FR FR04014 no2 34.2 µg/m³
2019-04-09 04:00:00+00:00 London GB London Westminster no2 41.0 µg/m³
>>> no2_subset.pivot(columns="location", values="value")
location BETR801 FR04014 London Westminster
date.utc
2019-04-09 01:00:00+00:00 22.5 24.4 NaN
2019-04-09 02:00:00+00:00 53.5 27.4 67.0
2019-04-09 03:00:00+00:00 54.5 34.2 67.0
2019-04-09 04:00:00+00:00 NaN NaN 41.0
8.使用透视表查看每个站的no2和pm25的平均值
>>> air_quality.pivot_table(values="value", index="location", columns="parameter", aggfunc="mean")
parameter no2 pm25
location
BETR801 26.950920 23.169492
FR04014 29.374284 NaN
London Westminster 29.740050 13.443568
>>> air_quality
city country location parameter value unit
date.utc
2019-06-18 06:00:00+00:00 Antwerpen BE BETR801 pm25 18.0 µg/m³
2019-06-17 08:00:00+00:00 Antwerpen BE BETR801 pm25 6.5 µg/m³
2019-06-17 07:00:00+00:00 Antwerpen BE BETR801 pm25 18.5 µg/m³
2019-06-17 06:00:00+00:00 Antwerpen BE BETR801 pm25 16.0 µg/m³
2019-06-17 05:00:00+00:00 Antwerpen BE BETR801 pm25 7.5 µg/m³
... ... ... ... ... ... ...
2019-04-09 06:00:00+00:00 London GB London Westminster no2 41.0 µg/m³
2019-04-09 05:00:00+00:00 London GB London Westminster no2 41.0 µg/m³
2019-04-09 04:00:00+00:00 London GB London Westminster no2 41.0 µg/m³
2019-04-09 03:00:00+00:00 London GB London Westminster no2 67.0 µg/m³
2019-04-09 02:00:00+00:00 London GB London Westminster no2 67.0 µg/m³
[5272 rows x 6 columns]
网友评论