新大Python: Getting Started with Pandas, WorldIndex.c

Author: 群群对你说 | Published: 2017-08-27 00:37


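Assignment: based on the material from the "Pandas入门" (Getting Started with Pandas) course, take a first look at the provided WorldIndex.csv data and do a simple visual analysis.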

1. Importing the data

import pandas as pd  # import pandas

# Render plots inline in the notebook
%matplotlib inline
# Use high-resolution (retina) figures
%config InlineBackend.figure_format = 'retina'

# Column labels for the table
col_names = ['Country', 'Continent', 'Life_expectancy', 'GDP_per_capita', 'Population']
# Read the WorldIndex.csv data
worldindex = pd.read_csv('WorldIndex.csv', names=col_names)

worldindex.head()   # show the first 5 rows
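
A side note on the `names` argument: when `names` is passed without `header`, pandas treats the file as having no header line, so the first line is read as data. The sketch below illustrates this with a tiny in-memory CSV. The two rows are made-up placeholder values, not taken from WorldIndex.csv.

```python
import io

import pandas as pd

# Hypothetical two-row sample with no header line; the values are placeholders.
sample_csv = io.StringIO(
    "Country A,Africa,60.5,1500.0,1000000\n"
    "Country B,Asia,72.1,8200.0,2500000\n"
)

col_names = ['Country', 'Continent', 'Life_expectancy', 'GDP_per_capita', 'Population']

# With names= and no header=, the first line is kept as a data row.
sample = pd.read_csv(sample_csv, names=col_names)

print(sample.shape)
print(sample.dtypes)
```

If a file did start with a header line, passing `header=0` together with `names` would replace that line instead of reading it as data.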

Figure: first five rows of the data

2. Inspecting and describing the data

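Use the df.info() method for an overview of the data. It shows 164 rows and 5 columns: Country and Continent have the object dtype (here, strings), Life_expectancy and GDP_per_capita are floating-point columns, and Population is an integer column. Every column has 164 non-null values, so nothing is missing.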

worldindex.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 164 entries, 0 to 163
Data columns (total 5 columns):
Country            164 non-null object
Continent          164 non-null object
Life_expectancy    164 non-null float64
GDP_per_capita     164 non-null float64
Population         164 non-null int64
dtypes: float64(2), int64(1), object(2)
memory usage: 6.5+ KB

df.shape also gives the number of rows and columns:

worldindex.shape

(164, 5)
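
The dtype and missing-value conclusions read off `info()` can also be checked programmatically. The following is a minimal sketch on a made-up three-row table (the name `toy` and all values are hypothetical, with one value deliberately missing); the same calls should apply to `worldindex`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table; one Population value is missing on purpose.
toy = pd.DataFrame({
    'Country': ['Country A', 'Country B', 'Country C'],
    'Life_expectancy': [60.5, 72.1, 80.3],
    'Population': [1000000, np.nan, 500000],
})

missing_per_column = toy.isnull().sum()   # number of missing values in each column
has_missing = toy.isnull().values.any()   # True if any cell is missing

print(toy.dtypes)
print(missing_per_column)
print(has_missing)
```

Note that the missing value turns Population into a float column here, which is one reason an int64 Population column is itself a hint that no values are missing.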

Use the df.describe() method to view summary statistics for worldindex: the sample size (count), mean, standard deviation (std), minimum (min), maximum (max), and the quartiles.

worldindex.describe()

Figure: summary statistics for worldindex

Use the se.unique() method (se being a Series) to list the distinct values in Continent:

worldindex.Continent.unique()

array(['Africa', 'Asia', 'Europe', 'North America', 'Oceania',
       'South America'], dtype=object)

Use the se.value_counts() method to see each Continent value together with the number of rows it has:

worldindex.Continent.value_counts()

Africa           48
Europe           41
Asia             36
North America    19
South America    11
Oceania           9
Name: Continent, dtype: int64

Use df.describe() to view summary statistics for the Africa rows; the other continents can be summarized the same way.

Africa=worldindex[worldindex.Continent=="Africa"]
Africa.describe()

Figure: summary statistics for Africa
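
Rather than filtering and describing one continent at a time, one option is to group by Continent and describe all groups at once. This is only a sketch on a made-up table (`toy` and its values are hypothetical); with the real data the call would be made on `worldindex`.

```python
import pandas as pd

# Hypothetical miniature table with placeholder values.
toy = pd.DataFrame({
    'Continent': ['Africa', 'Africa', 'Asia', 'Asia', 'Europe'],
    'Life_expectancy': [58.0, 62.0, 70.0, 74.0, 80.0],
})

# One row of summary statistics (count, mean, std, min, quartiles, max) per continent.
by_continent = toy.groupby('Continent')['Life_expectancy'].describe()

print(by_continent)
```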

With the steps above, we now have a rough feel for the data.

3. Data visualization

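A box plot shows the quartiles at a glance. Below are box plots of life expectancy, GDP per capita, and population for the six continents in the data: 'Africa', 'Asia', 'Europe', 'North America', 'Oceania', and 'South America'.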

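Box plot of life expectancy by continent: Europe has the longest life expectancy, followed by North America, South America, Asia, and Oceania, while Africa has the shortest. Asia and Africa also show the widest spread in life expectancy, whereas Europe, the Americas, and Oceania are more tightly clustered.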

worldindex[['Life_expectancy', 'Continent']].boxplot(grid=False, by='Continent', figsize=(10, 6))

Figure: box plot of life expectancy by continent

Box plot of GDP per capita by continent: Europe has the highest GDP per capita but also the widest spread; Africa has the lowest.

worldindex[['GDP_per_capita', 'Continent']].boxplot(grid=False, by='Continent', figsize=(10, 6))

Figure: box plot of GDP per capita by continent

Box plot of population by continent:

worldindex[['Population', 'Continent']].boxplot(grid=False, by='Continent', figsize=(10, 6))

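Figure: box plot of population by continent
Question: the Y-axis range in the plot above is too large. I tried to adjust it with ylim but did not succeed, so I need to ask the teacher.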
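
One possible way to control the Y-axis, offered as a sketch rather than a definitive answer: create the Axes with matplotlib first, pass it to `boxplot` through the `ax` argument, and then set the limits on that Axes object. The table `toy`, its values, and the limit of 100,000,000 are all hypothetical choices for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature table with one very large value that stretches the axis.
toy = pd.DataFrame({
    'Continent': ['Africa', 'Africa', 'Africa', 'Asia', 'Asia', 'Asia'],
    'Population': [1_000_000, 5_000_000, 20_000_000,
                   3_000_000, 50_000_000, 1_300_000_000],
})

# Create the Axes explicitly so it can be adjusted after plotting.
fig, ax = plt.subplots(figsize=(10, 6))
toy.boxplot(column='Population', by='Continent', grid=False, ax=ax)

# Clip the visible range; values above the limit are simply not shown.
ax.set_ylim(0, 100_000_000)

# Alternative (instead of set_ylim): a logarithmic axis keeps every point visible.
# ax.set_yscale('log')
```

Clipping hides the largest values, so the logarithmic alternative may be preferable when those values matter.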

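Bar chart of total population by continent: Asia has the largest population and Africa the second largest, followed by Europe and the Americas, with Oceania the smallest.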

# Split the data by continent
Africa = worldindex[worldindex.Continent == 'Africa']
Asia = worldindex[worldindex.Continent == 'Asia']
Europe = worldindex[worldindex.Continent == 'Europe']
NorthAmerica = worldindex[worldindex.Continent == 'North America']
Oceania = worldindex[worldindex.Continent == 'Oceania']
SouthAmerica = worldindex[worldindex.Continent == 'South America']

# Draw a bar chart of total population by continent
import matplotlib.pyplot as plt

p = pd.Series(
    [Africa.Population.sum(), Asia.Population.sum(), Europe.Population.sum(),
     NorthAmerica.Population.sum(), Oceania.Population.sum(), SouthAmerica.Population.sum()],
    index=['Africa', 'Asia', 'Europe', 'North America', 'Oceania', 'South America'])

p.plot(kind='bar')

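Figure: bar chart of total population by continent
Question: drawing the bar chart this way feels cumbersome. I need to ask the teacher whether there is a simpler method.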
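
A shorter route that may answer this question is `groupby`, which computes the per-continent totals in one step and returns a Series that can be plotted directly. The sketch below uses a made-up table (`toy` and its values are hypothetical); with the real data, `worldindex` would take its place.

```python
import pandas as pd

# Hypothetical miniature table with placeholder values.
toy = pd.DataFrame({
    'Continent': ['Africa', 'Africa', 'Asia', 'Asia', 'Europe'],
    'Population': [1_000_000, 2_000_000, 5_000_000, 7_000_000, 3_000_000],
})

# Total population per continent; the result is a Series indexed by
# continent, equivalent to building the Series by hand.
p = toy.groupby('Continent')['Population'].sum()

# One bar per continent.
ax = p.plot(kind='bar')
```

Because the grouping keys come from the data itself, no continent has to be typed out by hand.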

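Scatter plot of GDP per capita against life expectancy: while GDP per capita is below 20000, life expectancy rises steeply as GDP per capita increases; once GDP per capita exceeds 20000, life expectancy changes very little with further increases.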

worldindex.plot(kind='scatter', x="GDP_per_capita", y="Life_expectancy")

Figure: scatter plot of GDP per capita against life expectancy
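
When points bunch up at low GDP per capita, a logarithmic x-axis may make the pattern easier to read. The following is a sketch under that assumption, using the `logx` option of `DataFrame.plot` on a made-up table (`toy` and its values are hypothetical, not from WorldIndex.csv).

```python
import pandas as pd

# Hypothetical miniature table with placeholder values.
toy = pd.DataFrame({
    'GDP_per_capita': [500.0, 2000.0, 8000.0, 20000.0, 45000.0],
    'Life_expectancy': [55.0, 63.0, 71.0, 78.0, 81.0],
})

# logx=True puts the x-axis on a logarithmic scale.
ax = toy.plot(kind='scatter', x='GDP_per_capita', y='Life_expectancy', logx=True)
```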


Article link: https://www.haomeiwen.com/subject/pdbvdxtx.html
