The dataset is stored in the recent-grads.csv file. It contains information on the earnings of college majors in the US from 2010 to 2012.
It can be downloaded from here: https://github.com/fivethirtyeight/data/tree/master/college-majors
In this project, I will explore the dataset, look for patterns in the earnings of different majors, and plot them using the matplotlib library.
The code was written in Jupyter:
Read the data:
import pandas as pd
recent_grads=pd.read_csv('./data/recent-grads.csv')
print(recent_grads.columns)
print(recent_grads.info())
print(recent_grads.describe())
print(recent_grads.head(1))
Handle missing values:
raw_data_count=recent_grads.shape[0]
print(raw_data_count)
cleaned_data_count=recent_grads.dropna().shape[0]
print(cleaned_data_count)
Output:
173
172
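The two counts differ by one, so exactly one row contains a missing value. Note that `dropna()` returns a new DataFrame rather than modifying the original, so the result has to be assigned to a variable for the cleanup to affect later steps. A minimal sketch on a small made-up frame (the values are illustrative and not taken from recent-grads.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for recent_grads; values are made up for illustration.
df = pd.DataFrame({
    'Major': ['A', 'B', 'C'],
    'Men': [100.0, np.nan, 300.0],
    'Median': [50000, 40000, 30000],
})

# Count missing values per column to see where the gaps are.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Show the rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)

# dropna() does not modify df in place; keep the result in a new variable.
cleaned = df.dropna()
print(df.shape[0], cleaned.shape[0])
```

The same three steps should apply to `recent_grads` unchanged, with `recent_grads = recent_grads.dropna()` if the cleaned frame is meant to replace the raw one.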
Draw scatter plots to examine the relationships between attributes:
import matplotlib.pyplot as plt
%matplotlib inline
recent_grads.plot(x='Full_time',y='Median',kind='scatter')
recent_grads.plot(x='Unemployed',y='Median',kind='scatter')
recent_grads.plot(x='Men',y='Median',kind='scatter')
recent_grads.plot(x='Women',y='Median',kind='scatter')
Result:
[Figure: scatter plots of Median against Full_time, Unemployed, Men, and Women]
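Scatter plots show the shape of a relationship; a correlation coefficient gives a rough single-number summary of it. A hedged sketch using `DataFrame.corr()` on a small made-up frame (the column names follow the dataset, the values are invented):

```python
import pandas as pd

# Hypothetical values; only the column names match recent-grads.csv.
df = pd.DataFrame({
    'Full_time': [1000, 2000, 3000, 4000],
    'Unemployed': [50, 40, 30, 20],
    'Median': [30000, 40000, 50000, 60000],
})

# Pearson correlation of every other column with Median.
corr_with_median = df.corr()['Median'].drop('Median')
print(corr_with_median)
```

On the real data, selecting the numeric columns of interest first, for example `recent_grads[['Full_time','Unemployed','Men','Women','Median']].corr()['Median']`, would give the analogous numbers.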
Next, we draw histograms to view the distribution of each attribute:
columns=['Median','Employed','Unemployment_rate','Women','Men']
fig=plt.figure(figsize=(6,18))
# draw one histogram per column, stacked vertically
for i,col in enumerate(columns):
    ax=fig.add_subplot(len(columns),1,i+1)
    recent_grads[col].hist(ax=ax,color='orange')
    ax.set_title(col)
plt.show()
[Figure: histograms of Median, Employed, Unemployment_rate, Women, and Men]
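A histogram splits a column's value range into equal-width bins and counts how many rows fall into each; `Series.hist` uses 10 bins by default. The counting step can be sketched with `numpy.histogram` on made-up numbers (the values, bin count, and range below are chosen only for illustration):

```python
import numpy as np

# Hypothetical values, not taken from the dataset.
values = [1, 2, 2, 3, 9]

# Four equal-width bins over [0, 12]: [0,3), [3,6), [6,9), [9,12].
counts, edges = np.histogram(values, bins=4, range=(0, 12))

print(edges)   # bin boundaries
print(counts)  # number of values per bin
```

Passing `bins=` and `range=` to `recent_grads[col].hist(...)` controls the same two settings in the plots.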
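To make the relationship between the number of employed graduates and salary easier to see, use the scatter_matrix function to build a scatter plot matrix: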
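# scatter_matrix lives in pandas.plotting in current pandas releases
from pandas.plotting import scatter_matrix
# a single colour is used because a two-element colour list does not match the number of points
scatter_matrix(recent_grads[['Employed','Median']],figsize=(10,10),c='red')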
[Figure: scatter matrix of Employed and Median]
Notes on this matrix:
[Figure: explanation of the scatter matrix]
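As a reminder of how such a matrix is laid out: `scatter_matrix` returns an n-by-n grid of axes for n columns. By default the diagonal cells hold a histogram of each column, and each off-diagonal cell is a scatter plot of one column against another. A small sketch with invented data; the `Agg` backend line is only there so it runs outside a notebook:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical values; only the column names match recent-grads.csv.
df = pd.DataFrame({
    'Employed': [1200, 3400, 560, 7800, 2300],
    'Median': [60000, 45000, 72000, 38000, 50000],
})

axes = scatter_matrix(df, figsize=(6, 6), diagonal='hist')

# axes[i, j] plots column j on the x-axis against column i on the y-axis;
# cells with i == j are the per-column histograms.
print(axes.shape)
```

With two columns this yields a 2x2 grid: two histograms on the diagonal and two mirrored scatter plots off it.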
Next, let's do something interesting: look at the share of women in the 10 highest-paying and the 10 lowest-paying majors:
recent_grads[:10].plot.bar(x='Major',y='ShareWomen')
plt.legend(loc='upper left')
plt.title('The 10 highest paying majors.')
recent_grads[-10:].plot(x='Major',y='ShareWomen',kind='bar')
plt.title('The 10 lowest paying majors.')
[Figure: bar chart of ShareWomen for the 10 highest-paying majors]
[Figure: bar chart of ShareWomen for the 10 lowest-paying majors]
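The row slices used for these two charts rely on the file already being ordered from highest to lowest median salary. If that ordering is in doubt, the top and bottom groups can be selected explicitly. A sketch using `nlargest` and `nsmallest` on made-up rows (major names and numbers are hypothetical):

```python
import pandas as pd

# Hypothetical rows, deliberately not sorted by salary.
df = pd.DataFrame({
    'Major': ['M1', 'M2', 'M3', 'M4', 'M5'],
    'Median': [40000, 90000, 25000, 60000, 30000],
    'ShareWomen': [0.55, 0.15, 0.80, 0.35, 0.70],
})

n = 2  # use 10 for the real dataset
highest = df.nlargest(n, 'Median')   # rows with the n largest Median values
lowest = df.nsmallest(n, 'Median')   # rows with the n smallest Median values

print(highest[['Major', 'ShareWomen']])
print(lowest[['Major', 'ShareWomen']])
```

Calling `highest.plot.bar(x='Major', y='ShareWomen')` on the result then produces the same kind of chart without depending on row order.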
Compare the numbers of men and women in the highest-paying majors:
recent_grads[:10].plot.bar(x='Major',y=['Men','Women'])
[Figure: grouped bar chart of Men and Women for the 10 highest-paying majors]
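Raw counts can also be turned into proportions, which makes majors of very different sizes easier to compare. A sketch that assumes the share of women is simply Women / (Men + Women); whether this matches the dataset's own ShareWomen column exactly is an assumption, and the rows below are invented:

```python
import pandas as pd

# Hypothetical counts; only the column names Men and Women come from the dataset.
df = pd.DataFrame({
    'Major': ['M1', 'M2'],
    'Men': [900, 250],
    'Women': [100, 750],
})

total = df['Men'] + df['Women']
df['share_women'] = df['Women'] / total  # illustrative column name
df['share_men'] = df['Men'] / total      # illustrative column name

print(df)
```

Plotting `y=['share_men','share_women']` with `stacked=True` would then show each major as a bar that sums to 1.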