Step 1: Read the data and name the DataFrame student
>>> import pandas as pd,numpy as np
>>> student = pd.read_csv('D:/Data/PythonPractise/pandasdata/student-mat.csv')
>>> student.head()
school sex age address famsize Pstatus ... Walc health absences G1 G2 G3
0 GP F 18 U GT3 A ... 1 3 6 5 6 6
1 GP F 17 U GT3 T ... 1 3 4 5 5 6
2 GP F 15 U LE3 T ... 3 3 10 7 8 10
3 GP F 15 U GT3 T ... 1 5 2 15 14 15
4 GP F 16 U GT3 T ... 2 5 4 6 10 10
[5 rows x 33 columns]
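If the CSV file is not at hand, the sketch below reproduces the same read_csv/head workflow on a few rows held in memory. It is a minimal sketch: the CSV text uses a handful of the real column names, but its rows are made up, not taken from student-mat.csv. io.StringIO lets read_csv treat a string as a file.

```python
import io

import pandas as pd

# Hypothetical sample: a few of the real column names, made-up values.
csv_text = """school,sex,age,address,Fjob,guardian
GP,F,18,U,teacher,mother
GP,F,17,U,other,father
MS,M,20,U,services,other
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())  # first rows (at most 5)
print(sample.shape)   # (number of rows, number of columns)
```

read_csv assumes a comma separator by default. If a file uses another delimiter, such as ';', pass sep=';'.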
Step 2: Slice the columns from 'school' through 'guardian'
>>> stud_alcoh = student.loc[:,'school':'guardian']
>>> stud_alcoh.head()
school sex age address famsize ... Fedu Mjob Fjob reason guardian
0 GP F 18 U GT3 ... 4 at_home teacher course mother
1 GP F 17 U GT3 ... 1 at_home other course father
2 GP F 15 U LE3 ... 1 at_home other other mother
3 GP F 15 U GT3 ... 2 health services home mother
4 GP F 16 U GT3 ... 3 other other home father
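A label-based slice with .loc includes both endpoints, which is why 'guardian' is part of the result above. Positional slicing with .iloc follows Python's half-open convention instead. A toy DataFrame with hypothetical values shows the difference:

```python
import pandas as pd

# Toy frame with made-up values; only the column labels matter here.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["school", "sex", "guardian", "G3"],
)

# Label-based slice: the stop label "guardian" is included.
by_label = df.loc[:, "school":"guardian"]

# Position-based slice: the stop position 2 is excluded.
by_position = df.iloc[:, 0:2]

print(list(by_label.columns))     # ['school', 'sex', 'guardian']
print(list(by_position.columns))  # ['school', 'sex']
```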
Step 3: Create a lambda function that capitalizes a string
>>> captalizer = lambda x: x.upper()
Step 4: Capitalize every value in the 'Fjob' column
>>> stud_alcoh['Fjob'].apply(captalizer)
0 TEACHER
1 OTHER
2 OTHER
3 SERVICES
4 OTHER
...
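This minimal sketch uses a three-row toy column with hypothetical values, and the lambda is spelled capitalizer here. It shows two points. First, apply() returns a new Series and leaves the original column untouched, which is what Step 6 runs into. Second, pandas' vectorized string accessor .str.upper() gives the same result.

```python
import pandas as pd

df = pd.DataFrame({"Fjob": ["teacher", "other", "services"]})

capitalizer = lambda x: x.upper()  # same idea as captalizer above

upper_apply = df["Fjob"].apply(capitalizer)  # element-wise; returns a NEW Series
upper_str = df["Fjob"].str.upper()           # vectorized string accessor

print(upper_apply.tolist())                         # ['TEACHER', 'OTHER', 'SERVICES']
print(upper_apply.tolist() == upper_str.tolist())   # True
print(df["Fjob"].tolist())                          # ['teacher', 'other', 'services'] (unchanged)
```

One practical difference: .str methods propagate missing values (NaN) instead of failing. By contrast, calling x.upper() on a NaN float raises AttributeError.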
Step 5: Print the last few rows of the dataset
>>> stud_alcoh.tail()
school sex age address famsize ... Fedu Mjob Fjob reason guardian
390 MS M 20 U LE3 ... 2 services services course other
391 MS M 17 U LE3 ... 1 services services course mother
392 MS M 21 R GT3 ... 1 other other course other
393 MS M 18 R LE3 ... 2 services other course mother
394 MS M 19 U LE3 ... 1 other at_home course father
[5 rows x 12 columns]
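Step 6: Notice that the original DataFrame is still lowercase. This is because apply() returns a new Series rather than modifying the column in place. Fix it by assigning the result back to the column:
>>> stud_alcoh['Fjob'] = stud_alcoh['Fjob'].apply(captalizer)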
>>> stud_alcoh.tail()
school sex age address famsize ... Fedu Mjob Fjob reason guardian
390 MS M 20 U LE3 ... 2 services SERVICES course other
391 MS M 17 U LE3 ... 1 services SERVICES course mother
392 MS M 21 R GT3 ... 1 other OTHER course other
393 MS M 18 R LE3 ... 2 services OTHER course mother
394 MS M 19 U LE3 ... 1 other AT_HOME course father
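stud_alcoh was sliced out of student, so assigning to one of its columns may trigger pandas' SettingWithCopyWarning, depending on the pandas version. A common way to sidestep this is to take an explicit .copy() when slicing. Here is a minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical stand-in for the full student table.
student = pd.DataFrame({
    "school": ["GP", "MS"],
    "Fjob": ["teacher", "at_home"],
    "G3": [6, 10],
})

# .copy() makes the subset an independent DataFrame, so assigning to one of
# its columns cannot affect `student`.
subset = student.loc[:, "school":"Fjob"].copy()
subset["Fjob"] = subset["Fjob"].apply(lambda x: x.upper())

print(subset["Fjob"].tolist())   # ['TEACHER', 'AT_HOME']
print(student["Fjob"].tolist())  # ['teacher', 'at_home']
```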
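Step 7: Create a function named majority that returns a boolean value, and store the result in a new column named legal_drinker (a student counts as having reached majority if older than 17)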
>>> def majority(x):
...     if x > 17:
...         return True
...     else:
...         return False
...
>>> stud_alcoh['legal_drinker'] = stud_alcoh['age'].apply(majority)
>>> stud_alcoh.head()
school sex age address ... Fjob reason guardian legal_drinker
0 GP F 18 U ... TEACHER course mother True
1 GP F 17 U ... OTHER course father False
2 GP F 15 U ... OTHER other mother False
3 GP F 15 U ... SERVICES home mother False
4 GP F 16 U ... OTHER home father False
[5 rows x 13 columns]
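The if/else in majority simply returns the result of x > 17. The same boolean column can therefore be built with a single vectorized comparison and no per-element function call. A minimal sketch, using the five ages shown in head() above:

```python
import pandas as pd

ages = pd.Series([18, 17, 15, 15, 16], name="age")  # ages from head() above

def majority(x):
    # Equivalent to the if/else version in Step 7.
    return x > 17

via_apply = ages.apply(majority)  # calls majority once per element
via_compare = ages > 17           # compares the whole column at once

print(via_compare.tolist())                         # [True, False, False, False, False]
print(via_apply.tolist() == via_compare.tolist())   # True
```

On the full dataset, this would read stud_alcoh['legal_drinker'] = stud_alcoh['age'] > 17.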
Step 8: Multiply every number in the dataset by 10
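>>> def times10(x):
...     # exact type check: True for ints, False for bools, so legal_drinker stays unchanged
...     if type(x) is int:
...         return 10 * x
...     return x
...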
>>> stud_alcoh.applymap(times10).head(10)
school sex age address ... Fjob reason guardian legal_drinker
0 GP F 180 U ... TEACHER course mother True
1 GP F 170 U ... OTHER course father False
2 GP F 150 U ... OTHER other mother False
3 GP F 150 U ... SERVICES home mother False
4 GP F 160 U ... OTHER home father False
5 GP M 160 U ... OTHER reputation mother False
6 GP M 160 U ... OTHER home mother False
7 GP F 170 U ... TEACHER home mother False
8 GP M 150 U ... OTHER home mother False
9 GP M 150 U ... OTHER home mother False
[10 rows x 13 columns]
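The check type(x) is int matters because Python treats bool as a subclass of int. With isinstance, the legal_drinker values would be multiplied as well. The sketch below illustrates this and then shows a vectorized alternative that multiplies only the numeric columns. It uses a small hypothetical DataFrame, and it is an alternative technique, not the tutorial's own method.

```python
import pandas as pd

# bool is a subclass of int, so isinstance accepts it but an exact type check does not.
print(isinstance(True, int))  # True
print(type(True) is int)      # False

# Hypothetical toy data mirroring three of the columns above.
df = pd.DataFrame({
    "school": ["GP", "GP"],
    "age": [18, 17],
    "legal_drinker": [True, False],
})

# Vectorized alternative: select numeric columns (bool columns are not included
# by include="number") and multiply them in one operation.
num_cols = df.select_dtypes(include="number").columns
scaled = df.copy()
scaled[num_cols] = scaled[num_cols] * 10

print(scaled["age"].tolist())            # [180, 170]
print(scaled["legal_drinker"].tolist())  # [True, False]
```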
Going further:
Differences between apply, applymap and map
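1. Use apply() to operate on a DataFrame column by column or row by row. DataFrame.apply passes each column (or each row, with axis=1) to the function, while Series.apply works element by element, as in Step 4.
2. Use applymap() to operate on every individual element of a DataFrame; the result is a DataFrame. In pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map().
3. Use map() to operate on every element of a Series.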
Step 4 above, which capitalized every value in the Fjob column, can equally be done with map():
>>> stud_alcoh.Fjob.map(captalizer)
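Here is a minimal sketch of the three methods on a small hypothetical DataFrame. It picks DataFrame.map when the installed pandas has it (2.1+) and falls back to applymap otherwise. Series.map is also shown with a dict, which it accepts as well as a function.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# DataFrame.apply: the function receives a whole column (default axis=0) ...
col_max = df.apply(lambda col: col.max())
# ... or a whole row with axis=1.
row_sum = df.apply(lambda row: row.sum(), axis=1)

# Element-wise over the whole DataFrame: DataFrame.map in pandas >= 2.1,
# applymap in older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)

# Series.map: element-wise on a single column; a dict works as a lookup table.
labels = df["a"].map({1: "one", 2: "two", 3: "three"})

print(col_max.tolist())       # [3, 30]
print(row_sum.tolist())       # [11, 22, 33]
print(squared["b"].tolist())  # [100, 400, 900]
print(labels.tolist())        # ['one', 'two', 'three']
```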
Exercise: convert the data type of DataFrame columns from str to int
>>> data1 = pd.DataFrame({'a':['1','2','3'],'b':['4','5','6']})
>>> data1
a b
0 1 4
1 2 5
2 3 6
>>> data1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a 3 non-null object
b 3 non-null object
dtypes: object(2)
memory usage: 128.0+ bytes
>>> data1 = data1.applymap(int)
>>> data1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a 3 non-null int64
b 3 non-null int64
dtypes: int64(2)
memory usage: 128.0 bytes
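applymap(int) calls int() once per element. As a sketch of more direct alternatives, astype() converts whole columns at once and can target a single column through a dict. pd.to_numeric() can coerce unparseable strings to NaN instead of raising. The "x" value below is a hypothetical bad entry added for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4", "5", "6"]})

# Convert only column "a"; column "b" keeps its string values.
converted = data1.astype({"a": int})

# errors="coerce" turns strings that cannot be parsed into NaN;
# the result is float64 because NaN is a float.
messy = pd.Series(["1", "2", "x"])
parsed = pd.to_numeric(messy, errors="coerce")

print(converted.dtypes)
print(parsed.tolist())  # [1.0, 2.0, nan]
```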