pandas Practice: The apply Function

Author: 酸甜柠檬26 | Published 2019-12-17 20:31

Step 1: Read the data and name it student

>>> import numpy as np
>>> import pandas as pd
>>> student = pd.read_csv('D:/Data/PythonPractise/pandasdata/student-mat.csv')
>>> student.head()
  school sex  age address famsize Pstatus  ...  Walc  health absences  G1  G2  G3
0     GP   F   18       U     GT3       A  ...     1       3        6   5   6   6
1     GP   F   17       U     GT3       T  ...     1       3        4   5   5   6
2     GP   F   15       U     LE3       T  ...     3       3       10   7   8  10
3     GP   F   15       U     GT3       T  ...     1       5        2  15  14  15
4     GP   F   16       U     GT3       T  ...     2       5        4   6  10  10

[5 rows x 33 columns]

Step 2: Slice the data from the 'school' column through the 'guardian' column

>>> stud_alcoh = student.loc[:, 'school':'guardian'].copy()  # independent copy, safe to modify later
>>> stud_alcoh.head()
  school sex  age address famsize  ... Fedu     Mjob      Fjob  reason guardian
0     GP   F   18       U     GT3  ...    4  at_home   teacher  course   mother
1     GP   F   17       U     GT3  ...    1  at_home     other  course   father
2     GP   F   15       U     LE3  ...    1  at_home     other   other   mother
3     GP   F   15       U     GT3  ...    2   health  services    home   mother
4     GP   F   16       U     GT3  ...    3    other     other    home   father
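Note that a label slice with .loc includes its end label, which is why 'guardian' appears in the result. The sketch below illustrates this on a made-up miniature of the table (the frame `df` and its values are assumptions for illustration, not the real dataset) and contrasts it with positional slicing through .iloc.

```python
import pandas as pd

# Hypothetical miniature of the student table; the values are made up.
df = pd.DataFrame({
    "school": ["GP", "MS"],
    "sex": ["F", "M"],
    "age": [18, 17],
    "guardian": ["mother", "father"],
    "G3": [6, 10],
})

# A label slice with .loc includes the end label ...
by_label = df.loc[:, "school":"guardian"]
# ... while a positional slice with .iloc excludes the end position.
by_position = df.iloc[:, 0:3]

print(list(by_label.columns))     # ['school', 'sex', 'age', 'guardian']
print(list(by_position.columns))  # ['school', 'sex', 'age']
```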

Step 3: Create a lambda function that converts a string to uppercase

>>> captalizer = lambda x: x.upper()

Step 4: Convert the 'Fjob' column to uppercase

>>> stud_alcoh['Fjob'].apply(captalizer)
0       TEACHER
1         OTHER
2         OTHER
3      SERVICES
4         OTHER
...
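For plain string operations like this one, the .str accessor is a possible alternative to apply. The sketch below uses a small made-up Series in place of the real Fjob column (the names `jobs` and `with_gap` are assumptions for illustration) and checks that both routes give the same values.

```python
import pandas as pd

# Hypothetical sample values standing in for the Fjob column.
jobs = pd.Series(["teacher", "other", "services"], name="Fjob")

captalizer = lambda x: x.upper()

via_apply = jobs.apply(captalizer)  # calls the lambda once per element
via_str = jobs.str.upper()          # built-in string method, same values

print(via_str.tolist())                        # ['TEACHER', 'OTHER', 'SERVICES']
print(via_apply.tolist() == via_str.tolist())  # True

# A missing value passes through .str.upper() as missing, whereas calling
# .upper() on it inside the lambda would raise an AttributeError.
with_gap = pd.Series(["teacher", None])
upper_with_gap = with_gap.str.upper()
print(upper_with_gap.isna().tolist())          # [False, True]
```

The accessor is usually the more convenient choice when a column may contain missing values.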

Step 5: Print the last few rows of the dataset

>>> stud_alcoh.tail()
    school sex  age address famsize  ... Fedu      Mjob      Fjob  reason guardian
390     MS   M   20       U     LE3  ...    2  services  services  course    other
391     MS   M   17       U     LE3  ...    1  services  services  course   mother
392     MS   M   21       R     GT3  ...    1     other     other  course    other
393     MS   M   18       R     LE3  ...    2  services     other  course   mother
394     MS   M   19       U     LE3  ...    1     other   at_home  course   father

[5 rows x 12 columns]

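Step 6: The original DataFrame still holds lowercase values, because apply returns a new Series instead of modifying the column in place. Assign the result back to the column to fix this

>>> stud_alcoh['Fjob'] = stud_alcoh['Fjob'].apply(captalizer)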
>>> stud_alcoh.tail()
    school sex  age address famsize  ... Fedu      Mjob      Fjob  reason guardian
390     MS   M   20       U     LE3  ...    2  services  SERVICES  course    other
391     MS   M   17       U     LE3  ...    1  services  SERVICES  course   mother
392     MS   M   21       R     GT3  ...    1     other     OTHER  course    other
393     MS   M   18       R     LE3  ...    2  services     OTHER  course   mother
394     MS   M   19       U     LE3  ...    1     other   AT_HOME  course   father
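The behaviour seen in Steps 4 to 6 can be reproduced on a tiny frame. This is a minimal sketch with made-up data (the frame `df` and the names `before` and `after` are assumptions for illustration): apply leaves the source column untouched until its result is assigned back.

```python
import pandas as pd

# Hypothetical two-row frame; the values are made up.
df = pd.DataFrame({"Fjob": ["teacher", "other"]})

result = df["Fjob"].apply(str.upper)  # a new Series; df is not modified
before = df["Fjob"].tolist()
print(before)                         # ['teacher', 'other']

df["Fjob"] = result                   # assigning back updates the column
after = df["Fjob"].tolist()
print(after)                          # ['TEACHER', 'OTHER']
```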

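Step 7: Create a function named majority that returns a boolean, and store its result in a new column named legal_drinker (a student has reached the age of majority when older than 17)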

>>> def majority(x):
    # True when the student is older than 17, otherwise False
    return x > 17

    
>>> stud_alcoh['legal_drinker'] = stud_alcoh['age'].apply(majority)
>>> stud_alcoh.head()
  school sex  age address  ...      Fjob      reason  guardian  legal_drinker
0     GP   F   18       U  ...   TEACHER      course    mother           True
1     GP   F   17       U  ...     OTHER      course    father          False
2     GP   F   15       U  ...     OTHER       other    mother          False
3     GP   F   15       U  ...  SERVICES        home    mother          False
4     GP   F   16       U  ...     OTHER        home    father          False

[5 rows x 13 columns]
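For a simple threshold like this, comparing the whole column at once is a common alternative to calling a function per element. The sketch below uses a made-up age Series (the name `ages` and its values are assumptions for illustration) and shows that both approaches produce the same booleans.

```python
import pandas as pd

# Hypothetical ages; the values are made up.
ages = pd.Series([18, 17, 15, 20], name="age")

def majority(x):
    # True when the age is above 17
    return x > 17

via_apply = ages.apply(majority)  # one Python call per element
via_compare = ages > 17           # one vectorized comparison

print(via_compare.tolist())                        # [True, False, False, True]
print(via_apply.tolist() == via_compare.tolist())  # True
```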

Step 8: Multiply every number in the dataset by 10

>>> def times10(x):
    # type(x) is int skips booleans, which isinstance(x, int) would also match
    if type(x) is int:
        return 10 * x
    return x

>>> stud_alcoh.applymap(times10).head(10)
  school sex  age address  ...      Fjob      reason  guardian  legal_drinker
0     GP   F  180       U  ...   TEACHER      course    mother           True
1     GP   F  170       U  ...     OTHER      course    father          False
2     GP   F  150       U  ...     OTHER       other    mother          False
3     GP   F  150       U  ...  SERVICES        home    mother          False
4     GP   F  160       U  ...     OTHER        home    father          False
5     GP   M  160       U  ...     OTHER  reputation    mother          False
6     GP   M  160       U  ...     OTHER        home    mother          False
7     GP   F  170       U  ...   TEACHER        home    mother          False
8     GP   M  150       U  ...     OTHER        home    mother          False
9     GP   M  150       U  ...     OTHER        home    mother          False

[10 rows x 13 columns]
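Two details of this step may be worth spelling out. First, Python treats bool as a subclass of int, which is why the exact type check leaves legal_drinker alone. Second, the same scaling can be sketched without an element-wise function by selecting the numeric columns. The frame below is made-up sample data, and `numeric_cols` and `scaled` are names introduced only for this illustration.

```python
import pandas as pd

# bool is a subclass of int, so isinstance would also match True and False.
print(isinstance(True, int))  # True
print(type(True) is int)      # False

# Hypothetical miniature of stud_alcoh; the values are made up.
df = pd.DataFrame({
    "school": ["GP", "MS"],
    "age": [18, 17],
    "legal_drinker": [True, False],
})

# include="number" selects numeric columns and leaves out bool and text columns.
numeric_cols = df.select_dtypes(include="number").columns

scaled = df.copy()
scaled[numeric_cols] = scaled[numeric_cols] * 10

print(list(numeric_cols))                # ['age']
print(scaled["age"].tolist())            # [180, 170]
print(scaled["legal_drinker"].tolist())  # [True, False]
```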

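Extension:

The difference between apply, applymap, and map
1. Use apply() to operate on a DataFrame row by row or column by column.
2. Use applymap() to operate on every individual element of a DataFrame; the result is a DataFrame. Since pandas 2.1, applymap() is deprecated in favor of DataFrame.map(), which does the same thing.
3. Use map() to operate on every element of a Series.
Step 4 above uppercased every letter in the Fjob column; the same result can be obtained with map: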

>>> stud_alcoh.Fjob.map(captalizer)
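The three cases listed above can be compared side by side on a small frame. This is a rough sketch with made-up numbers. The `elementwise` helper name is an assumption introduced here so that the example runs on both older pandas (applymap) and pandas 2.1 or later (DataFrame.map).

```python
import pandas as pd

# Hypothetical numeric frame; the values are made up.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# apply: the function receives a whole column (default) or a whole row (axis=1).
col_range = df.apply(lambda col: col.max() - col.min())
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Element-wise over a DataFrame: DataFrame.map where available, else applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)

# map: element-wise over a single Series.
shifted = df["a"].map(lambda x: x + 100)

print(col_range.tolist())       # [2, 2]
print(row_sum.tolist())         # [5, 7, 9]
print(squared.to_dict("list"))  # {'a': [1, 4, 9], 'b': [16, 25, 36]}
print(shifted.tolist())         # [101, 102, 103]
```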

Exercise: Convert the data in a DataFrame's columns from str to int

>>> data1 = pd.DataFrame({'a':['1','2','3'],'b':['4','5','6']})
>>> data1
   a  b
0  1  4
1  2  5
2  3  6
>>> data1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a    3 non-null object
b    3 non-null object
dtypes: object(2)
memory usage: 128.0+ bytes

>>> data1 = data1.applymap(int)
>>> data1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a    3 non-null int64
b    3 non-null int64
dtypes: int64(2)
memory usage: 128.0 bytes
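The same conversion can also be sketched with astype, which converts whole columns at once and accepts a dict when only one column should change. For strings that may not be valid numbers, pd.to_numeric with errors="coerce" is one option. The names `as_int`, `only_a`, and `coerced` are introduced here for illustration only.

```python
import pandas as pd

data1 = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4", "5", "6"]})

# Convert every column at once.
as_int = data1.astype(int)

# Convert a single column by passing a {column: dtype} mapping.
only_a = data1.astype({"a": int})

# Strings that cannot be parsed become NaN instead of raising an error.
coerced = pd.to_numeric(pd.Series(["1", "x", "3"]), errors="coerce")

print(as_int["a"].tolist())     # [1, 2, 3]
print(only_a["b"].tolist())     # ['4', '5', '6']
print(coerced.isna().tolist())  # [False, True, False]
```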