pandas practice: the apply function


Author: 酸甜柠檬26 | Published: 2019-12-17 20:31

    Step 1: Read the data and name it student

    >>> import pandas as pd
    >>> import numpy as np
    >>> student = pd.read_csv('D:/Data/PythonPractise/pandasdata/student-mat.csv')
    >>> student.head()
      school sex  age address famsize Pstatus  ...  Walc  health absences  G1  G2  G3
    0     GP   F   18       U     GT3       A  ...     1       3        6   5   6   6
    1     GP   F   17       U     GT3       T  ...     1       3        4   5   5   6
    2     GP   F   15       U     LE3       T  ...     3       3       10   7   8  10
    3     GP   F   15       U     GT3       T  ...     1       5        2  15  14  15
    4     GP   F   16       U     GT3       T  ...     2       5        4   6  10  10
    
    [5 rows x 33 columns]
    

    Step 2: Slice the data from the 'school' column through the 'guardian' column

    >>> stud_alcoh = student.loc[:, 'school':'guardian'].copy()  # copy so later column assignments do not warn
    >>> stud_alcoh.head()
      school sex  age address famsize  ... Fedu     Mjob      Fjob  reason guardian
    0     GP   F   18       U     GT3  ...    4  at_home   teacher  course   mother
    1     GP   F   17       U     GT3  ...    1  at_home     other  course   father
    2     GP   F   15       U     LE3  ...    1  at_home     other   other   mother
    3     GP   F   15       U     GT3  ...    2   health  services    home   mother
    4     GP   F   16       U     GT3  ...    3    other     other    home   father
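
Label-based slicing with `.loc` includes both endpoints, which is why 'guardian' itself appears in the result. The following is a minimal sketch on a made-up five-column frame (the values are hypothetical; only the column names echo the dataset) that contrasts this with position-based `.iloc`, whose stop position is excluded.

```python
import pandas as pd

# Hypothetical miniature frame; not the real student-mat.csv data.
demo = pd.DataFrame({
    "school": ["GP", "MS"],
    "sex": ["F", "M"],
    "age": [18, 17],
    "guardian": ["mother", "father"],
    "G3": [6, 10],
})

# Label slice: both 'school' and 'guardian' are included.
by_label = demo.loc[:, "school":"guardian"]

# Position slice: the stop position (4) is excluded, so this picks the same four columns.
by_position = demo.iloc[:, 0:4]

print(list(by_label.columns))
```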
    

    Step 3: Create a lambda function that uppercases a string

    >>> captalizer = lambda x: x.upper()
    

    Step 4: Uppercase every value in the 'Fjob' column

    >>> stud_alcoh['Fjob'].apply(captalizer)
    0       TEACHER
    1         OTHER
    2         OTHER
    3      SERVICES
    4         OTHER
    ...
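
For plain string transformations, the `.str` accessor on a Series is a common alternative to `apply` with a lambda. The sketch below uses a small made-up Series (not the real column) to suggest that both routes give the same result and that neither modifies the input.

```python
import pandas as pd

# Hypothetical sample values standing in for the 'Fjob' column.
jobs = pd.Series(["teacher", "other", "services"], name="Fjob")

# Element-wise call of a Python function.
via_apply = jobs.apply(lambda x: x.upper())

# Vectorized string method provided by pandas.
via_str = jobs.str.upper()

print(via_str.tolist())
```

One practical difference: `.str.upper()` passes missing values through as NaN, while the lambda would raise an AttributeError on a float NaN.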
    

    Step 5: Print the last few rows of the dataset

    >>> stud_alcoh.tail()
        school sex  age address famsize  ... Fedu      Mjob      Fjob  reason guardian
    390     MS   M   20       U     LE3  ...    2  services  services  course    other
    391     MS   M   17       U     LE3  ...    1  services  services  course   mother
    392     MS   M   21       R     GT3  ...    1     other     other  course    other
    393     MS   M   18       R     LE3  ...    2  services     other  course   mother
    394     MS   M   19       U     LE3  ...    1     other   at_home  course   father
    
    [5 rows x 12 columns]
    

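    Step 6: Notice that the DataFrame still holds lowercase values, because apply returns a new Series instead of modifying the column. Assign the result back to the column:

    >>> stud_alcoh['Fjob'] = stud_alcoh['Fjob'].apply(captalizer)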
    >>> stud_alcoh.tail()
        school sex  age address famsize  ... Fedu      Mjob      Fjob  reason guardian
    390     MS   M   20       U     LE3  ...    2  services  SERVICES  course    other
    391     MS   M   17       U     LE3  ...    1  services  SERVICES  course   mother
    392     MS   M   21       R     GT3  ...    1     other     OTHER  course    other
    393     MS   M   18       R     LE3  ...    2  services     OTHER  course   mother
    394     MS   M   19       U     LE3  ...    1     other   AT_HOME  course   father
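
The behavior seen in steps 4 to 6 can be reproduced in isolation: `apply` returns a new Series, so the column only changes once the result is assigned back. This is a rough sketch on made-up data; the explicit `.copy()` of the slice is a choice made here so that the assignment cannot affect, or warn about, the parent frame.

```python
import pandas as pd

# Hypothetical parent frame and a sliced working copy.
original = pd.DataFrame({"Fjob": ["teacher", "other"], "age": [18, 17], "G3": [6, 10]})
subset = original.loc[:, "Fjob":"age"].copy()

# The result is discarded here, so the column keeps its lowercase values.
subset["Fjob"].apply(str.upper)
before_assignment = subset["Fjob"].tolist()

# Assigning the result back is what actually updates the column.
subset["Fjob"] = subset["Fjob"].apply(str.upper)
after_assignment = subset["Fjob"].tolist()

print(before_assignment, after_assignment)
```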
    

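    Step 7: Create a function named majority that returns a Boolean, and store its result in a new column named legal_drinker (treat anyone older than 17 as having reached the age of majority)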

    >>> def majority(x):
    ...     # True when the age is strictly greater than 17
    ...     return x > 17
    ...
    >>> stud_alcoh['legal_drinker'] = stud_alcoh['age'].apply(majority)
    >>> stud_alcoh.head()
      school sex  age address  ...      Fjob      reason  guardian  legal_drinker
    0     GP   F   18       U  ...   TEACHER      course    mother           True
    1     GP   F   17       U  ...     OTHER      course    father          False
    2     GP   F   15       U  ...     OTHER       other    mother          False
    3     GP   F   15       U  ...  SERVICES        home    mother          False
    4     GP   F   16       U  ...     OTHER        home    father          False
    
    [5 rows x 13 columns]
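
A threshold test like this can also be written as a vectorized comparison on the whole column, without calling a Python function per element. A minimal sketch, using made-up ages rather than the real data:

```python
import pandas as pd

# Hypothetical ages.
ages = pd.Series([18, 17, 15, 20], name="age")

def majority(x):
    # True when the age is strictly greater than 17.
    return x > 17

via_apply = ages.apply(majority)

# Comparing the Series directly yields a Boolean Series in one step.
via_compare = ages > 17

print(via_compare.tolist())
```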
    

    Step 8: Multiply every number in the dataset by 10

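    >>> def times10(x):
    ...     # Only plain Python ints are scaled; strings and Booleans pass through unchanged
    ...     if type(x) is int:
    ...         return 10 * x
    ...     return x
    ...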
    >>> stud_alcoh.applymap(times10).head(10)
      school sex  age address  ...      Fjob      reason  guardian  legal_drinker
    0     GP   F  180       U  ...   TEACHER      course    mother           True
    1     GP   F  170       U  ...     OTHER      course    father          False
    2     GP   F  150       U  ...     OTHER       other    mother          False
    3     GP   F  150       U  ...  SERVICES        home    mother          False
    4     GP   F  160       U  ...     OTHER        home    father          False
    5     GP   M  160       U  ...     OTHER  reputation    mother          False
    6     GP   M  160       U  ...     OTHER        home    mother          False
    7     GP   F  170       U  ...   TEACHER        home    mother          False
    8     GP   M  150       U  ...     OTHER        home    mother          False
    9     GP   M  150       U  ...     OTHER        home    mother          False
    
    [10 rows x 13 columns]
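
An alternative to checking the type of every element is to select the numeric columns first and multiply them as a block. The sketch below assumes a small made-up frame; `select_dtypes(include="number")` leaves out string and Boolean columns, so a column such as legal_drinker would stay untouched.

```python
import pandas as pd

# Hypothetical frame with one string, one integer, and one Boolean column.
demo = pd.DataFrame({
    "school": ["GP", "MS"],
    "age": [18, 17],
    "legal_drinker": [True, False],
})

# Names of the numeric columns only (Boolean columns are not counted as numbers).
numeric_cols = demo.select_dtypes(include="number").columns

# Work on a copy so the input frame stays as it was.
scaled = demo.copy()
scaled[numeric_cols] = scaled[numeric_cols] * 10

print(scaled)
```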
    

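    Further notes:

    Differences between apply, applymap, and map
    1. Use apply() to operate on a DataFrame row by row or column by column.
    2. Use applymap() to operate on every individual element of a DataFrame; the result is a DataFrame. (Since pandas 2.1, applymap() is deprecated in favor of DataFrame.map().)
    3. Use map() to operate on every element of a Series.
    Step 4 above uppercased every value in the Fjob column with apply(); the same result can be obtained with map():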

    >>> stud_alcoh.Fjob.map(captalizer)
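
The three methods can be compared side by side on a tiny frame. This is only a sketch with made-up numbers; the `hasattr` check is an assumption added here so the same code runs both on pandas 2.1 or later (where `DataFrame.map` replaces the deprecated `applymap`) and on older versions.

```python
import pandas as pd

# Hypothetical numeric frame.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# apply: the function receives a whole column (default) or a whole row (axis=1).
col_sums = df.apply(lambda s: s.sum())
row_sums = df.apply(lambda s: s.sum(), axis=1)

# Element-wise over a DataFrame: DataFrame.map on pandas >= 2.1, applymap before that.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
doubled = elementwise(lambda v: v * 2)

# map: element-wise over a single Series.
squared = df["a"].map(lambda v: v ** 2)

print(col_sums.tolist(), row_sums.tolist(), squared.tolist())
```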
    

    Exercise: convert the columns of a DataFrame from str to int

    >>> data1 = pd.DataFrame({'a':['1','2','3'],'b':['4','5','6']})
    >>> data1
       a  b
    0  1  4
    1  2  5
    2  3  6
    >>> data1.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 2 columns):
    a    3 non-null object
    b    3 non-null object
    dtypes: object(2)
    memory usage: 128.0+ bytes
    
    >>> data1 = data1.applymap(int)
    >>> data1.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 2 columns):
    a    3 non-null int64
    b    3 non-null int64
    dtypes: int64(2)
    memory usage: 128.0 bytes
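
Type conversion does not have to go through an element-wise function. As a sketch of two alternatives, `astype` can convert one column or the whole frame, and `pd.to_numeric` with `errors="coerce"` turns unparseable strings into NaN instead of raising. The `messy` Series is a hypothetical example added for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4", "5", "6"]})

# Convert a single column by passing a {column: dtype} mapping.
one_col = data1.astype({"a": int})

# Convert every column at once.
all_cols = data1.astype(int)

# Hypothetical column containing a value that is not a number.
messy = pd.Series(["1", "x", "3"])
coerced = pd.to_numeric(messy, errors="coerce")

print(all_cols.dtypes)
```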
    

    Article link: https://www.haomeiwen.com/subject/qaedxctx.html