pandas: Assigning Values to a DataFrame (Part 1)

Author: 筝韵徽 | Published: 2019-01-12 04:37
    import pandas as pd
    import numpy as np
    from tabulate import tabulate
    

    df = pd.read_csv('data/employee_sample.csv',index_col=0)
    print(tabulate(df,headers=df.columns,tablefmt='simple'))
    
              GENDER    RACE    DEPARTMENT            YEARS EXPERIENCE    SALARY
    --------  --------  ------  ------------------  ------------------  --------
    Tom       Male      White   Engineering                         23    107962
    Niko      Male      Black   Engineering                          1     30347
    Penelope  Female    White   Engineering                         12     60258
    Aria      Female    Black   Engineering                          8     43618
    Sofia     Female    Black   Parks & Recreation                  23     26125
    Dean      Male      Black   Parks & Recreation                   3     33592
    Zach      Male      White   Parks & Recreation                   4     37565
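
The CSV file `data/employee_sample.csv` does not come with this post. The sketch below assumes the file holds exactly the rows printed above. It rebuilds the same DataFrame in memory, so the later steps can be tried without the file.

```python
import pandas as pd

# Rebuild the sample data shown above (assumption: the CSV contains exactly these rows).
df = pd.DataFrame(
    {
        "GENDER": ["Male", "Male", "Female", "Female", "Female", "Male", "Male"],
        "RACE": ["White", "Black", "White", "Black", "Black", "Black", "White"],
        "DEPARTMENT": ["Engineering"] * 4 + ["Parks & Recreation"] * 3,
        "YEARS EXPERIENCE": [23, 1, 12, 8, 23, 3, 4],
        "SALARY": [107962, 30347, 60258, 43618, 26125, 33592, 37565],
    },
    # The first CSV column (read with index_col=0) holds the employee names.
    index=["Tom", "Niko", "Penelope", "Aria", "Sofia", "Dean", "Zach"],
)
print(df)
```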
    
    • Adding a column to a DataFrame:
    1. Add a score column to df and assign it a single value (pandas broadcasts the scalar to every row)
    df['score'] = 100
    
    print(tabulate(df,headers=df.columns,tablefmt='simple'))
    
              GENDER    RACE    DEPARTMENT            YEARS EXPERIENCE    SALARY    score
    --------  --------  ------  ------------------  ------------------  --------  -------
    Tom       Male      White   Engineering                         23    107962      100
    Niko      Male      Black   Engineering                          1     30347      100
    Penelope  Female    White   Engineering                         12     60258      100
    Aria      Female    Black   Engineering                          8     43618      100
    Sofia     Female    Black   Parks & Recreation                  23     26125      100
    Dean      Male      Black   Parks & Recreation                   3     33592      100
    Zach      Male      White   Parks & Recreation                   4     37565      100
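
Assigning a scalar repeats it for every row. The original post does not cover `DataFrame.assign`. It does the same thing, but it returns a new DataFrame and leaves the original unchanged. The small three-row frame below is hypothetical.

```python
import pandas as pd

# Hypothetical three-row frame indexed by name, mirroring the article's layout.
df = pd.DataFrame({"SALARY": [107962, 30347, 60258]}, index=["Tom", "Niko", "Penelope"])

# In place: the scalar 100 is broadcast to every row.
df["score"] = 100

# Non-mutating alternative: assign() returns a copy that has the new column.
df2 = df.assign(bonus=0.5)

print(df)
print(df2)
print("bonus" in df.columns)  # False: the original frame is unchanged
```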
    
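    2. Add a rate column to df and assign it a NumPy array (one value per row)
    rates = np.round(np.random.rand(len(df)), 2)  # len(df) floats in [0, 1), rounded to 2 decimals; values differ on each run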
    rates
    
    array([0.69, 0.16, 0.9 , 0.21, 0.25, 0.5 , 0.09])
    
    df['rate']=rates
    
    print(tabulate(df,headers=df.columns,tablefmt='simple'))
    
              GENDER    RACE    DEPARTMENT            YEARS EXPERIENCE    SALARY    score    rate
    --------  --------  ------  ------------------  ------------------  --------  -------  ------
    Tom       Male      White   Engineering                         23    107962      100    0.69
    Niko      Male      Black   Engineering                          1     30347      100    0.16
    Penelope  Female    White   Engineering                         12     60258      100    0.9
    Aria      Female    Black   Engineering                          8     43618      100    0.21
    Sofia     Female    Black   Parks & Recreation                  23     26125      100    0.25
    Dean      Male      Black   Parks & Recreation                   3     33592      100    0.5
    Zach      Male      White   Parks & Recreation                   4     37565      100    0.09
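
A NumPy array has no index labels, so pandas assigns its values by position. Its length must therefore equal the number of rows. The sketch below uses a hypothetical small frame. It shows that pandas raises a `ValueError` for a mismatched length instead of padding or truncating.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"SALARY": [107962, 30347, 60258]}, index=["Tom", "Niko", "Penelope"])

# Matching length: one value per row, assigned in row order.
df["rate"] = np.array([0.69, 0.16, 0.9])

# Mismatched length: pandas raises and does not create the column.
error_message = None
try:
    df["bad"] = np.array([0.1, 0.2])
except ValueError as exc:
    error_message = str(exc)

print(df)
print(error_message)
```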
    
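    3. Same idea: overwrite the existing score column with a NumPy array of random integers
    score = np.random.randint(0, 100, len(df))  # len(df) integers in [0, 100); values differ on each run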
    
    score
    
    array([60, 20, 84, 34, 71,  0,  0])
    
    df['score']=score
    
    print(tabulate(df,headers=df.columns,tablefmt='simple'))
    
              GENDER    RACE    DEPARTMENT            YEARS EXPERIENCE    SALARY    score    rate
    --------  --------  ------  ------------------  ------------------  --------  -------  ------
    Tom       Male      White   Engineering                         23    107962       60    0.69
    Niko      Male      Black   Engineering                          1     30347       20    0.16
    Penelope  Female    White   Engineering                         12     60258       84    0.9
    Aria      Female    Black   Engineering                          8     43618       34    0.21
    Sofia     Female    Black   Parks & Recreation                  23     26125       71    0.25
    Dean      Male      Black   Parks & Recreation                   3     33592        0    0.5
    Zach      Male      White   Parks & Recreation                   4     37565        0    0.09
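
This example draws random numbers, so the values change on every run. For reproducible output, one option is NumPy's seeded `Generator`; this is an assumption, not part of the original post. The helper `random_scores` is hypothetical. In both `np.random.randint(0, 100, n)` and `Generator.integers(0, 100, n)` the upper bound is exclusive, so scores fall between 0 and 99.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"SALARY": [107962, 30347, 60258, 43618]},
    index=["Tom", "Niko", "Penelope", "Aria"],
)

def random_scores(n, seed):
    """Hypothetical helper: return n integer scores in [0, 100) from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 100, n)

# Overwrite (or create) the score column; the same seed gives the same values every run.
df["score"] = random_scores(len(df), seed=42)
print(df)
```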
    
    4. Add a floor column to df and assign it a Python list (again, one element per row)
    floor=[10,2,3,4,9,2,4]
    df['floor']=floor
    print(tabulate(df,headers=df.columns,tablefmt='simple'))
    
              GENDER    RACE    DEPARTMENT            YEARS EXPERIENCE    SALARY    score    rate    floor
    --------  --------  ------  ------------------  ------------------  --------  -------  ------  -------
    Tom       Male      White   Engineering                         23    107962       60    0.69       10
    Niko      Male      Black   Engineering                          1     30347       20    0.16        2
    Penelope  Female    White   Engineering                         12     60258       84    0.9         3
    Aria      Female    Black   Engineering                          8     43618       34    0.21        4
    Sofia     Female    Black   Parks & Recreation                  23     26125       71    0.25        9
    Dean      Male      Black   Parks & Recreation                   3     33592        0    0.5         2
    Zach      Male      White   Parks & Recreation                   4     37565        0    0.09        4
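
A plain list, like a NumPy array, has no index labels, so pandas pairs its elements with rows purely by position. A Series is different: pandas matches it to rows by label. The hypothetical frame below contrasts the two.

```python
import pandas as pd

df = pd.DataFrame({"SALARY": [107962, 30347, 60258]}, index=["Tom", "Niko", "Penelope"])

# List: element i goes to row i, whatever the row labels are.
df["floor"] = [10, 2, 3]

# Series with the same labels in a different order: matched by label, not position.
floors = pd.Series([3, 10, 2], index=["Penelope", "Tom", "Niko"])
df["floor_from_series"] = floors

print(df)  # both columns hold 10, 2, 3 for Tom, Niko, Penelope
```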
    
    5. Add a lastname column to df and assign it a pandas Series
    last_name = pd.Series(['Smith', 'Jones', 'Williams', 'Green', 'Brown', 'Simpson', 'Peters'])
    last_name
    
    0       Smith
    1       Jones
    2    Williams
    3       Green
    4       Brown
    5     Simpson
    6      Peters
    dtype: object
    
    df['lastname']=last_name
    print(tabulate(df,headers=df.columns,tablefmt='simple'))
    
              GENDER    RACE    DEPARTMENT            YEARS EXPERIENCE    SALARY    score    rate    floor    lastname
    --------  --------  ------  ------------------  ------------------  --------  -------  ------  -------  ----------
    Tom       Male      White   Engineering                         23    107962       60    0.69       10         nan
    Niko      Male      Black   Engineering                          1     30347       20    0.16        2         nan
    Penelope  Female    White   Engineering                         12     60258       84    0.9         3         nan
    Aria      Female    Black   Engineering                          8     43618       34    0.21        4         nan
    Sofia     Female    Black   Parks & Recreation                  23     26125       71    0.25        9         nan
    Dean      Male      Black   Parks & Recreation                   3     33592        0    0.5         2         nan
    Zach      Male      White   Parks & Recreation                   4     37565        0    0.09        4         nan
    

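    Why is every lastname value NaN in the previous example? Because the indexes do not match. When a Series is assigned to a column, pandas aligns it on index labels rather than by position. last_name has the labels 0 to 6, while df is indexed by name, so no label matches and every row gets NaN.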

    last_name.index
    
    RangeIndex(start=0, stop=7, step=1)
    
    df.index
    
    Index(['Tom', 'Niko', 'Penelope', 'Aria', 'Sofia', 'Dean', 'Zach'], dtype='object')
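
You can check for the mismatch before assigning. The frame and the label `Bob` below are hypothetical. Rows whose label is missing from the Series get NaN, and Series labels that do not exist in the DataFrame are silently dropped.

```python
import pandas as pd

df = pd.DataFrame({"SALARY": [107962, 30347, 60258]}, index=["Tom", "Niko", "Penelope"])

# "Tom" and "Niko" exist in df; "Bob" does not; "Penelope" has no value in the Series.
last_name = pd.Series(["Smith", "Jones", "Doe"], index=["Tom", "Niko", "Bob"])

# Quick diagnostics before assigning.
print(last_name.index.equals(df.index))  # False
print(df.index.isin(last_name.index))    # which df rows will receive a value

df["lastname"] = last_name
print(df)
```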
    

    One index holds integers and the other holds strings, so rebuild the Series using df's index:

    last_name=pd.Series(last_name.values,index=df.index)
    
    last_name
    
    Tom            Smith
    Niko           Jones
    Penelope    Williams
    Aria           Green
    Sofia          Brown
    Dean         Simpson
    Zach          Peters
    dtype: object
    
    df['lastname']=last_name
    print(tabulate(df,headers=df.columns,tablefmt='simple'))
    
              GENDER    RACE    DEPARTMENT            YEARS EXPERIENCE    SALARY    score    rate    floor  lastname
    --------  --------  ------  ------------------  ------------------  --------  -------  ------  -------  ----------
    Tom       Male      White   Engineering                         23    107962       60    0.69       10  Smith
    Niko      Male      Black   Engineering                          1     30347       20    0.16        2  Jones
    Penelope  Female    White   Engineering                         12     60258       84    0.9         3  Williams
    Aria      Female    Black   Engineering                          8     43618       34    0.21        4  Green
    Sofia     Female    Black   Parks & Recreation                  23     26125       71    0.25        9  Brown
    Dean      Male      Black   Parks & Recreation                   3     33592        0    0.5         2  Simpson
    Zach      Male      White   Parks & Recreation                   4     37565        0    0.09        4  Peters
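
Rebuilding the Series with `index=df.index` works. Two other standard pandas calls, not used in the original post, also do the job:

* `Series.set_axis` relabels the Series with df's index. In pandas 1.0 and later it returns a new Series.
* `to_numpy()` strips the labels, so the values are assigned by position.

Both approaches assume the values are already in the same order as df's rows.

```python
import pandas as pd

df = pd.DataFrame({"SALARY": [107962, 30347, 60258]}, index=["Tom", "Niko", "Penelope"])
last_name = pd.Series(["Smith", "Jones", "Williams"])  # default RangeIndex 0..2

# Option 1: give the Series df's labels, then assign (aligned by label).
df["lastname"] = last_name.set_axis(df.index)

# Option 2: drop the labels entirely; the values are assigned by position.
df["lastname2"] = last_name.to_numpy()

print(df)
```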
    
          Original link: https://www.haomeiwen.com/subject/gabfdqtx.html