Pandas - 2. Extracting Rows and Columns


Author: 陈天睡懒觉 | Published 2022-04-30 16:30
    import pandas as pd
    df = pd.read_csv('data/gapminder.tsv', sep='\t')  # the file is tab-separated
    print(df.head())
    
           country continent  year  lifeExp       pop   gdpPercap
    0  Afghanistan      Asia  1952   28.801   8425333  779.445314
    1  Afghanistan      Asia  1957   30.332   9240934  820.853030
    2  Afghanistan      Asia  1962   31.997  10267083  853.100710
    3  Afghanistan      Asia  1967   34.020  11537966  836.197138
    4  Afghanistan      Asia  1972   36.088  13079460  739.981106
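The data/gapminder.tsv file is not included with the article, so here is a self-contained sketch of what `sep='\t'` does. It assumes nothing beyond pandas: `io.StringIO` stands in for the file and holds three rows copied from the `head()` output above.

```python
import io

import pandas as pd

# Three tab-separated rows copied from the head() output above;
# io.StringIO stands in for data/gapminder.tsv, which is not included here.
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.445314\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.853030\n"
    "Afghanistan\tAsia\t1962\t31.997\t10267083\t853.100710\n"
)

# sep='\t' tells read_csv that fields are separated by tabs, not commas
df = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(df.head(2))  # head(n) returns the first n rows (n defaults to 5)

# With the default comma separator, each whole line ends up in a single column
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)  # (3, 1)
```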
    

    Check the type of each column with df.dtypes or df.info()

    • object -- str -- text
    • int64 -- int -- integer
    • float64 -- float -- floating-point number
    • datetime64 -- datetime -- date and time
    print(df.dtypes)
    
    country       object
    continent     object
    year           int64
    lifeExp      float64
    pop            int64
    gdpPercap    float64
    dtype: object
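`df.info()` is mentioned above but not shown. The sketch below runs both calls on a small frame built from the rows printed earlier (a stand-in for the full file). Text columns appear as `object` in pandas 1.x/2.x, but newer pandas versions may report a dedicated string dtype instead, so the `pd.api.types` helpers are used to check a column's kind without hard-coding the dtype name.

```python
import pandas as pd

# Small stand-in frame built from the first rows of gapminder shown above
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'continent': ['Asia', 'Asia', 'Asia'],
    'year': [1952, 1957, 1962],
    'lifeExp': [28.801, 30.332, 31.997],
    'pop': [8425333, 9240934, 10267083],
    'gdpPercap': [779.445314, 820.853030, 853.100710],
})

print(df.dtypes)  # one dtype per column, returned as a Series
df.info()         # dtypes plus the row count and non-null count of each column

# Check a column's kind without relying on the exact dtype name
print(pd.api.types.is_integer_dtype(df['year']))    # True
print(pd.api.types.is_float_dtype(df['lifeExp']))   # True
print(pd.api.types.is_string_dtype(df['country']))  # True
```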
    

    Check the number of rows and columns

    # df.shape is an attribute, not a method: writing df.shape() raises a TypeError
    print(df.shape)  # (number of rows, number of columns)
    
    (1704, 6)
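A short sketch of how `shape` is typically used, on a hypothetical two-column frame: the tuple can be unpacked directly, `len(df)` also gives the row count, and calling `shape()` fails because a tuple is not callable.

```python
import pandas as pd

# Hypothetical small frame for illustration
df = pd.DataFrame({'year': [1952, 1957, 1962], 'pop': [8425333, 9240934, 10267083]})

n_rows, n_cols = df.shape  # shape is a plain (rows, columns) tuple
print(n_rows, n_cols)      # 3 2
print(len(df))             # 3: len() also returns the number of rows
print(callable(df.shape))  # False: it is an attribute, not a method

# Adding parentheses tries to call the tuple, which raises TypeError
try:
    df.shape()
except TypeError as err:
    print('TypeError:', err)
```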
    

    Get the column names and the row index

    # df.columns: the column names
    print(df.columns)
    # df.index: the row index (row labels)
    print(df.index)
    print(list(df.index)[:10])
    
    Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
    RangeIndex(start=0, stop=1704, step=1)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
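Both `df.columns` and `df.index` are `Index` objects, which convert to plain lists with `tolist()`. The sketch below (a hypothetical two-column frame) also shows that row labels need not be 0..n-1: after `set_index('year')` the years become the labels, which matters for `loc` later on.

```python
import pandas as pd

# Hypothetical small frame for illustration
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'year': [1952, 1957, 1962],
})

print(df.columns.tolist())  # ['country', 'year']
print(df.index.tolist())    # [0, 1, 2]: the default RangeIndex

# Row labels do not have to be 0..n-1; here the years become the labels
by_year = df.set_index('year')
print(by_year.index.tolist())  # [1952, 1957, 1962]
```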
    

    Select a subset of columns

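    # Single column
    # Attribute access only works when the name is a valid Python identifier
    # that doesn't clash with a DataFrame attribute (df.pop is a method, not the 'pop' column)
    continent = df.continent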
    continent = df['continent']
    print(continent[:5])
    # Multiple columns: pass a list of names, hence the double brackets
    year_continent = df[['year','continent']]
    print(year_continent[:5])
    
    0    Asia
    1    Asia
    2    Asia
    3    Asia
    4    Asia
    Name: continent, dtype: object
       year continent
    0  1952      Asia
    1  1957      Asia
    2  1962      Asia
    3  1967      Asia
    4  1972      Asia
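The number of brackets decides the return type: one name gives a Series, a list of names (even a one-element list) gives a DataFrame. The sketch, on a hypothetical frame with gapminder-style columns, also shows the `pop` clash mentioned above, where attribute access finds the `DataFrame.pop` method instead of the column.

```python
import pandas as pd

# Hypothetical frame with gapminder-style columns
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan'],
    'continent': ['Asia', 'Asia'],
    'pop': [8425333, 9240934],
})

single = df['continent']        # one name -> Series
as_frame = df[['continent']]    # list with one name -> DataFrame
print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame

# Attribute access loses to existing DataFrame methods:
# df.pop is the DataFrame.pop method, so use brackets for this column
print(callable(df.pop))         # True
print(df['pop'].tolist())       # [8425333, 9240934]
```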
    

    Select a subset of rows

    • by row label (loc)
    • by row position (iloc)
    # loc: one row (by label)
    sample = df.loc[0]  # selecting a single row returns a Series
    print(sample)
    # loc: several rows
    samples = df.loc[[0,100,200]]
    print(samples)
    # df.loc[-1] raises a KeyError because no row has the label -1
    
    # iloc: one row (by position)
    sample = df.iloc[0]  # a single row is again returned as a Series
    
    # iloc: several rows
    samples = df.iloc[[0,100,200]]
    
    # iloc also accepts negative positions: -1 is the last row
    sample = df.iloc[-1]
    
    country      Afghanistan
    continent           Asia
    year                1952
    lifeExp           28.801
    pop              8425333
    gdpPercap        779.445
    Name: 0, dtype: object
              country continent  year  lifeExp       pop   gdpPercap
    0     Afghanistan      Asia  1952   28.801   8425333  779.445314
    100    Bangladesh      Asia  1972   45.252  70759295  630.233627
    200  Burkina Faso    Africa  1992   50.260   8878303  931.752773
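With the default RangeIndex, labels and positions coincide, which hides the difference between `loc` and `iloc`. The sketch below builds a hypothetical frame whose labels are 0, 100 and 200 (copied from the `loc[[0,100,200]]` output above) so the two diverge.

```python
import pandas as pd

# Hypothetical frame whose row labels (0, 100, 200) differ from positions (0, 1, 2)
df = pd.DataFrame({
    'country': ['Afghanistan', 'Bangladesh', 'Burkina Faso'],
    'year': [1952, 1972, 1992],
}, index=[0, 100, 200])

print(df.loc[100, 'country'])  # Bangladesh: the row whose label is 100
print(df.iloc[1]['country'])   # Bangladesh: the second row by position
print(df.iloc[-1]['country'])  # Burkina Faso: the last row by position

# No row is labelled 1, so loc raises KeyError even though position 1 exists
try:
    df.loc[1]
except KeyError:
    print('no row with label 1')
```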
    

    Combined: selecting rows and columns together

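    In loc[rows, columns] and iloc[rows, columns], the selector left of the comma picks the rows and the one right of it picks the columns.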

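    # All rows, selected columns
    subset = df.loc[:,['year','pop']]
    subset = df.iloc[:,[1,3,-1]]  # columns at given positions; negative positions count from the end
    subset = df.iloc[:,3:6]  # positions 3, 4 and 5 (the stop position is excluded)
    subset = df.iloc[:,:3]  # the first three columns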
    
    # Several rows and several columns
    subset = df.loc[[1,10,20],['year','pop']]
    subset = df.iloc[[1,10,20],[1,-1]]
    print(subset)
    
       continent    gdpPercap
    1       Asia   820.853030
    10      Asia   726.734055
    20    Europe  2497.437901
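One detail the slices above do not show: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the stop, like Python lists. A sketch on a hypothetical frame built from the Afghanistan rows shown earlier:

```python
import pandas as pd

# Stand-in frame built from the Afghanistan rows shown earlier
df = pd.DataFrame({
    'country': ['Afghanistan'] * 5,
    'continent': ['Asia'] * 5,
    'year': [1952, 1957, 1962, 1967, 1972],
    'lifeExp': [28.801, 30.332, 31.997, 34.020, 36.088],
})

# loc: labels 0 through 2 and columns 'year' through 'lifeExp', both ends included
by_label = df.loc[0:2, 'year':'lifeExp']
print(by_label.shape)  # (3, 2)

# iloc: positions 0 and 1, columns at positions 2 and 3; the stop is excluded
by_pos = df.iloc[0:2, 2:4]
print(by_pos.shape)    # (2, 2)
```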
    


Original link: https://www.haomeiwen.com/subject/clyyyrtx.html