Python_Pandas_Select_Data_loc[ ]

Author: Kaspar433 | Published: 2020-03-28 22:27

    .loc[]

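    .loc is primarily label-based, but it can also be used with a boolean array.

    It accepts the following kinds of input:

    • A single label, e.g. 5 or 'a' (note that 5 is interpreted as an index label, not as an integer position);
    • A list or array of labels, e.g. ['a', 'b', 'c'];
    • A slice object with labels, e.g. 'a':'f' (both the start and the stop are included);
    • A boolean array;
    • A callable function.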
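
    The first bullet is easy to misread, so here is a minimal sketch of the difference between a label and a position. The Series `s` and its values are hypothetical and not part of the iris data used in the rest of this article.

```python
import pandas as pd

# Hypothetical toy Series whose integer labels differ from the row positions.
s = pd.Series(['x', 'y', 'z'], index=[5, 3, 1])

by_label = s.loc[5]      # the row whose label is 5 (the first row)
by_position = s.iloc[2]  # the row at position 2 (the last row)

print(by_label)     # x
print(by_position)  # z

# s.loc[2] would raise a KeyError, because no row carries the label 2.
```

    With .loc the integer is always looked up among the index labels; positional access is the job of .iloc.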

    import pandas as pd
    import numpy as np

    # Load the iris data set and draw 10 random rows.
    # sample() is random, so the selected rows differ from run to run.
    iris = pd.read_csv('iris.csv', header=0).sample(10)
    iris
    
    out:
         sepal_length  sepal_width  petal_length  petal_width     species
    11            4.8          3.4           1.6          0.2      setosa
    106           4.9          2.5           4.5          1.7   virginica
    14            5.8          4.0           1.2          0.2      setosa
    61            5.9          3.0           4.2          1.5  versicolor
    138           6.0          3.0           4.8          1.8   virginica
    132           6.4          2.8           5.6          2.2   virginica
    97            6.2          2.9           4.3          1.3  versicolor
    119           6.0          2.2           5.0          1.5   virginica
    31            5.4          3.4           1.5          0.4      setosa
    19            5.1          3.8           1.5          0.3      setosa
    
    iris.index = list('abcdefghij')
    iris
    
    out:
       sepal_length  sepal_width  petal_length  petal_width     species
    a           5.6          2.5           3.9          1.1  versicolor
    b           6.0          3.0           4.8          1.8   virginica
    c           7.2          3.6           6.1          2.5   virginica
    d           5.4          3.7           1.5          0.2      setosa
    e           6.6          3.0           4.4          1.4  versicolor
    f           6.4          2.8           5.6          2.1   virginica
    g           4.8          3.4           1.9          0.2      setosa
    h           5.7          2.9           4.2          1.3  versicolor
    i           6.1          3.0           4.9          1.8   virginica
    j           6.5          3.2           5.1          2.0   virginica
    

    Series

    species = iris.species.copy()
    species.loc['b']
    
    out:
    'virginica'
    
    species.loc['c':'e']
    
    out:
    c     virginica
    d        setosa
    e    versicolor
    Name: species, dtype: object
    
    species.loc['h':]

    out:
    h    versicolor
    i     virginica
    j     virginica
    Name: species, dtype: object
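
    The slices above include the stop label, which differs from ordinary Python slicing. The following sketch contrasts .loc with positional .iloc on a small stand-in Series; the name `toy` and its contents are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the species column, with labels 'a' to 'e'.
toy = pd.Series(
    ['versicolor', 'virginica', 'virginica', 'setosa', 'versicolor'],
    index=list('abcde'),
)

label_slice = toy.loc['c':'e']   # label-based: the stop label 'e' is included
position_slice = toy.iloc[2:4]   # position-based: the stop position 4 is excluded

print(list(label_slice.index))     # ['c', 'd', 'e']
print(list(position_slice.index))  # ['c', 'd']
```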
    

    DataFrame

    Access directly by label

    iris.loc[['a','c','d'], :]

    out:
       sepal_length  sepal_width  petal_length  petal_width     species
    a           5.6          2.5           3.9          1.1  versicolor
    c           7.2          3.6           6.1          2.5   virginica
    d           5.4          3.7           1.5          0.2      setosa
    

    Access by label slice

    iris.loc['b':'f', 'sepal_length':'petal_length']

    out:
       sepal_length  sepal_width  petal_length
    b           6.0          3.0           4.8
    c           7.2          3.6           6.1
    d           5.4          3.7           1.5
    e           6.6          3.0           4.4
    f           6.4          2.8           5.6
    

    Using a single label

    iris.loc['d']
    
    out:
    sepal_length       5.4
    sepal_width        3.7
    petal_length       1.5
    petal_width        0.2
    species         setosa
    Name: d, dtype: object
    

    Using a boolean array

    iris.loc[iris.sepal_length > iris.sepal_length.mean()]

    out:
       sepal_length  sepal_width  petal_length  petal_width     species
    c           7.2          3.6           6.1          2.5   virginica
    e           6.6          3.0           4.4          1.4  versicolor
    f           6.4          2.8           5.6          2.1   virginica
    i           6.1          3.0           4.9          1.8   virginica
    j           6.5          3.2           5.1          2.0   virginica
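
    Boolean masks can also be combined and paired with a column selection. The sketch below is one possible way to do it; the frame `small` is a hypothetical four-row excerpt that reuses values from rows a to d above.

```python
import pandas as pd

# Hypothetical excerpt with two of the iris columns.
small = pd.DataFrame(
    {
        'sepal_length': [5.6, 6.0, 7.2, 5.4],
        'species': ['versicolor', 'virginica', 'virginica', 'setosa'],
    },
    index=list('abcd'),
)

# Each condition needs its own parentheses because & binds tighter than > and ==.
mask = (small.sepal_length > 5.5) & (small.species == 'virginica')

# Rows are chosen by the mask, columns by a list of labels.
selected = small.loc[mask, ['sepal_length']]
print(selected)
```

    Only rows b and c satisfy both conditions, so `selected` holds those two rows and the single column sepal_length.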
    
    # Replace the index with random integers from 0 to 9; duplicate labels are possible.
    iris.index = np.random.randint(0,10,10)
    iris

    out:
       sepal_length  sepal_width  petal_length  petal_width     species
    8           5.6          2.5           3.9          1.1  versicolor
    5           6.0          3.0           4.8          1.8   virginica
    9           7.2          3.6           6.1          2.5   virginica
    4           5.4          3.7           1.5          0.2      setosa
    2           6.6          3.0           4.4          1.4  versicolor
    0           6.4          2.8           5.6          2.1   virginica
    3           4.8          3.4           1.9          0.2      setosa
    7           5.7          2.9           4.2          1.3  versicolor
    3           6.1          3.0           4.9          1.8   virginica
    5           6.5          3.2           5.1          2.0   virginica
    

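    When slicing with .loc, if both the start and stop labels are present in the index, the elements located between the two (including both endpoints) are returned. The index here is unsorted, so "between" refers to the order of the rows in the index:

    iris.loc[9:2]

    out:
       sepal_length  sepal_width  petal_length  petal_width     species
    9           7.2          3.6           6.1          2.5   virginica
    4           5.4          3.7           1.5          0.2      setosa
    2           6.6          3.0           4.4          1.4  versicolor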
    

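    If at least one of the two labels is absent, but the index is sorted and can be compared against the start and stop labels, slicing still works as expected by selecting the labels that rank between the two. (In the example below both 3 and 7 happen to be present in the index; an absent bound such as 6 is handled the same way once the index is sorted.)

    iris.sort_index()

    out:
       sepal_length  sepal_width  petal_length  petal_width     species
    0           6.4          2.8           5.6          2.1   virginica
    2           6.6          3.0           4.4          1.4  versicolor
    3           4.8          3.4           1.9          0.2      setosa
    3           6.1          3.0           4.9          1.8   virginica
    4           5.4          3.7           1.5          0.2      setosa
    5           6.0          3.0           4.8          1.8   virginica
    5           6.5          3.2           5.1          2.0   virginica
    7           5.7          2.9           4.2          1.3  versicolor
    8           5.6          2.5           3.9          1.1  versicolor
    9           7.2          3.6           6.1          2.5   virginica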
    
    iris.sort_index().loc[3:7]

    out:
       sepal_length  sepal_width  petal_length  petal_width     species
    3           4.8          3.4           1.9          0.2      setosa
    3           6.1          3.0           4.9          1.8   virginica
    4           5.4          3.7           1.5          0.2      setosa
    5           6.0          3.0           4.8          1.8   virginica
    5           6.5          3.2           5.1          2.0   virginica
    7           5.7          2.9           4.2          1.3  versicolor
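
    To show the case where the slice bounds are genuinely missing from the index, here is a small sketch. The Series `s` is a hypothetical example rather than the iris data; neither 1 nor 6 appears among its labels.

```python
import pandas as pd

# Hypothetical Series with an unsorted integer index; labels 1 and 6 do not exist.
s = pd.Series(list('abcde'), index=[0, 3, 2, 5, 4])

# On the sorted index the missing bounds are ranked against the existing
# labels, so every label between 1 and 6 is returned.
sorted_slice = s.sort_index().loc[1:6]
print(list(sorted_slice.index))  # [2, 3, 4, 5]

# On the unsorted index the same slice cannot be resolved and raises KeyError.
try:
    s.loc[1:6]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

    Sorting first is therefore the safe route whenever the bounds are not guaranteed to exist in the index.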
    

    Selecting with a callable

    df = pd.DataFrame(np.random.randn(6,4), index=list('abcdef'), columns=list('ABCD'))
    df

    out:
               A          B          C          D
    a   0.737161  -0.514738  -1.457052   0.353337
    b   0.801916   0.266375  -0.968714  -0.087611
    c  -0.799433  -1.250238  -0.598625   1.259859
    d  -0.780325   1.910598  -0.522512  -0.680966
    e  -1.167703  -0.234484   0.243291  -1.931064
    f  -0.147435   0.145292  -0.256636  -0.110757
    
    df.loc[lambda df: df.index > 'c']

    out:
               A          B          C          D
    d  -0.780325   1.910598  -0.522512  -0.680966
    e  -1.167703  -0.234484   0.243291  -1.931064
    f  -0.147435   0.145292  -0.256636  -0.110757

    df.loc[lambda df: df.A<0]

    out:
               A          B          C          D
    c  -0.799433  -1.250238  -0.598625   1.259859
    d  -0.780325   1.910598  -0.522512  -0.680966
    e  -1.167703  -0.234484   0.243291  -1.931064
    f  -0.147435   0.145292  -0.256636  -0.110757

    df.loc[lambda df: df.A<0, lambda df: ['A', 'B']]

    out:
               A          B
    c  -0.799433  -1.250238
    d  -0.780325   1.910598
    e  -1.167703  -0.234484
    f  -0.147435   0.145292
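
    Callables are most useful in a method chain, where the intermediate DataFrame has no variable name to refer to. The following is a minimal sketch of that pattern; `frame`, its values, and the column name `total` are assumptions introduced only for this example.

```python
import pandas as pd

# Hypothetical frame with fixed values so the result is reproducible.
frame = pd.DataFrame(
    {'A': [1.0, -2.0, 3.0], 'B': [0.5, 0.5, -4.0]},
    index=list('abc'),
)

result = (
    frame
    .assign(total=lambda d: d.A + d.B)            # add a derived column
    .loc[lambda d: d.total > 0, ['A', 'total']]   # filter on the column just created
)
print(result)
```

    The lambda passed to .loc receives the frame returned by assign(), so the new column can be filtered on without storing an intermediate result. Only row a has a positive total here.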
    
