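A pandas DataFrame usually has an "index": a column that gives each row a name. It works like the primary key of a database table. pandas also supports MultiIndex, where the row index is a composite key built from several columns.
Creating an unindexed DataFrame from a CSV file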
>>> import pandas, io
>>> data = io.StringIO('''Fruit,Color,Count,Price
... Apple,Red,3,$1.29
... Apple,Green,9,$0.99
... Pear,Red,25,$2.59
... Pear,Green,26,$2.79
... Lime,Green,99,$0.39
... ''')
>>> df_unindexed = pandas.read_csv(data)
>>> df_unindexed
   Fruit  Color  Count  Price
0  Apple    Red      3  $1.29
1  Apple  Green      9  $0.99
2   Pear    Red     25  $2.59
3   Pear  Green     26  $2.79
4   Lime  Green     99  $0.39
>>> df = df_unindexed.set_index(['Fruit', 'Color'])
>>> df
             Count  Price
Fruit Color
Apple Red        3  $1.29
      Green      9  $0.99
Pear  Red       25  $2.59
      Green     26  $2.79
Lime  Green     99  $0.39
>>>
>>>
>>> df.xs('Apple')
       Count  Price
Color
Red        3  $1.29
Green      9  $0.99
>>>
>>> df.xs('Red', level='Color')
       Count  Price
Fruit
Apple      3  $1.29
Pear      25  $2.59
>>> df.loc['Apple', :]
       Count  Price
Color
Red        3  $1.29
Green      9  $0.99
>>>
>>>
>>> df.loc[('Apple', 'Red'), :]
Count        3
Price    $1.29
Name: (Apple, Red), dtype: object
>>>
https://www.somebits.com/~nelson/pandas-multiindex-slice-demo.html
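The session above uses both xs and loc on the same MultiIndex. As a minimal sketch (the variable names `fruit`, `apple_xs`, `apple_loc` and `red_kept` are made up for illustration), the following rebuilds the same data as a script, checks that the two accessors agree when selecting on the first level, and shows the `drop_level` parameter of xs, which the session does not use:

```python
import io

import pandas as pd

csv_text = """Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,99,$0.39
"""

# Same data as above, indexed by the composite key (Fruit, Color).
fruit = pd.read_csv(io.StringIO(csv_text)).set_index(["Fruit", "Color"])

# Selecting on the first level: xs and loc return the same rows,
# and both drop the 'Fruit' level from the result.
apple_xs = fruit.xs("Apple")
apple_loc = fruit.loc["Apple"]
print(apple_xs.equals(apple_loc))

# Selecting on an inner level. With drop_level=False the 'Color'
# level stays in the result instead of being removed.
red_kept = fruit.xs("Red", level="Color", drop_level=False)
print(red_kept)
```

Keeping the level can be convenient when the result is later concatenated with other slices, since every row still carries its full key.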
pandas.DataFrame.xs
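This method takes a key argument to select data at a particular level of a MultiIndex. It also works on a single-level index, where it accesses rows by index label, much like loc.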
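To illustrate the single-level case before the MultiIndex walkthrough, here is a small sketch. The frame `animals` and its values are invented for this example and are not part of the original session:

```python
import pandas as pd

# Hypothetical frame with an ordinary, single-level index.
animals = pd.DataFrame(
    {"num_legs": [4, 4, 2], "num_wings": [0, 0, 2]},
    index=pd.Index(["cat", "dog", "bat"], name="animal"),
)

# Both calls look the row up by its index label and return a Series.
row_xs = animals.xs("dog")
row_loc = animals.loc["dog"]
print(row_xs.equals(row_loc))
```

One difference worth keeping in mind: xs is for reading only, while loc can also be used to assign values.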
>>> d = {'num_legs': [4, 4, 2, 2],
... 'num_wings': [0, 0, 2, 2],
... 'class': ['mammal', 'mammal', 'mammal', 'bird'],
... 'animal': ['cat', 'dog', 'bat', 'penguin'],
... 'locomotion': ['walks', 'walks', 'flies', 'walks']}
>>> df = pandas.DataFrame(data=d)
>>> df
   num_legs  num_wings   class   animal locomotion
0         4          0  mammal      cat      walks
1         4          0  mammal      dog      walks
2         2          2  mammal      bat      flies
3         2          2    bird  penguin      walks
>>> df = df.set_index(['class', 'animal', 'locomotion'])
>>> df
                           num_legs  num_wings
class  animal  locomotion
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2
>>> df.xs('mammal')
                   num_legs  num_wings
animal locomotion
cat    walks              4          0
dog    walks              4          0
bat    flies              2          2
>>> df.xs(('mammal', 'dog'))
sys:1: PerformanceWarning: indexing past lexsort depth may impact performance.
            num_legs  num_wings
locomotion
walks              4          0
>>> df.xs('cat', level=1)
                   num_legs  num_wings
class  locomotion
mammal walks              4          0
>>> df.xs(('bird', 'walks'), level=[0, 'locomotion'])
         num_legs  num_wings
animal
penguin         2          2
>>> df.xs('num_wings', axis=1)
class   animal   locomotion
mammal  cat      walks         0
        dog      walks         0
        bat      flies         2
bird    penguin  walks         2
Name: num_wings, dtype: int64
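The PerformanceWarning printed for df.xs(('mammal', 'dog')) above appears because the rows of the MultiIndex are not in sorted order. A common remedy, sketched here under the assumption that the row order does not matter to you, is to call sort_index() before doing multi-level lookups (the name `sorted_df` is made up for illustration):

```python
import pandas as pd

d = {
    "num_legs": [4, 4, 2, 2],
    "num_wings": [0, 0, 2, 2],
    "class": ["mammal", "mammal", "mammal", "bird"],
    "animal": ["cat", "dog", "bat", "penguin"],
    "locomotion": ["walks", "walks", "flies", "walks"],
}
df = pd.DataFrame(data=d).set_index(["class", "animal", "locomotion"])

# sort_index() returns a copy whose MultiIndex is lexicographically
# sorted. Note that this reorders the rows ('bird' now comes first).
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)

# The same two-level lookup as in the session, on the sorted frame.
result = sorted_df.xs(("mammal", "dog"))
print(result)
```

The selected rows are the same as before; only the ordering of the underlying frame changes.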