5.2.3 Indexing, Selection, and Filtering
Series
Series indexing (obj[...]) works much like NumPy array indexing, except that the index values of a Series do not have to be integers. Some examples:
In [53]: obj=pd.Series(np.arange(4.),index=['a','b','c','d'])
In [54]: obj
Out[54]:
a 0.0
b 1.0
c 2.0
d 3.0
dtype: float64
In [55]: obj['b']
Out[55]: 1.0
In [56]: obj[1]
Out[56]: 1.0
In [57]: obj[2:4]
Out[57]:
c 2.0
d 3.0
dtype: float64
In [58]: obj[['b','a','d']]
Out[58]:
b 1.0
a 0.0
d 3.0
dtype: float64
In [59]: obj[[1,3]]
Out[59]:
b 1.0
d 3.0
dtype: float64
In [60]: obj[obj<2]
Out[60]:
a 0.0
b 1.0
dtype: float64
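The examples above mix label keys (obj['b']) and integer position keys (obj[1]) in the same [] operator. As far as I know, recent pandas releases warn about integer keys being treated as positions when the index holds non-integer labels, so a hedged sketch of the same lookups with the explicit accessors loc (labels) and iloc (positions) may be useful. The variable names by_label and by_position are only illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label-based access: the key is matched against the index labels.
by_label = obj.loc['b']

# Position-based access: the key is an integer offset, as in NumPy.
by_position = obj.iloc[1]

# Both refer to the same element here, because 'b' sits at position 1.
print(by_label, by_position)

# A list of positions and a boolean mask also work with the explicit accessors.
picked = obj.iloc[[1, 3]]
small = obj.loc[obj < 2]
print(picked)
print(small)
```

Writing loc or iloc makes it clear to the reader which of the two meanings is intended.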
Ordinary Python slices exclude the endpoint. Slicing a Series by label behaves differently: the endpoint is included.
In [61]: obj['b':'c']
Out[61]:
b 1.0
c 2.0
dtype: float64
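To make the difference concrete, here is a small sketch comparing a label slice with a position slice on the same Series. It uses loc and iloc so that the kind of slice is explicit; label_slice and position_slice are illustrative names.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints 'b' and 'c' are part of the result.
label_slice = obj.loc['b':'c']

# Position slice: follows Python rules, so position 2 is excluded.
position_slice = obj.iloc[1:2]

print(list(label_slice.index))
print(list(position_slice.index))
```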
Assigning through these selections modifies the corresponding part of the Series:
In [62]: obj['b':'c']=5
In [63]: obj
Out[63]:
a 0.0
b 5.0
c 5.0
d 3.0
dtype: float64
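Note that the assignment changes obj itself. If the original values are still needed, one option (a sketch, not the only way) is to assign on a copy. The name modified is hypothetical.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Work on an independent copy so that obj keeps its original values.
modified = obj.copy()

# Label slice assignment: 'b' and 'c' are both overwritten.
modified.loc['b':'c'] = 5

print(modified.tolist())
print(obj.tolist())
```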
DataFrame
Indexing a DataFrame with a single value or a sequence retrieves one or more columns:
data=pd.DataFrame(np.arange(16).reshape((4,4)),
index=['Ohio','Colorado','Utah','New York'],
columns=['one','two','three','four'])
In [71]: data
Out[71]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
In [72]: data['two']
Out[72]:
Ohio 1
Colorado 5
Utah 9
New York 13
Name: two, dtype: int64
In [73]: data[['three','one']]
Out[73]:
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12
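One detail worth checking by hand is the type of the result: a single column label returns a Series, while a list of labels returns a DataFrame, even when the list has only one entry. A minimal sketch, with single, one_column, and reordered as illustrative names:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# A single label returns the column as a Series.
single = data['two']

# A list of labels returns a DataFrame, even for one column.
one_column = data[['two']]

# The columns come back in the order given in the list.
reordered = data[['three', 'one']]

print(type(single).__name__, type(one_column).__name__)
print(list(reordered.columns))
```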
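The row-selection syntax data[:2] is provided as a convenience. Passing a single element or a list to the [] operator selects columns.
Another use case is indexing with a boolean DataFrame, such as one produced by a scalar comparison.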
In [75]: data[:2]
Out[75]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
In [76]: data[data['three']>5]
Out[76]:
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
In [77]: data<5
Out[77]:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
In [78]: data[data<5]=0
In [79]: data
Out[79]:
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
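The two kinds of boolean indexing shown here behave differently: a boolean Series selects whole rows, while a boolean DataFrame picks out individual cells. The sketch below repeats both and, as an alternative I am adding for comparison, uses DataFrame.mask to get the zeroed result without modifying the original. The names rows, zeroed, and in_place are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Boolean Series: keeps the rows where column 'three' is greater than 5.
rows = data[data['three'] > 5]

# Boolean DataFrame with assignment: overwrites matching cells in place.
in_place = data.copy()
in_place[in_place < 5] = 0

# mask returns a new DataFrame with matching cells replaced; data is untouched.
zeroed = data.mask(data < 5, 0)

print(rows)
print(zeroed)
```

Whether to assign in place or build a new object with mask is mostly a matter of whether the original values are needed later.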