Pandas reindexing with reindex
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
obj = Series([1.2,5.6,9.8,-1.5],index=['a','b','c','d'])
print(obj)
obj_1 = obj.reindex(['a','b','c','d','e','f'])
# Labels that are not in the original index get NaN by default
print(obj_1)
# If you do not want NaN, pass fill_value with the value you prefer
obj_2 = obj.reindex(['a','b','c','d','e','f'],fill_value = 'aaa')
print(obj_2)
out:
a 1.2
b 5.6
c 9.8
d -1.5
dtype: float64
a 1.2
b 5.6
c 9.8
d -1.5
e NaN
f NaN
dtype: float64
a 1.2
b 5.6
c 9.8
d -1.5
e aaa
f aaa
dtype: object
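The last output above shows dtype: object, because a string fill value cannot live in a float64 Series. A minimal sketch of the difference, assuming a numeric placeholder such as 0 is acceptable for your data (the names filled_zero and filled_text are made up for this illustration):

```python
import pandas as pd

obj = pd.Series([1.2, 5.6, 9.8, -1.5], index=['a', 'b', 'c', 'd'])
new_index = ['a', 'b', 'c', 'd', 'e', 'f']

# A numeric fill value fits into the existing float64 dtype.
filled_zero = obj.reindex(new_index, fill_value=0)
print(filled_zero.dtype)  # float64

# A string fill value forces the whole Series to the generic object dtype,
# which makes later numeric operations slower or outright invalid.
filled_text = obj.reindex(new_index, fill_value='aaa')
print(filled_text.dtype)  # object
```

If the result will be used in further arithmetic, a numeric fill value is usually the safer choice.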
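# reindex accepts many parameters
# Below: forward and backward filling
obj = pd.Series([4.5,9.8,-1.2],index = [0,2,4])
obj_1 = obj.reindex(np.arange(6),method='ffill')  # ffill: forward fill, carry the previous value forward
print(obj_1)
obj_2 = obj.reindex(np.arange(6),method='bfill')  # bfill: backward fill, take the next value
print(obj_2)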
print(obj_2)
out:
0 4.5
1 4.5
2 9.8
3 9.8
4 -1.2
5 -1.2
dtype: float64
0 4.5
1 9.8
2 9.8
3 -1.2
4 -1.2
5 NaN
dtype: float64
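Forward and backward filling can also be capped with the limit parameter of reindex, so that long gaps are not filled completely. The sketch below uses a made-up index with gaps of two labels to make the effect visible; the variable name limited is only for illustration.

```python
import numpy as np
import pandas as pd

# Known values at labels 0, 3 and 6; labels 1, 2, 4, 5, 7 are missing.
obj = pd.Series([4.5, 9.8, -1.2], index=[0, 3, 6])

# limit=1: fill at most one consecutive missing label after each known value.
limited = obj.reindex(np.arange(8), method='ffill', limit=1)
print(limited)
# Labels 1, 4 and 7 are filled from the value before them;
# labels 2 and 5 stay NaN because the limit was already used up.
```

Note that method-based filling requires the original index to be sorted (monotonic).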
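Arithmetic and data alignment
An important pandas feature is arithmetic between objects that have different indexes. When two objects are added and their index labels do not all match, the index of the result is the union of the two indexes.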
d1 = Series([1.2,5.6,9.8,-1.5],index=['a','b','c','d'])
d2 = Series([-3,-7,-6.9,-1.9,3.3],index=['a','b','c','d','e'])
d1+d2
# Values that share a label are added; a label found in only one object appears in the union with NaN
out:
a -1.8
b -1.4
c 2.9
d -3.4
e NaN
dtype: float64
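If the NaN for label e is not wanted, the add method accepts a fill_value that stands in for the side that is missing. A small sketch with the same two Series, assuming 0 is a sensible neutral value for your data:

```python
import pandas as pd

d1 = pd.Series([1.2, 5.6, 9.8, -1.5], index=['a', 'b', 'c', 'd'])
d2 = pd.Series([-3, -7, -6.9, -1.9, 3.3], index=['a', 'b', 'c', 'd', 'e'])

# d1 has no label 'e', so 0 is used in its place: 0 + 3.3
total = d1.add(d2, fill_value=0)
print(total['e'])  # 3.3
```

The same keyword exists on sub, mul and div.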
#### Now the same union with a two-dimensional object
df1 = DataFrame(np.arange(9).reshape(3,3),columns=list('abc'),index = [1,2,3])
df1
out:
a b c
1 0 1 2
2 3 4 5
3 6 7 8
df2 = DataFrame(np.arange(12).reshape(4,3),columns=list('cde'),index = [1,2,3,4])
df2
out:
c d e
1 0 1 2
2 3 4 5
3 6 7 8
4 9 10 11
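# Plain addition only adds cells whose row and column labels both match: the row and column labels are unioned, but values exist only at the intersection, which is often not what you want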
df1+df2
a b c d e
1 NaN NaN 2.0 NaN NaN
2 NaN NaN 8.0 NaN NaN
3 NaN NaN 14.0 NaN NaN
4 NaN NaN NaN NaN NaN
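df1.add(df2,fill_value=11111)  # Use 11111 in place of a value that is missing from one of the two frames. A cell missing from both frames (row 4, columns a and b) stays NaN; this applies to rows as well as columns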
a b c d e
1 11111.0 11112.0 2.0 11112.0 11113.0
2 11114.0 11115.0 8.0 11115.0 11116.0
3 11117.0 11118.0 14.0 11118.0 11119.0
4 NaN NaN 11120.0 11121.0 11122.0
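To check which cells remain empty after add with fill_value, you can count the NaN values in the result. This sketch rebuilds the two frames above and uses 0 as the fill value (my choice for readability, not the value used in the text):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'), index=[1, 2, 3])
df2 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('cde'), index=[1, 2, 3, 4])

result = df1.add(df2, fill_value=0)

# Only cells missing from BOTH frames stay NaN: row 4, columns a and b.
print(result.isna().sum().sum())  # 2
print(result.loc[4, 'c'])         # 9.0, taken from df2 alone
```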
Operations between a DataFrame and a Series
frame = DataFrame(np.arange(12).reshape(4,3),columns = list('bde'),index = [1,2,3,4])
print(frame)
series = frame.loc[1]
print(series)
out:
b d e
1 0 1 2
2 3 4 5
3 6 7 8
4 9 10 11
b 0
d 1
e 2
Name: 1, dtype: int32
# Subtraction
frame-series  # the Series index is matched against the columns, then broadcast down every row
out:
b d e
1 0 0 0
2 3 3 3
3 6 6 6
4 9 9 9
# Addition
series = Series(range(3),index =list('def'))
frame+series  # labels are unioned; a column found in only one operand (b, f) comes out as NaN
out:
b d e f
1 NaN 1.0 3.0 NaN
2 NaN 4.0 6.0 NaN
3 NaN 7.0 9.0 NaN
4 NaN 10.0 12.0 NaN
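By default the Series index is matched against the DataFrame columns. To match against the row index and broadcast across the columns instead, the arithmetic methods take an axis argument. A sketch under that assumption, using one column of the same frame (col_d is an illustrative name):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('bde'), index=[1, 2, 3, 4])
col_d = frame['d']  # indexed by the row labels 1..4

# axis=0 (or axis='index') matches col_d on the rows and subtracts it from every column.
result = frame.sub(col_d, axis=0)
print(result)
# Column d becomes all zeros, b is one less than d, e is one more than d.
```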
Sorting
Sort a dataset by its index labels or by its values.
obj = Series(range(4),index =list('cdab'))
print(obj)
out:
c 0
d 1
a 2
b 3
dtype: int32
obj.sort_index()  # sort by index label
out:
a 2
b 3
c 0
d 1
dtype: int32
obj.sort_values()  # sort by value
out:
c 0
d 1
a 2
b 3
dtype: int32
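The example above is already in ascending order by value, so the sort changes nothing. A sketch with made-up data that also contains missing values shows the ascending parameter and how NaN is handled:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

asc = obj.sort_values()                   # smallest first
desc = obj.sort_values(ascending=False)   # largest first

# In both directions NaN values are placed at the end by default.
print(asc.index.tolist())
print(desc.index.tolist())
```

Pass na_position='first' if the missing values should come first.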
A DataFrame can be sorted by the index on either axis.
frame = DataFrame(np.arange(8).reshape(2,4),index = ['two','one'],columns =list('fedc'))
print(frame)
out:
f e d c
two 0 1 2 3
one 4 5 6 7
frame.sort_index()  # the default is axis=0, which sorts by row labels
out:
f e d c
one 4 5 6 7
two 0 1 2 3
# To sort by column labels instead, set axis=1
frame.sort_index(axis=1)  # this is sorting along a chosen axis
out:
c d e f
two 3 2 1 0
one 7 6 5 4
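The two calls can be chained to sort both axes. A minimal sketch with the same frame (both_sorted is an illustrative name):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8).reshape(2, 4), index=['two', 'one'], columns=list('fedc'))

# Sort the row labels first, then the column labels.
both_sorted = frame.sort_index().sort_index(axis=1)
print(both_sorted)
# Rows are now one, two and columns are c, d, e, f.
```

sort_index returns a new object each time, so the original frame is left unchanged.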
frame = DataFrame({'a':[1,2,3,4],'b':[4,5,6,7]})
print(frame)  # in a dict of lists each key becomes a column label; a flat dict passed to Series gives index (row) labels instead
out:
a b
0 1 4
1 2 5
2 3 6
3 4 7
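A frame built this way can also be sorted by the values in one or more columns with sort_values(by=...). The frame above is already in order, so this sketch uses made-up values to make the reordering visible:

```python
import pandas as pd

# Hypothetical data, chosen so that sorting actually moves rows.
frame = pd.DataFrame({'a': [0, 1, 0, 1], 'b': [4, 7, -3, 2]})

by_b = frame.sort_values(by='b')                 # sort rows by column b
by_a_then_b = frame.sort_values(by=['a', 'b'])   # sort by a, break ties with b

print(by_b.index.tolist())         # [2, 3, 0, 1]
print(by_a_then_b.index.tolist())  # [2, 0, 3, 1]
```

The original row labels travel with their rows, which makes it easy to see where each row ended up.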