Dataframe的相关知识点总结
dataframe相关知识点总结表.png(1)Dataframe的基本概念
data = {'name':['Jack','Tom','Mary'],
'age':[18,19,20],
'gender':['m','m','w']} # 字典dict
frame = pd.DataFrame(data)
print(frame,type(frame))
# 用pd.DataFrame()生成Dataframe
# Dataframe带有index(行标签)和columns(列标签)
print('--------')
print('查看frame的行标签和行标签类型:\n',frame.index,type(frame.index))
# .index查看行标签
print('查看frame的列标签和列标签类型:\n',frame.columns,type(frame.columns))
# .columns查看列标签
print('查看frame的值和值的数据类型:\n',frame.values,type(frame.values))
# .values查看值,数据类型为ndarra
输出结果:
age gender name
0 18 m Jack
1 19 m Tom
2 20 w Mary <class 'pandas.core.frame.DataFrame'>
--------
查看frame的行标签和行标签类型:
RangeIndex(start=0, stop=3, step=1) <class 'pandas.indexes.range.RangeIndex'>
查看frame的列标签和列标签类型:
Index(['age', 'gender', 'name'], dtype='object') <class 'pandas.indexes.base.Index'>
查看frame的值和值的数据类型:
[[18 'm' 'Jack']
[19 'm' 'Tom']
[20 'w' 'Mary']] <class 'numpy.ndarray'>
(2)Dataframe的创建方法
创建Dataframe用到 pd.DataFrame() 一共有5种创建Dataframe的方法。【创建Dataframe较多使用方法1,2,3】
--- >>> 方法1:由数组/list组成的字典创建Dataframe,columns为字典key,index为默认数字标签,字典的value的长度必须保持一致。
# 由数组/list组成的字典 创建Dataframe,columns为字典key,index为默认数字标签
# 字典的值的长度必须保持一致!
data1={'a':list(range(3)),
'b':list(range(4,7)),
'c':list(range(7,10))}
data2={'one':np.random.rand(3),
'two':np.random.rand(3)}
print(data1)
print(data2)
# data1是list组成的字典,data2是ndarray数组组成的字典
df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
print(df1)
print(df2)
print('----------------')
# 重新定义columns列标签
df1=pd.DataFrame(data1,columns=list('cdab'))
print(df1)
# 重新定义的columns比原数据多,现有数据中没有该列(比如'd'),则产生NaN值
df1=pd.DataFrame(data1,columns=list('ac'))
print(df1)
# 如果columns重新指定时候,列的数量可以少于原数据
print('----------------')
# 重新定义index行标签
df2=pd.DataFrame(data2,index=list('abe'))
print(df2)
# 重新定义的index行标签,必须与原数据一样多,不能多不能少,否则报错
输出结果:
{'a': [0, 1, 2], 'c': [7, 8, 9], 'b': [4, 5, 6]}
{'two': array([ 0.89216661, 0.31251123, 0.53794988]), 'one': array([ 0.75671784, 0.06891223, 0.60083476])}
a b c
0 0 4 7
1 1 5 8
2 2 6 9
one two
0 0.756718 0.892167
1 0.068912 0.312511
2 0.600835 0.537950
----------------
c d a b
0 7 NaN 0 4
1 8 NaN 1 5
2 9 NaN 2 6
a c
0 0 7
1 1 8
2 2 9
----------------
one two
a 0.756718 0.892167
b 0.068912 0.312511
e 0.600835 0.537950
--->>> 方法2:由Seris组成的字典创建Dataframe,columns为字典key,index为Series的标签;若Series没有重新制定标签,则是默认数字标签。
df1=pd.DataFrame({'one':pd.Series(np.random.rand(2)),
'two':pd.Series(np.random.rand(3))})
print(df1)
df2=pd.DataFrame({'one':pd.Series(np.random.rand(2),index=list('ab')),
'two':pd.Series(np.random.rand(3),index=list('abc'))})
print(df2)
# df1没有重新定义index标签,使用Series的默认数字标签
# df2重新定义了index标签
# 由Series组成的字典,Series可以长度不一样,生成的Dataframe会出现NaN值
输出结果:
one two
0 0.764246 0.276496
1 0.877065 0.385122
2 NaN 0.083968
one two
a 0.043106 0.609206
b 0.175361 0.457400
c NaN 0.082632
--->>> 方法3:通过二维数组直接创建Dataframe,得到一样形状的结果数据,如果不指定index和columns,两者均返回默认数字格式,并且index和colunms指定长度须与原数组保持一致。
# 创建二维数组ndarray
ar=np.reshape(np.random.rand(10),(2,5))
print(ar)
# 通过二维数组用pd.DataFrame()创建Dataframe
# 若没有重新制定index行标签和columns列标签,默认均使用数字标签
df1=pd.DataFrame(ar)
print(df1)
# 重新定义index和columns
# 重新定义的行标签和列标签必须与原数据长度一致,否则多了少了都报错
df2=pd.DataFrame(ar,index=list('ab'),columns=list('rkefi'))
print(df2)
输出结果:
[[ 0.31690795 0.4554707 0.46110939 0.49108045 0.93137807]
[ 0.08933855 0.13430606 0.45611142 0.40476408 0.72615298]]
0 1 2 3 4
0 0.316908 0.455471 0.461109 0.491080 0.931378
1 0.089339 0.134306 0.456111 0.404764 0.726153
r k e f i
a 0.316908 0.455471 0.461109 0.491080 0.931378
b 0.089339 0.134306 0.456111 0.404764 0.726153
--->>> 方法4:由字典组成的列表创建Dataframe,columns为字典的key,index不做指定则为默认数组标签
# 创建字典组成的列表
lst=[{'one':1,'two':2},
{'one':9,'two':12,'three':13}]
print(lst)
# 由字典组成的列表创建Dataframe
df1=pd.DataFrame(lst)
print(df1)
print('----------')
# 重新定义index,长度必须与原数据一致
df2=pd.DataFrame(lst,index=list('ab'))
print(df2)
# 重新定义columns,columns参数可以增加和减少现有列,如出现新的列,值为NaN
df3=pd.DataFrame(lst,columns=['one','two','four'])
print(df3)
输出结果:
[{'two': 2, 'one': 1}, {'two': 12, 'one': 9, 'three': 13}]
one three two
0 1 NaN 2
1 9 13.0 12
----------
one three two
a 1 NaN 2
b 9 13.0 12
one two four
0 1 2 NaN
1 9 12 NaN
--->>> 方法5:由字典组成的字典创建Dataframe,columns为字典的key
columns参数可以增加和减少现有列,如出现新的列,值为NaN;
index在这里和之前不同,并不能重新定义新的index,如果指向新的标签,值为NaN (非常重要!)
data = {'Jack':{'math':90,'english':89,'art':78},
'Marry':{'math':82,'english':95,'art':92},
'Tom':{'math':78,'english':67}}
df1 = pd.DataFrame(data)
print(df1)
print('----------')
df2 = pd.DataFrame(data, columns = ['Jack','Tom','Bob'])
df3 = pd.DataFrame(data, index = ['a','b','c'])
print(df2)
print(df3)
# df2 columns参数可以增加和减少现有列,如出现新的列,值为NaN
# df3 重新定义index了,因此返回的都是NaN
输出结果:
Jack Marry Tom
art 78 92 NaN
english 89 95 67.0
math 90 82 78.0
----------
Jack Tom Bob
art 78 NaN NaN
english 89 67.0 NaN
math 90 78.0 NaN
Jack Marry Tom
a NaN NaN NaN
b NaN NaN NaN
c NaN NaN NaN
(3)Dataframe的索引和切片
--->>> df[columns]默认选择列,[]中写列标签名
# 创建Dataframe
df=pd.DataFrame(np.random.rand(12).reshape(3,4)*100,
index=list('abc'),columns=['one','two','three','four'])
print(df)
# df[columns]默认选择列,[]中写列标签名
# 因而一般数据colunms都会单独制定,不会用默认数字列名,以免和index冲突
data1=df['one']
print(data1,type(data1))
print('---------')
# 选择单列时,单选列为Series,print结果为Series格式
data2=df[['two','four']]
print(data2,type(data2))
print('---------')
# 选择多列时,多选列为Dataframe,print结果为Dataframe格式
# 【注意!!!!】
# df[]中填数字时,则默认选择行,且只能进行切片的选择,不能单独选择(df[0])
data3=df[0:1]
print(data3,type(data3))
data4=df[:2]
print(data4)
# 输出结果即便只选择一行,类型为Dataframe
# df[]若用来选择行,则[]内指能填数字,若填写的是索引标签名来选择行(df['one']),则报错
# 核心重点:一般选择列用df[col],[]中写列名,选择行有其他方法。
输出结果:
one two three four
a 90.914477 19.150946 13.451741 7.575419
b 24.002117 84.665548 21.014130 59.794000
c 67.605976 23.585457 55.870082 37.634451
a 90.914477
b 24.002117
c 67.605976
Name: one, dtype: float64 <class 'pandas.core.series.Series'>
---------
two four
a 19.150946 7.575419
b 84.665548 59.794000
c 23.585457 37.634451 <class 'pandas.core.frame.DataFrame'>
---------
one two three four
a 90.914477 19.150946 13.451741 7.575419 <class 'pandas.core.frame.DataFrame'>
one two three four
a 90.914477 19.150946 13.451741 7.575419
b 24.002117 84.665548 21.014130 59.794000
--->>> df.loc[] - 按index选择行
# df.loc[] - 按index选择行
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
index = ['one','two','three','four'],
columns = ['a','b','c','d'])
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])
print(df1)
print(df2)
print('-----')
data1=df1.loc['one']
data2=df2.loc[2]
print(data1,type(data1))
print(data2,type(data2))
# 选择单行时,返回Series,且原数据columns变为Series的index
print('-----')
data3=df1.loc[['one','three']]
data4=df2.loc[[3,0,4]]
print(data3,type(data3))
print(data4,type(data4))
# 选择多行时,返回Dataframe,若标签不存在,则返回NaN
# 顺序可变
print('-----')
# 做切片索引
data5=df1.loc['one':'three']
print(data5)
# df.loc[index]做切片,末端包含
data6=df2.loc[0:2]
print(data6)
#核心重点:df.loc[label]主要针对index选择行,同时支持指定index,及默认数字index
输出结果:
a b c d
one 25.422364 20.795395 11.336709 90.864404
two 86.156487 11.198670 54.739204 45.817514
three 83.311279 37.328900 84.881397 26.402103
four 8.671409 61.604665 71.907952 66.940001
a b c d
0 48.645209 99.572800 27.856945 92.607217
1 79.674493 73.053547 18.308831 88.747135
2 25.372909 62.037611 85.714760 49.503700
3 71.654423 24.761800 11.173785 46.230001
-----
a 25.422364
b 20.795395
c 11.336709
d 90.864404
Name: one, dtype: float64 <class 'pandas.core.series.Series'>
a 25.372909
b 62.037611
c 85.714760
d 49.503700
Name: 2, dtype: float64 <class 'pandas.core.series.Series'>
-----
a b c d
one 25.422364 20.795395 11.336709 90.864404
three 83.311279 37.328900 84.881397 26.402103 <class 'pandas.core.frame.DataFrame'>
a b c d
3 71.654423 24.7618 11.173785 46.230001
0 48.645209 99.5728 27.856945 92.607217
4 NaN NaN NaN NaN <class 'pandas.core.frame.DataFrame'>
-----
a b c d
one 25.422364 20.795395 11.336709 90.864404
two 86.156487 11.198670 54.739204 45.817514
three 83.311279 37.328900 84.881397 26.402103
a b c d
0 48.645209 99.572800 27.856945 92.607217
1 79.674493 73.053547 18.308831 88.747135
2 25.372909 62.037611 85.714760 49.503700
--->>> df.iloc[] - 按照整数位置(从轴的0到length-1)选择行,类似list的索引,其顺序就是dataframe的整数位置,从0开始计
# df.iloc[] - 按照整数位置(从轴的0到length-1)选择行
# 类似list的索引,其顺序就是dataframe的整数位置,从0开始计
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
index = ['one','two','three','four'],
columns = ['a','b','c','d'])
print(df)
print('------')
data1=df.iloc[0]
print(data1,type(data1))
data2=df.iloc[-1]
print(data2,type(data2))
print('------')
# 单位置索引,与list的索引类似
data3=df.iloc[[0,2]]
print(data3,type(data3))
#data4=df.iloc[[2,1,4]]
#print(data4)输出会报错,因为索引为4的行不存在
# 和loc索引不同,不能索引超出数据行数的整数位置
# 多位置索引,顺序可变
# df.iloc[索引]做切片,末端不包含
data4=df.iloc[:2]
print(data4)
# data4定位索引为0-1的行
输出结果:
a b c d
one 74.980847 68.412788 55.553170 70.115846
two 33.516484 79.838096 54.487476 66.310558
three 86.773031 73.986408 23.077972 56.278295
four 79.205418 19.199874 10.778306 74.669667
------
a 74.980847
b 68.412788
c 55.553170
d 70.115846
Name: one, dtype: float64 <class 'pandas.core.series.Series'>
a 79.205418
b 19.199874
c 10.778306
d 74.669667
Name: four, dtype: float64 <class 'pandas.core.series.Series'>
------
a b c d
one 74.980847 68.412788 55.553170 70.115846
three 86.773031 73.986408 23.077972 56.278295 <class 'pandas.core.frame.DataFrame'>
a b c d
one 74.980847 68.412788 55.553170 70.115846
two 33.516484 79.838096 54.487476 66.310558
--->>> 多重索引,同时索引行和列。应先选择列再选择行 —— 相当于对于一个数据,先筛选字段,再选择数据量。
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
index = ['one','two','three','four'],
columns = ['a','b','c','d'])
print(df)
print('------')
print(df['a'].loc['two'])
print('------')
# 从Dataframe中索引选中指定的一个数据
# 选择a列,two行
print(df[['b','d']].loc[['one','three']])
print('------')
# 从Dataframe中索引选中指定的不连续的行和列
# 选择b、d列,one、three行
print(df[['a','d']].loc['two':'four'])
print('------')
# df[]选择多列是用逗号分隔
# df.loc[]选择多行,用切片索引,用冒号分隔
print(df[df['a'] < 50].iloc[:2]) # 选择满足判断索引的前两行数据
输出结果:
a b c d
one 23.625057 14.995706 63.663427 67.586236
two 6.870833 59.007275 66.547176 50.959152
three 48.086678 50.274425 59.094988 42.759351
four 58.284649 82.886278 16.476423 27.154450
------
6.87083343585
------
b d
one 14.995706 67.586236
three 50.274425 42.759351
------
a d
two 6.870833 50.959152
three 48.086678 42.759351
four 58.284649 27.154450
------
a b c d
one 23.625057 14.995706 63.663427 67.586236
two 6.870833 59.007275 66.547176 50.959152
--->>> 布尔型索引和Series原理相同。
# 布尔型索引
# 和Series原理相同
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
index = ['one','two','three','four'],
columns = ['a','b','c','d'])
print(df)
print('------')
b1 = df < 20
print(b1,type(b1))
print(df[b1]) # 也可以书写为 df[df < 20]
print('------')
# 不做索引则会对数据每个值进行判断
# 索引结果保留 所有数据:True返回原数据,False返回值为NaN
b2 = df['a'] > 50
print(b2,type(b2))
print(df[b2]) # 也可以书写为 df[df['a'] > 50]
print('------')
# 单列做判断
# 索引结果保留 单列判断为True的行数据,包括其他列
b3 = df[['a','b']] > 50
print(b3,type(b3))
print(df[b3]) # 也可以书写为 df[df[['a','b']] > 50]
print('------')
# 多列做判断
# 索引结果保留 所有数据:True返回原数据,False返回值为NaN
b4 = df.loc[['one','three']] < 50
print(b4,type(b4))
print(df[b4]) # 也可以书写为 df[df.loc[['one','three']] < 50]
print('------')
# 多行做判断
# 索引结果保留 所有数据:True返回原数据,False返回值为NaN
输出结果:
a b c d
one 19.185849 20.303217 21.800384 45.189534
two 50.105112 28.478878 93.669529 90.029489
three 35.496053 19.248457 74.811841 20.711431
four 24.604478 57.731456 49.682717 82.132866
------
a b c d
one True False False False
two False False False False
three False True False False
four False False False False <class 'pandas.core.frame.DataFrame'>
a b c d
one 19.185849 NaN NaN NaN
two NaN NaN NaN NaN
three NaN 19.248457 NaN NaN
four NaN NaN NaN NaN
------
one False
two True
three False
four False
Name: a, dtype: bool <class 'pandas.core.series.Series'>
a b c d
two 50.105112 28.478878 93.669529 90.029489
------
a b
one False False
two True False
three False False
four False True <class 'pandas.core.frame.DataFrame'>
a b c d
one NaN NaN NaN NaN
two 50.105112 NaN NaN NaN
three NaN NaN NaN NaN
four NaN 57.731456 NaN NaN
------
a b c d
one True True True True
three True True False True <class 'pandas.core.frame.DataFrame'>
a b c d
one 19.185849 20.303217 21.800384 45.189534
two NaN NaN NaN NaN
three 35.496053 19.248457 NaN 20.711431
four NaN NaN NaN NaN
------
(4)Dataframe的基本技巧
--->>> .head()和.tail()分别查看头部数据与尾部数据;.T转置数据
# 数据查看、转置
df = pd.DataFrame(np.random.rand(16).reshape(8,2)*100,
columns = ['a','b'])
print(df.head(2))
print(df.tail())
# .head()查看头部数据
# .tail()查看尾部数据
# 默认查看5条
print(df.T)
# .T 转置 原形状是(3,4)置换后会变为(4,3),且dataframe的index与columns互换
输出结果:
a b
0 7.949040 32.642699
1 92.208337 10.828741
a b
3 69.807085 57.420046
4 79.708661 0.590644
5 65.844373 47.775450
6 4.988854 64.613866
7 8.913846 87.750012
0 1 2 3 4 5 \
a 7.949040 92.208337 82.494079 69.807085 79.708661 65.844373
b 32.642699 10.828741 90.075549 57.420046 0.590644 47.775450
6 7
a 4.988854 8.913846
b 64.613866 87.750012
--->>> 给Dataframe添加数据和修改数据
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])
print(df)
# 【添加数据】
df['g']=100
df.loc[5]=200
print(df)
print('----------')
# df['g']=100,新增列g,值均为100
# df.loc[5]=200,新增行index为5的行,值均为200
# 【修改数据】
df['c']=120
df.loc[0:1]=250
print(df)
# df['c']=120,将c列的值均修改为120
# df.loc[0:1]=250,将index为0-1的行的值均修改为250
输出结果:
a b c d
0 81.933629 58.281026 97.975374 92.070574
1 61.370852 56.591523 40.667676 32.055902
2 60.103233 5.406372 66.602369 84.946211
3 84.100587 13.442995 16.667553 13.657240
a b c d g
0 81.933629 58.281026 97.975374 92.070574 100
1 61.370852 56.591523 40.667676 32.055902 100
2 60.103233 5.406372 66.602369 84.946211 100
3 84.100587 13.442995 16.667553 13.657240 100
5 200.000000 200.000000 200.000000 200.000000 200
----------
a b c d g
0 250.000000 250.000000 250 250.000000 250
1 250.000000 250.000000 250 250.000000 250
2 60.103233 5.406372 120 84.946211 100
3 84.100587 13.442995 120 13.657240 100
5 200.000000 200.000000 120 200.000000 200
--->>> 给Dataframe删除数据,del语句删除列,drop()删除行、列
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'],index=list('njkr'))
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'],index=[1,2,3,4])
print(df1)
print(df2)
print('-----')
del df1['a']
print(df1)
print('-----')
# del 语句删除列
print(df1.drop(['n']))
print(df2.drop(1))
print(df1)
print(df2)
print('-----')
# drop(labe),括号内填写行index标签,删除行
# drop()删除行,默认inplace=False → 删除后生成新的数据,不改变原数据
# drop()默认删除行,
# drop()若要删除列,需要加上axis = 1
print(df1.drop(['d'],axis=1))
print(df1)
输出结果:
a b c d
n 99.781228 25.472789 77.134248 88.759165
j 30.835066 97.793713 5.044348 63.591559
k 91.380903 54.532245 64.167292 15.215125
r 11.363368 22.048608 11.202747 94.795838
a b c d
1 81.331983 72.663489 87.489846 39.120100
2 17.028264 47.835036 95.802606 0.556850
3 97.790851 68.518948 42.980309 46.952173
4 40.470713 2.216181 15.092647 95.034669
-----
b c d
n 25.472789 77.134248 88.759165
j 97.793713 5.044348 63.591559
k 54.532245 64.167292 15.215125
r 22.048608 11.202747 94.795838
-----
b c d
j 97.793713 5.044348 63.591559
k 54.532245 64.167292 15.215125
r 22.048608 11.202747 94.795838
a b c d
2 17.028264 47.835036 95.802606 0.556850
3 97.790851 68.518948 42.980309 46.952173
4 40.470713 2.216181 15.092647 95.034669
b c d
n 25.472789 77.134248 88.759165
j 97.793713 5.044348 63.591559
k 54.532245 64.167292 15.215125
r 22.048608 11.202747 94.795838
a b c d
1 81.331983 72.663489 87.489846 39.120100
2 17.028264 47.835036 95.802606 0.556850
3 97.790851 68.518948 42.980309 46.952173
4 40.470713 2.216181 15.092647 95.034669
-----
b c
n 25.472789 77.134248
j 97.793713 5.044348
k 54.532245 64.167292
r 22.048608 11.202747
b c d
n 25.472789 77.134248 88.759165
j 97.793713 5.044348 63.591559
k 54.532245 64.167292 15.215125
r 22.048608 11.202747 94.795838
--->>> Dataframe数据的对齐计算,
df1 = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
print(df1 + df2)
# DataFrame对象之间的数据自动按照列和索引(行标签)对齐
输出结果:
A B C D
0 0.375689 0.154600 0.588799 NaN
1 0.314006 0.322111 1.521050 NaN
2 -0.617338 1.621051 1.019352 NaN
3 0.079740 1.035736 -0.069641 NaN
4 0.362620 1.040316 -3.178383 NaN
5 1.162456 1.628615 1.699686 NaN
6 -1.306092 -0.410553 -3.453950 NaN
7 NaN NaN NaN NaN
8 NaN NaN NaN NaN
9 NaN NaN NaN NaN
--->>> Dataframe的数据排序:
【排序1 - 按值排序,.sort_values()】
# 排序1 - 按值排序 .sort_values
# 同样适用于Series
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])
print(df1)
print(df1.sort_values(['a'], ascending = True)) # 升序
print(df1.sort_values(['a'], ascending = False)) # 降序
print('------')
# ascending参数:设置升序降序,默认升序
# 单列排序
df2 = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
'b':list(range(8)),
'c':list(range(8,0,-1))})
print(df2)
print(df2.sort_values(['a','c']))
# 多列排序,按列顺序排序
# 前提是Dataframe的数据本身是可以排序的
# df2的多列排序,先排序a升序,1,2升序,然后排序c从1开始排序5,6,7,8升序
输出结果:
a b c d
0 21.476998 94.124307 74.651624 86.206047
1 64.418365 51.001043 67.371010 57.975594
2 98.097496 42.493171 18.868006 96.989554
3 50.945730 5.336877 45.237293 48.395052
a b c d
0 21.476998 94.124307 74.651624 86.206047
3 50.945730 5.336877 45.237293 48.395052
1 64.418365 51.001043 67.371010 57.975594
2 98.097496 42.493171 18.868006 96.989554
a b c d
2 98.097496 42.493171 18.868006 96.989554
1 64.418365 51.001043 67.371010 57.975594
3 50.945730 5.336877 45.237293 48.395052
0 21.476998 94.124307 74.651624 86.206047
------
a b c
0 1 0 8
1 1 1 7
2 1 2 6
3 1 3 5
4 2 4 4
5 2 5 3
6 2 6 2
7 2 7 1
a b c
3 1 3 5
2 1 2 6
1 1 1 7
0 1 0 8
7 2 7 1
6 2 6 2
5 2 5 3
4 2 4 4
【排序2 - 索引排序 .sort_index()】
# 排序2 - 索引排序 .sort_index
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
index = [5,4,3,2],
columns = ['a','b','c','d'])
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
index = ['h','s','x','g'],
columns = ['a','b','c','d'])
print(df1)
data1=df1.sort_index()
print(data1)
print(df1)
print('--------')
print(df2)
data2=df2.sort_index()
print(data2)
print(df2)
# 按照index排序
# 默认 ascending=True升序, inplace=False生成新数据排序
输出结果:
a b c d
5 46.148715 3.841598 28.772054 30.769325
4 62.050299 72.191961 96.291796 86.469963
3 34.731134 22.328996 96.434859 96.952085
2 14.084726 64.354711 45.780757 82.532313
a b c d
2 14.084726 64.354711 45.780757 82.532313
3 34.731134 22.328996 96.434859 96.952085
4 62.050299 72.191961 96.291796 86.469963
5 46.148715 3.841598 28.772054 30.769325
a b c d
5 46.148715 3.841598 28.772054 30.769325
4 62.050299 72.191961 96.291796 86.469963
3 34.731134 22.328996 96.434859 96.952085
2 14.084726 64.354711 45.780757 82.532313
--------
a b c d
h 93.225186 74.185120 83.354408 98.871708
s 69.450625 14.009624 98.526514 32.786719
x 95.268889 97.052805 70.099924 22.170984
g 96.089474 79.845508 66.001147 46.291457
a b c d
g 96.089474 79.845508 66.001147 46.291457
h 93.225186 74.185120 83.354408 98.871708
s 69.450625 14.009624 98.526514 32.786719
x 95.268889 97.052805 70.099924 22.170984
a b c d
h 93.225186 74.185120 83.354408 98.871708
s 69.450625 14.009624 98.526514 32.786719
x 95.268889 97.052805 70.099924 22.170984
g 96.089474 79.845508 66.001147 46.291457
网友评论