pandas Basic Operations Handbook

Author: 张小张x86 | Published 2019-04-22 12:14

Series

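A Series is a one-dimensional array-like object made up of a sequence of values and an associated sequence of labels (its index). You create one with the pandas Series constructor.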

Creating a Series

import numpy as np
import pandas as pd
s = pd.Series([2,3,4,1,5])
print(s)
>>>
0    2
1    3
2    4
3    1
4    5
dtype: int64

s.values
>>>array([2, 3, 4, 1, 5])
s.index
>>>RangeIndex(start=0, stop=5, step=1)

s2 = pd.Series([3,2,4,1,5],index = ['a','b','c','d','e'])
print(s2)
>>>
a    3
b    2
c    4
d    1
e    5
dtype: int64

# Create a Series from a dict (named info to avoid shadowing the built-in dict)
info = {'name':'joha','sex':'male','age':'18'}
s3 = pd.Series(info)
print(s3)
>>>
name    joha
sex     male
age       18
dtype: object
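When a dict is combined with an explicit index, the index decides which entries appear. A minimal sketch (assuming any recent pandas; the label 'city' is a hypothetical key not present in the dict):

```python
import pandas as pd

info = {'name': 'joha', 'sex': 'male', 'age': '18'}
# Labels in `index` that are missing from the dict become NaN;
# dict keys not listed in `index` ('sex' here) are left out.
s4 = pd.Series(info, index=['name', 'age', 'city'])
print(s4)
```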

Selecting one or more values from a Series by index label

s2 = pd.Series([3,2,4,1,5],index = ['a','b','c','d','e'])
# select a single value
s2['a']
>>>3

# select several values at once
s2[['a','c','e']]
>>>
a    3
c    4
e    5
dtype: int64

s2[s2>3]
>>>
c    4
e    5
dtype: int64

s2*3
>>>
a     9
b     6
c    12
d     3
e    15
dtype: int64

'c' in s2
>>>True
'f' in s2
>>>False
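Note that `in` tests the index labels, not the values. A small sketch of the difference, assuming the same s2 as above:

```python
import pandas as pd

s2 = pd.Series([3, 2, 4, 1, 5], index=['a', 'b', 'c', 'd', 'e'])
print('c' in s2)         # True: 'c' is an index label
print(3 in s2)           # False: `in` does not search the values
print(3 in s2.values)    # True: search the underlying values explicitly
```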

In arithmetic, Series automatically align data on index labels; a label present in only one operand produces NaN

s1 = pd.Series([3,2,4,1,5],index = ['a','b','c','d','e'])
s2 = pd.Series([3,-5,1],index = ['a','c','e'])
print(s1+s2)
>>>
a    6.0
b    NaN
c   -1.0
d    NaN
e    6.0
dtype: float64
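If NaN is not wanted for labels that appear in only one Series, the method form of the operator accepts `fill_value`. A minimal sketch with the same s1 and s2:

```python
import pandas as pd

s1 = pd.Series([3, 2, 4, 1, 5], index=['a', 'b', 'c', 'd', 'e'])
s2 = pd.Series([3, -5, 1], index=['a', 'c', 'e'])
# A label missing from one side is treated as 0 instead of giving NaN
total = s1.add(s2, fill_value=0)
print(total)
```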

The index of a Series can be changed in place by assigning to it

s2 = pd.Series([3,-5,1],index = ['a','c','e'])
s2.index = [1,2,3]
print(s2)
>>>
1    3
2   -5
3    1
dtype: int64
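The new index must have exactly as many labels as the Series has values. A small sketch showing that a length mismatch raises ValueError and leaves the old index in place:

```python
import pandas as pd

s2 = pd.Series([3, -5, 1], index=['a', 'c', 'e'])
raised = False
try:
    s2.index = [1, 2]          # only 2 labels for 3 values
except ValueError as err:
    raised = True
    print('ValueError:', err)
print(list(s2.index))          # still ['a', 'c', 'e']
```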

DataFrame

Creating a DataFrame

test_dict = {'id':[1,2,3,4,5,6],
             'name':['Alice','Bob','Cindy','Eric','Helen','Grace '],
             'math':[90,89,99,78,97,93],
             'english':[89,94,80,94,94,90]}
# [1] Pass test_dict directly as the first argument
test_dict_df = pd.DataFrame(test_dict)
print(test_dict_df)
>>>
   id    name  math  english
0   1   Alice    90       89
1   2     Bob    89       94
2   3   Cindy    99       80
3   4    Eric    78       94
4   5   Helen    97       94
5   6  Grace     93       90
# [2] Pass it through the data keyword argument
test_dict_df = pd.DataFrame(data=test_dict)
print(test_dict_df)
>>>
   id    name  math  english
0   1   Alice    90       89
1   2     Bob    89       94
2   3   Cindy    99       80
3   4    Eric    78       94
4   5   Helen    97       94
5   6  Grace     93       90

test_dict_df = pd.DataFrame(test_dict,columns=['name','math','english','id'])
print(test_dict_df)
>>>
     name  math  english  id
0   Alice    90       89   1
1     Bob    89       94   2
2   Cindy    99       80   3
3    Eric    78       94   4
4   Helen    97       94   5
5  Grace     93       90   6
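`columns` also filters: dict keys not listed are dropped, and a listed name with no matching key becomes an all-NaN column. A minimal sketch; the column name 'physics' and the row labels 's1'…'s6' are hypothetical:

```python
import pandas as pd

test_dict = {'id': [1, 2, 3, 4, 5, 6],
             'name': ['Alice', 'Bob', 'Cindy', 'Eric', 'Helen', 'Grace'],
             'math': [90, 89, 99, 78, 97, 93],
             'english': [89, 94, 80, 94, 94, 90]}
# 'physics' is not a key of test_dict, so it is filled with NaN;
# 'id' and 'english' are not listed, so they are left out
df = pd.DataFrame(test_dict, columns=['name', 'math', 'physics'],
                  index=['s1', 's2', 's3', 's4', 's5', 's6'])
print(df)
```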

Selecting a column from a DataFrame (bracket or attribute access)

test_dict_df['name']
>>>
0     Alice
1       Bob
2     Cindy
3      Eric
4     Helen
5    Grace 
Name: name, dtype: object

test_dict_df.name
>>>
0     Alice
1       Bob
2     Cindy
3      Eric
4     Helen
5    Grace 
Name: name, dtype: object
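Single brackets return a Series, while a list of labels returns a DataFrame; attribute access only works when the column name is a valid Python identifier. A short sketch with hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'math': [90, 89]})
col = df['math']        # single label -> Series
sub = df[['math']]      # list of labels -> DataFrame with one column
print(type(col).__name__, type(sub).__name__)

df2 = pd.DataFrame({'math score': [90, 89]})
print(df2['math score'])   # a name with a space needs bracket access
```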

Assigning to a column (a Series is aligned on the index, so row 5, which has no match, gets NaN)

test_dict_df['id'] = pd.Series(['11','22','33','44','55'])
print(test_dict_df)
>>>
     name  math  english   id
0   Alice    90       89   11
1     Bob    89       94   22
2   Cindy    99       80   33
3    Eric    78       94   44
4   Helen    97       94   55
5  Grace     93       90  NaN
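By contrast, a plain list is assigned by position and must match the number of rows. A minimal sketch contrasting the two, with hypothetical 'id' and 'rank' columns:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Cindy'], 'math': [90, 89, 99]})
# A list is assigned by position; its length must equal len(df)
df['id'] = [11, 22, 33]
# A Series is aligned on the index; row 2 has no match and gets NaN
df['rank'] = pd.Series([1, 2], index=[0, 1])
print(df)
```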

Deleting a column (drop returns a new DataFrame; the original is not modified)

test_dict_df.drop(['id'],axis=1)
>>>
     name  math  english
0   Alice    90       89
1     Bob    89       94
2   Cindy    99       80
3    Eric    78       94
4   Helen    97       94
5  Grace     93       90
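Because drop returns a copy, keep the result by reassigning it. A small sketch, assuming pandas 0.21 or later for the `columns=` keyword:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob'], 'math': [90, 89]})
result = df.drop(['id'], axis=1)   # new DataFrame without 'id'
print('id' in df.columns)          # True: df itself is unchanged
df = df.drop(columns=['id'])       # columns= is equivalent to axis=1
print(list(df.columns))
```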

Building a DataFrame from a 2-D array (a list of lists)

test_dict = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]],columns = ['a','b','c'],index=[1,2,3])
print(test_dict)
>>>
   a  b  c
1  1  2  3
2  4  5  6
3  7  8  9

print(test_dict.values)
>>>
[[1 2 3]
 [4 5 6]
 [7 8 9]]
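A NumPy array works the same way as a list of lists, and `to_numpy()` (pandas 0.24+) returns the values much like `.values`. A minimal sketch:

```python
import numpy as np
import pandas as pd

arr = np.arange(1, 10).reshape(3, 3)
df = pd.DataFrame(arr, columns=['a', 'b', 'c'], index=[1, 2, 3])
back = df.to_numpy()               # 2-D ndarray with the same values
print(df.loc[2, 'b'])              # row label 2, column 'b'
print(np.array_equal(back, arr))
```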

When building a Series or DataFrame you can pass an array as the index, and the index can be replaced later by assignment

test_dict = pd.Series([1,2,3],index = ['a','b','c'])
print(test_dict)
>>>
a    1
b    2
c    3
dtype: int64

test_dict.index = ['c','d','e']
print(test_dict)
>>>
c    1
d    2
e    3
dtype: int64

Reindexing with reindex

test_dict = pd.Series([1,2,3],index = ['a','b','c'])
test_dict1 = test_dict.reindex(['a','b','c','d','e'])
print(test_dict1)
>>>
a    1.0
b    2.0
c    3.0
d    NaN
e    NaN
dtype: float64

# fill new labels with a default value instead of NaN
test_dict = pd.Series([1,2,3],index = ['a','b','c'])
test_dict1 = test_dict.reindex(['a','b','c','d','e'],fill_value = 0)
print(test_dict1)
>>>
a    1
b    2
c    3
d    0
e    0
dtype: int64

obj = pd.Series(['Jim','Mike','Jhon'],index = [0,3,6])
obj1 = obj.reindex(range(8),method = 'ffill')
print(obj1)
>>>
0     Jim
1     Jim
2     Jim
3    Mike
4    Mike
5    Mike
6    Jhon
7    Jhon
dtype: object
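`method='bfill'` fills in the other direction, taking the next valid value; a trailing label with nothing after it stays NaN. A minimal sketch with the same obj:

```python
import pandas as pd

obj = pd.Series(['Jim', 'Mike', 'Jhon'], index=[0, 3, 6])
# Each new label takes the next available value
obj2 = obj.reindex(range(8), method='bfill')
print(obj2)
```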

Applying reindex to columns (and rows)

df = pd.DataFrame(np.arange(1,10).reshape((3,3)),index = ['d','a','c'],columns = ['Jim','Mike','Jhon'])
df1 = df.reindex(index=['a','b','c','d'],columns=['Jhon','Mike','Jim'])
print(df1)
>>>
   Jhon  Mike  Jim
a   6.0   5.0  4.0
b   NaN   NaN  NaN
c   9.0   8.0  7.0
d   3.0   2.0  1.0
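Passing only `columns=` leaves the rows untouched, and `fill_value` works here just as for a Series. A minimal sketch; 'Tom' is a hypothetical new column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape((3, 3)),
                  index=['d', 'a', 'c'], columns=['Jim', 'Mike', 'Jhon'])
# Reorder columns only; the new column 'Tom' is filled with 0
df2 = df.reindex(columns=['Jhon', 'Tom', 'Jim'], fill_value=0)
print(df2)
```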

Dropping entries from an axis: drop

test_dict = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(test_dict)
a = test_dict.drop(['a','c'])
print(a)
>>>
a    1
b    2
c    3
d    4
e    5
dtype: int64
>>>
b    2
d    4
e    5
dtype: int64

df = pd.DataFrame(np.arange(25).reshape((5,5)),index = list('12345'),columns = list('abcde'))
print(df)
a = df.drop(['2','4'])
print(a)
>>>
    a   b   c   d   e
1   0   1   2   3   4
2   5   6   7   8   9
3  10  11  12  13  14
4  15  16  17  18  19
5  20  21  22  23  24
>>>
    a   b   c   d   e
1   0   1   2   3   4
3  10  11  12  13  14
5  20  21  22  23  24

df = pd.DataFrame(np.arange(25).reshape((5,5)),index = list('12345'),columns = list('abcde'))
print(df)
a = df.drop(['a','c'],axis = 1)
print(a)
>>>
    a   b   c   d   e
1   0   1   2   3   4
2   5   6   7   8   9
3  10  11  12  13  14
4  15  16  17  18  19
5  20  21  22  23  24
>>>
    b   d   e
1   1   3   4
2   6   8   9
3  11  13  14
4  16  18  19
5  21  23  24
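With the `index=` and `columns=` keywords (pandas 0.21+), one call can drop rows and columns together. A minimal sketch on the same 5x5 frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)),
                  index=list('12345'), columns=list('abcde'))
# Drop rows '2' and '4' and columns 'a' and 'c' in a single call
a = df.drop(index=['2', '4'], columns=['a', 'c'])
print(a)
```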

Indexing, selection and filtering

object = pd.Series([3,2,4,1,5],index = ['a','b','c','d','e'])
print(object[1:3])
>>>
b    2
c    4
dtype: int64

object[['a','b','c']]
>>>
a    3
b    2
c    4
dtype: int64

Filtering by condition

object[object<4]
>>>
a    3
b    2
d    1
dtype: int64
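Several conditions are combined with `&` (and), `|` (or) and `~` (not), with each comparison in its own parentheses. A minimal sketch:

```python
import pandas as pd

obj = pd.Series([3, 2, 4, 1, 5], index=['a', 'b', 'c', 'd', 'e'])
# Python's `and`/`or` do not work element-wise; use & and |
mid = obj[(obj > 1) & (obj < 5)]
print(mid)
```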

A DataFrame can be indexed by column or by row

object = pd.DataFrame(np.arange(25).reshape((5,5)),index = list('12345'),columns = list('abcde'))
print(object['b'])
>>>
1     1
2     6
3    11
4    16
5    21
Name: b, dtype: int64

print(object[['a','c']])
>>>
    a   c
1   0   2
2   5   7
3  10  12
4  15  17
5  20  22

Row indexing (an integer slice selects rows by position)

print(object[1:4])
>>>
    a   b   c   d   e
2   5   6   7   8   9
3  10  11  12  13  14
4  15  16  17  18  19

Boolean (conditional) indexing

object[object['b']>10]
>>>
    a   b   c   d   e
3  10  11  12  13  14
4  15  16  17  18  19
5  20  21  22  23  24
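`.iloc` (by position) and `.loc` (by label) make the row selection explicit; note that a label slice includes its end point. A minimal sketch on the same frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)),
                  index=list('12345'), columns=list('abcde'))
by_pos = df.iloc[1:4]                       # positions 1..3, end excluded
by_label = df.loc['2':'4']                  # labels '2'..'4', end included
sub = df.loc[df['b'] > 10, ['a', 'c']]      # rows by condition, chosen columns
print(by_pos)
print(sub)
```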

Arithmetic between objects with different indexes (the result uses the union of row and column labels)

df1 = pd.DataFrame(np.arange(9).reshape(3,3),columns = list('abc'),index = [1,2,3])
print(df1)
>>>
   a  b  c
1  0  1  2
2  3  4  5
3  6  7  8

df2 = pd.DataFrame(np.arange(16).reshape(4,4),columns = list('bcde'),index = [2,3,4,5])
print(df2)
>>>
    b   c   d   e
2   0   1   2   3
3   4   5   6   7
4   8   9  10  11
5  12  13  14  15

print(df1+df2)
>>>
    a     b     c   d   e
1 NaN   NaN   NaN NaN NaN
2 NaN   4.0   6.0 NaN NaN
3 NaN  11.0  13.0 NaN NaN
4 NaN   NaN   NaN NaN NaN
5 NaN   NaN   NaN NaN NaN

df1.add(df2,fill_value = 0)
# positions missing from both df1 and df2 are still NaN
>>>
     a     b     c     d     e
1  0.0   1.0   2.0   NaN   NaN
2  3.0   4.0   6.0   2.0   3.0
3  6.0  11.0  13.0   6.0   7.0
4  NaN   8.0   9.0  10.0  11.0
5  NaN  12.0  13.0  14.0  15.0

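Arithmetic between a DataFrame and a Series
By default the Series index is matched against the DataFrame's columns and the operation is broadcast down the rows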

df1 = pd.DataFrame(np.arange(12).reshape((4,3)),columns = list('abc'))
print(df1)
>>>
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

series1 = pd.Series([3,4,5],index = ['a','b','c'])
print(series1)
>>>
a    3
b    4
c    5
dtype: int64

# subtract series1 from every row
print(df1 - series1)
>>>
   a  b  c
0 -3 -3 -3
1  0  0  0
2  3  3  3
3  6  6  6
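If the Series index contains a label that is not a column (or vice versa), the result takes the union of labels and the unmatched columns become NaN. A minimal sketch; series3 and its label 'd' are hypothetical:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('abc'))
series3 = pd.Series([1, 1], index=['a', 'd'])   # 'd' is not a column of df1
res = df1 - series3
print(res)   # columns a, b, c, d; only 'a' has numbers
```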

A Series taken from the DataFrame: to subtract it from each column, matching on the row index, pass axis=0

series2 = df1['b']
series2
>>>
0     1
1     4
2     7
3    10
Name: b, dtype: int64

df1.sub(series2,axis = 0)
>>>
   a  b  c
0 -1  0  1
1 -1  0  1
2 -1  0  1
3 -1  0  1

Function application and mapping
NumPy element-wise functions (ufuncs) can be applied directly to a DataFrame

print(np.square(df1))
>>>
    a    b    c
0   0    1    4
1   9   16   25
2  36   49   64
3  81  100  121

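DataFrame.apply calls a function on each column (the default, axis=0) or on each row (axis=1) and collects the results into a new Series or DataFrame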

def fun(x):
    return x.max() - x.min()
df1.apply(fun,axis = 1)
>>>
0    2
1    2
2    2
3    2
dtype: int64
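A short sketch of the other variants, assuming the same 4x3 df1: apply with the default axis=0 runs once per column, a function returning a Series yields a DataFrame, and `Series.map` applies a function element by element. The 'even'/'odd' labels are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('abc'))

def fun(x):
    return x.max() - x.min()

col_range = df1.apply(fun)   # default axis=0: one result per column
stats = df1.apply(lambda x: pd.Series([x.min(), x.max()], index=['min', 'max']))
labels = df1['a'].map(lambda v: 'even' if v % 2 == 0 else 'odd')
print(col_range)
print(stats)
print(labels)
```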

Sorting

df1 = pd.DataFrame(np.random.randn(4,4),columns=list('bcad'),index=[2,4,3,1])
print(df1)
>>>
          b         c         a         d
2  0.706356 -0.896474 -1.879608  0.322054
4  0.666188 -0.450170  0.914737  0.691662
3 -1.676381 -0.499211 -0.136020 -1.734251
1 -2.111717 -0.226238  1.656514  0.146311

print(df1.sort_index())
>>>
          b         c         a         d
1 -2.111717 -0.226238  1.656514  0.146311
2  0.706356 -0.896474 -1.879608  0.322054
3 -1.676381 -0.499211 -0.136020 -1.734251
4  0.666188 -0.450170  0.914737  0.691662

print(df1.sort_values(by=['b','a']))
>>>
          b         c         a         d
1 -2.111717 -0.226238  1.656514  0.146311
3 -1.676381 -0.499211 -0.136020 -1.734251
4  0.666188 -0.450170  0.914737  0.691662
2  0.706356 -0.896474 -1.879608  0.322054
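The random data above changes on every run; with fixed data it is easier to check the other sorting options: `ascending=False` reverses the order and `axis=1` sorts the column labels. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'b': [2, 1, 3], 'a': [5, 9, 3]}, index=[3, 1, 2])
desc = df.sort_values(by='b', ascending=False)   # largest b first
by_cols = df.sort_index(axis=1)                  # column labels: a, b
print(desc)
print(by_cols)
```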

Statistical methods

Sum: sum
Maximum: max
Minimum: min
Variance: var
Mean: mean
Summary statistics: describe

print(df1.describe())
>>>
              b         c         a         d
count  4.000000  4.000000  4.000000  4.000000
mean  -0.603889 -0.518024  0.138906 -0.143556
std    1.500402  0.278880  1.533517  1.084545
min   -2.111717 -0.896474 -1.879608 -1.734251
25%   -1.785215 -0.598527 -0.571917 -0.323829
50%   -0.505096 -0.474691  0.389358  0.234183
75%    0.676230 -0.394187  1.100181  0.414456
max    0.706356 -0.226238  1.656514  0.691662
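The individual methods in the list above work per column by default and per row with axis=1. A minimal sketch on fixed data (pandas `var` computes the sample variance, ddof=1, by default):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('abc'))
col_sums = df.sum()          # one value per column
row_sums = df.sum(axis=1)    # one value per row
print(col_sums)
print(row_sums)
print(df.max(), df.min(), df.mean(), df.var(), sep='\n')
```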

Handling missing data

dropna: drop NaN entries
fillna: fill NaN with a specified value
isnull: return a boolean object marking the positions of NaN
notnull: the negation of isnull
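A minimal sketch of the four methods on a small Series, plus `dropna(how='all', axis=1)` on a DataFrame, which drops only columns that are entirely NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])
print(s.isnull().tolist())     # True where the value is NaN
print(s.notnull().tolist())    # the negation of isnull
print(s.dropna().tolist())     # NaN entries removed
print(s.fillna(0).tolist())    # NaN replaced by 0

df = pd.DataFrame({'x': [1.0, np.nan], 'y': [np.nan, np.nan]})
cleaned = df.dropna(how='all', axis=1)   # drops column 'y' only
print(cleaned)
```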
