Pandas Basic Attributes

Author: 李小夭 | Published 2019-08-06 09:47

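If NumPy is like a list, Pandas is more like a dictionary: rows and columns carry labels (names), and those labels can be set and renamed.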

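Creating a pandas Series automatically adds an integer index (0, 1, 2, ...) and a dtype. Here the dtype is float64 because np.nan is a floating-point value, so the integers are stored as floats.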

import pandas as pd
import numpy as np
s = pd.Series([1,3,6,np.nan,44,1])
s

0     1.0
1     3.0
2     6.0
3     NaN
4    44.0
5     1.0
dtype: float64
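The dictionary analogy is easiest to see with explicit labels. A minimal sketch, assuming illustrative string labels 'a' to 'f' that are not part of the original: the same values can be looked up by label, and dropping np.nan shows that the integers would otherwise stay int64.

```python
import numpy as np
import pandas as pd

# Same values as above; np.nan is a float, so the whole Series becomes float64
s = pd.Series([1, 3, 6, np.nan, 44, 1])
print(s.dtype)  # float64

# Without the missing value, pandas keeps an integer dtype
s_int = pd.Series([1, 3, 6, 44, 1])
print(s_int.dtype)  # int64

# Illustrative string labels: the Series now behaves like a dict keyed by label
s_labeled = pd.Series([1, 3, 6, np.nan, 44, 1], index=list('abcdef'))
print(s_labeled['e'])          # 44.0
print(s_labeled.isna().sum())  # 1 missing value
```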

Creating a DataFrame

  1. Default row and column labels
df1 = pd.DataFrame(np.arange(12).reshape((3,4)))
df1

    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9   10  11
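Default integer labels can be replaced afterwards, which is what the dictionary comparison at the top refers to. A minimal sketch using DataFrame.rename and direct assignment to .columns; the new label names ('first', 'last', 'row0', 'w' to 'z') are hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((3, 4)))

# rename() returns a new DataFrame; only the labels listed in the mapping change
renamed = df1.rename(columns={0: 'first', 3: 'last'}, index={0: 'row0'})
print(renamed)

# Assigning a full list replaces every column label of df1 in place
df1.columns = ['w', 'x', 'y', 'z']
print(df1['y'].tolist())  # [2, 6, 10]
```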
  2. Using dates as the row index
dates = pd.date_range('20160101',periods = 6)
dates

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06'],
              dtype='datetime64[ns]', freq='D')
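pd.date_range can also be driven by an end date instead of a count, or by a different step. A small sketch; the '2D' frequency is only an illustrative choice.

```python
import pandas as pd

dates = pd.date_range('20160101', periods=6)  # 6 consecutive days, freq='D'

# Giving start and end instead of periods produces the same index
same = pd.date_range('20160101', '20160106')
print(dates.equals(same))  # True

# freq changes the step; '2D' means every second day
every_other = pd.date_range('20160101', periods=3, freq='2D')
print(every_other.strftime('%Y-%m-%d').tolist())
# ['2016-01-01', '2016-01-03', '2016-01-05']
```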

df = pd.DataFrame(np.random.randn(6,4),index = dates,columns = ['a','b','c','d'])
df

            a           b           c           d
2016-01-01  -1.281511   1.713843    -0.606131   -0.699298
2016-01-02  -0.690049   -0.624657   1.521370    -0.226207
2016-01-03  1.280099    0.188350    -0.481156   0.131706
2016-01-04  -0.026690   0.899729    -0.678333   -1.096834
2016-01-05  0.517648    0.291178    -0.879998   -0.823239
2016-01-06  -1.936642   -0.286916   0.362583    0.444345
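The values above come from np.random.randn, so they change on every run. The sketch below fixes a seed (an assumption added for reproducibility, using NumPy's default_rng) and shows label-based lookup with the date index and the column names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20160101', periods=6)
rng = np.random.default_rng(0)  # fixed seed so the numbers are reproducible
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

col_b = df['b']                                 # one column, indexed by date
row = df.loc[pd.Timestamp('2016-01-03')]        # one row, indexed by column name
cell = df.loc[pd.Timestamp('2016-01-03'), 'b']  # a single value by row and column label

print(df.shape)               # (6, 4)
print(cell == df.iloc[2, 1])  # True: label lookup and position lookup agree
```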
  3. Defining each column through a dict
df2 = pd.DataFrame({'A':1.,
                    'B':pd.Timestamp('20130102'),
                    'C':pd.Series(1,index=list(range(4)), dtype= 'float32' ),
                    'D':np.array([3]*4, dtype = 'int32'),
                    'E':pd.Categorical(["test","train","test","train"]),
                    'F':'foo'})
df2

    A   B           C   D   E       F
0   1.0 2013-01-02  1.0 3   test    foo
1   1.0 2013-01-02  1.0 3   train   foo
2   1.0 2013-01-02  1.0 3   test    foo
3   1.0 2013-01-02  1.0 3   train   foo
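In the dict form, scalar entries such as 'A': 1. and 'F': 'foo' are broadcast to the length of the other columns, and the row index comes from the Series in column C. If every value is a scalar there is nothing to take the length from, so pandas needs an explicit index. A small sketch with hypothetical column names:

```python
import pandas as pd

# A list sets the length; the scalar 'const' is repeated for every row
df = pd.DataFrame({'x': [1, 2, 3], 'y': 'const'})
print(df)

# With only scalars, pandas cannot infer the number of rows
try:
    pd.DataFrame({'x': 1, 'y': 2})
except ValueError as err:
    print('ValueError:', err)

# Supplying an index fixes it
df_scalars = pd.DataFrame({'x': 1, 'y': 2}, index=[0, 1])
print(df_scalars)
```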

Basic DataFrame attributes

  1. Data type of each column
df2.dtypes 

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
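The dtypes listed above can be used to pick columns by type. A sketch using DataFrame.select_dtypes and astype on the same df2 (rebuilt so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

numeric = df2.select_dtypes(include='number')  # float and int columns only
print(numeric.columns.tolist())                 # ['A', 'C', 'D']

# astype returns a converted copy; df2 itself is unchanged
d64 = df2['D'].astype('int64')
print(d64.dtype, df2['D'].dtype)                # int64 int32
```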
  2. Row labels, column labels and values
df2.index # row labels
Int64Index([0, 1, 2, 3], dtype='int64')

df2.columns # column labels
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

df2.values # values as a NumPy array
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
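Because df2 mixes types, .values has to fall back to dtype=object. The pandas documentation recommends DataFrame.to_numpy() over .values, and restricting to the numeric columns gives a plain float array. A sketch on a trimmed copy of df2 (columns B and E left out for brevity):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.,
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'F': 'foo'})  # trimmed version of df2

print(df2.index.tolist())    # [0, 1, 2, 3]
print(df2.columns.tolist())  # ['A', 'C', 'D', 'F']

mixed = df2.to_numpy()                  # a string column is present -> object array
nums = df2[['A', 'C', 'D']].to_numpy()  # float64/float32/int32 -> common type float64
print(mixed.dtype, nums.dtype)          # object float64
```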
  3. describe(): count, mean, standard deviation, min, quartiles and max (by default only numeric columns are included)
df2.describe() 

        A   C   D
count   4.0 4.0 4.0
mean    1.0 1.0 3.0
std     0.0 0.0 0.0
min     1.0 1.0 3.0
25%     1.0 1.0 3.0
50%     1.0 1.0 3.0
75%     1.0 1.0 3.0
max     1.0 1.0 3.0
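With constant columns every statistic collapses to the same value and std is 0. A small made-up column with varied values makes the output easier to check: std is the sample standard deviation (ddof=1) and the quartiles use linear interpolation. Passing include='all' also summarises non-numeric columns with unique, top and freq.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 's': ['a', 'b', 'a', 'c']})

d = df.describe()          # default: numeric columns only
print(d.columns.tolist())  # ['x']
print(d.loc[['mean', '25%', '50%', '75%'], 'x'].tolist())  # [2.5, 1.75, 2.5, 3.25]
print(d.loc['std', 'x'])   # sqrt(5/3), about 1.29099

d_all = df.describe(include='all')  # also summarises the string column
print(d_all.loc[['unique', 'top', 'freq'], 's'])
```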
  4. Transpose (swap rows and columns)
df2.T

    0   1   2   3
A   1   1   1   1
B   2013-01-02 00:00:00 2013-01-02 00:00:00 2013-01-02 00:00:00 2013-01-02 00:00:00
C   1   1   1   1
D   3   3   3   3
E   test    train   test    train
F   foo foo foo foo
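Transposing turns rows into columns, so each new column holds values of several different types and pandas stores them as object. A round trip through .T therefore loses the original dtypes, which infer_objects() can restore. A sketch with a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'n': [1, 2], 's': ['x', 'y']})

t = df.T                 # rows 'n' and 's', columns 0 and 1; both columns are object
print(t.dtypes)

back = df.T.T            # original shape again, but every column is still object
print(back['n'].dtype)                  # object
print(back.infer_objects()['n'].dtype)  # int64

# A homogeneous numeric frame keeps its dtype when transposed
nums = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(nums.T.dtypes)
```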
  5. Sorting
df2.sort_index(axis=1,ascending=False) # sort by column labels, descending

    F   E   D   C   B   A
0   foo test    3   1.0 2013-01-02  1.0
1   foo train   3   1.0 2013-01-02  1.0
2   foo test    3   1.0 2013-01-02  1.0
3   foo train   3   1.0 2013-01-02  1.0


df2.sort_index(axis=0,ascending=False) # sort by row labels (index), descending

    A   B   C   D   E   F
3   1.0 2013-01-02  1.0 3   train   foo
2   1.0 2013-01-02  1.0 3   test    foo
1   1.0 2013-01-02  1.0 3   train   foo
0   1.0 2013-01-02  1.0 3   test    foo

df2.sort_values(by='E') # sort rows by the values in column E

    A   B   C   D   E   F
0   1.0 2013-01-02  1.0 3   test    foo
2   1.0 2013-01-02  1.0 3   test    foo
1   1.0 2013-01-02  1.0 3   train   foo
3   1.0 2013-01-02  1.0 3   train   foo
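Column E is categorical, so sort_values orders it by the category order (alphabetical above, because pd.Categorical sorts the inferred categories) rather than by raw string comparison. The sketch below declares a custom order (an illustrative choice) so 'train' sorts first; kind='mergesort' is a stable sort, so rows with equal values keep their original relative order. It also shows sorting on two keys, with a hypothetical 'score' column.

```python
import pandas as pd

df = pd.DataFrame({
    'E': pd.Categorical(['test', 'train', 'test', 'train'],
                        categories=['train', 'test'], ordered=True),
    'score': [0.5, 0.9, 0.7, 0.1],  # hypothetical numeric column
})

by_cat = df.sort_values(by='E', kind='mergesort')  # stable: ties keep index order
print(by_cat.index.tolist())  # [1, 3, 0, 2]

# Several keys: category order first, then score descending within each group
by_two = df.sort_values(by=['E', 'score'], ascending=[True, False])
print(by_two.index.tolist())  # [1, 3, 2, 0]
```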

Pandas tutorial source: linked in the original post.

Article link: https://www.haomeiwen.com/subject/bbsvdctx.html