Data Analysis with Python: pandas!

Author: 14e61d025165 | Published: 2019-06-06 15:35


Creating a Series

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import numpy as np
import pandas as pd

# Create from a dict
a = {'a': 1, 'b': 2, 'c': 3}
S = pd.Series(a)

# Create from an array
S1 = pd.Series(np.random.randn(4))

# Create from a scalar
S2 = pd.Series(10, index=range(4))

# The scalar is repeated once for each label in index
</pre>

Indexing a Series


<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">s = pd.Series(np.random.rand(5)*100, index = ['a','b','c','d','e'])
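print(s[['a','c','e']])  # select specific labels
print(s[1:3])  # positional slice: start included, end excluded
print(s['a':'c'])  # label slice: both endpoints included
print(s.iloc[-1])  # last element; use iloc for negative positions
print(s[::2])  # step of 2

# Boolean indexing

bs1 = s > 50
bs2 = s.isnull()
bs3 = s.notnull()
S2 = s[s.notnull()]  # keeps only the non-null values of s

# A comparison returns a boolean Series, which can be used as a mask to filter values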
</pre>
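The difference between positional and label slices is easier to see with fixed values. The following is a minimal sketch; the numbers and the variable names are chosen only for illustration.

```python
import pandas as pd

s = pd.Series([10, 60, 30, 80, 50], index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[1:3]   # positions 1 and 2; the end position is excluded
by_label = s.loc['a':'c']   # labels 'a' through 'c'; the end label is included
mask = s > 50               # a boolean Series aligned with s
selected = s[mask]          # only the values where the mask is True

print(by_position.tolist(), by_label.tolist(), selected.tolist())
```

Using `iloc` and `loc` explicitly makes it clear whether a slice is meant as positions or as labels.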

Common Series Functions

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">s = pd.Series(np.random.rand(3), index = ['a','b','c'])
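s.head()  # first five rows (the default)
s.tail()  # last five rows (the default)
s1 = s.reindex(['b','a','c','d'], fill_value=0)  # labels introduced by reindex have no value; fill_value=0 fills them with 0 instead of NaN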
</pre>
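To see what `fill_value` changes, here is a small sketch with fixed values (the data is made up for illustration):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

with_nan = s.reindex(['b', 'a', 'c', 'd'])                 # new label 'd' gets NaN
with_zero = s.reindex(['b', 'a', 'c', 'd'], fill_value=0)  # new label 'd' gets 0

print(with_nan)
print(with_zero)
```

Existing labels keep their values and are only reordered; `fill_value` affects just the labels that were not in the original index.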

Automatic Alignment of Series

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">s1 = pd.Series(np.random.rand(3), index = ['Jack','Marry','Tom'])
s2 = pd.Series(np.random.rand(3), index = ['Wang','Jack','Marry'])
print(s1+s2)
</pre>


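pandas automatically matches identical labels and adds their values. A label that appears in only one of the two Series, or whose value is null, gives NaN in the result.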
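A sketch of the same alignment with fixed numbers, so the result can be checked by hand. The values are invented, and `Series.add` with `fill_value` is shown as one possible way to avoid the NaN entries; it is not part of the original example.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=['Jack', 'Marry', 'Tom'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['Wang', 'Jack', 'Marry'])

total = s1 + s2                     # NaN where a label exists on one side only
filled = s1.add(s2, fill_value=0)   # treat the missing side as 0 instead

print(total)
print(filled)
```

Here `Jack` and `Marry` exist in both Series and are summed, while `Tom` and `Wang` are NaN in `total`.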

Adding, Deleting, and Modifying Series Elements

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">s1 = pd.Series(np.random.rand(5))
s2 = pd.Series(np.random.rand(5), index = list('ngjur'))

# Add

s2['a'] = 100
s1[5] = 30

# Delete (drop returns a new Series; the original is unchanged)

s1.drop(1)
s2.drop('n')

# Modify

s2['a'] = 1
</pre>

Five Ways to Create a DataFrame

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Method 1: from a dict of lists or arrays
data = {'a': [1, 2, 3],
        'b': np.random.rand(3)}
df = pd.DataFrame(data, index=['one', 'two', 'three'])

# Note: the length of index must equal the number of rows. columns may list any names;
# a name that is not a key of the dict becomes a column of NaN

# Method 2: from a dict of Series

data = {'one': pd.Series(np.random.rand(3)),
        'two': pd.Series(np.random.rand(2))}
df = pd.DataFrame(data, index=[1, 2, 3])  # labels missing from a Series become NaN

# Method 3: from a 2-D array

data = np.random.rand(9).reshape(3, 3)
df = pd.DataFrame(data)

# Method 4: from a list of dicts

data = [{'one': 1, 'two': 2}, {'one': 5, 'two': 10, 'three': 20}]
df1 = pd.DataFrame(data)

# Method 5: from a dict of dicts

data = {'key1': {'math': 45, 'eng': 56, 'art': 65},
        'key2': {'math': 23, 'eng': 12, 'art': 87}}
df1 = pd.DataFrame(data)

# The outer keys (key1, key2) become the columns; the keys of the inner dicts become the row index

df1.index    # row index of the DataFrame
df1.columns  # column index of the DataFrame
df1.values   # the elements of the DataFrame as a NumPy array
</pre>
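Methods 4 and 5 are easy to confuse, because one treats each dict as a row and the other treats each inner dict as a column. A short sketch, reusing the same sample data as above with illustrative variable names:

```python
import pandas as pd

# List of dicts: each dict is one row; missing keys become NaN.
records = [{'one': 1, 'two': 2}, {'one': 5, 'two': 10, 'three': 20}]
by_row = pd.DataFrame(records)

# Dict of dicts: outer keys are columns, inner keys are row labels.
nested = {'key1': {'math': 45, 'eng': 56, 'art': 65},
          'key2': {'math': 23, 'eng': 12, 'art': 87}}
by_col = pd.DataFrame(nested)

print(by_row)
print(by_col)
```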

Row and Column Selection, Slicing, and Boolean Indexing in pandas

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">df=pd.DataFrame(np.random.rand(16).reshape(4,4),
columns=list('ABCD'),index=['one','two','three','four'])

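# Selecting columns

df['A']  # index directly by column name
df[['A', 'C']]
# df['A':'C'] does not select columns: a slice inside [] is applied to the rows

# A single column name returns a Series; a list of names returns a DataFrame

# Selecting rows

df.loc['one':'three', 'A':'B']  # by label; both endpoints included
df.iloc[1:3, 0]  # by position; iloc does not accept labels such as 'A'
df.loc[df.index[1:3], 'B']  # replaces df.ix[1:3, 'B']; ix was removed in pandas 1.0

# Boolean indexing (the values here lie between 0 and 1, so the threshold is 0.5)

print(df > 0.5)
print(df[df > 0.5])  # values that fail the condition become NaN
print(df[df > 0.5][['A', 'B']])  # chained indexing: filter the whole DataFrame first, then select columns A and B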
</pre>

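Differences between ix, loc, and iloc: iloc selects only by integer position and does not accept labels. loc selects by label; an integer passed to loc is treated as a label, not as a position. ix accepted both positions and labels, but it was deprecated in pandas 0.20 and removed in pandas 1.0, so use loc or iloc instead. Of the three, iloc is usually the fastest.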
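A minimal sketch of the same selection written with `loc` and with `iloc`, plus one way to mix a positional row range with a column label now that `ix` is gone. The data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]},
    index=['one', 'two', 'three', 'four'],
)

by_label = df.loc['one':'three', 'A']   # label slice; 'three' is included
by_position = df.iloc[0:3, 0]           # positional slice; position 3 is excluded
mixed = df.loc[df.index[1:3], 'B']      # positions turned into labels first

print(by_label.tolist(), by_position.tolist(), mixed.tolist())
```

Both slices return the same three rows, even though the label slice names its last row and the positional slice stops one short of its end.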

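A way to think about it: a DataFrame can be seen as a collection of Series, so much of its usage mirrors that of a Series. Each column is itself a Series, and selecting two columns is written df[['A','B']]. It follows that adding, deleting, and modifying data in a DataFrame also works much as it does for a Series.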

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])

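# Add

df['e'] = 10  # add a column
df.loc[4] = 1  # add a row
print(df)

# Modify

df[['a', 'c']] = 100  # modify columns
df.iloc[3] = 1  # modify a row

# Delete

del df['a']  # changes the original DataFrame
print(df.drop(['b', 'c'], axis=1))
print(df.drop(0))
print(df.drop([1, 2]))

# Note: drop returns a new DataFrame and leaves the original df unchanged.
# To change df itself, pass inplace=True, for example:

df.drop(['b'], axis=1, inplace=True)  # 'a' was already removed by del above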
</pre>
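The effect of `inplace` can be checked directly by looking at the columns before and after. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

dropped = df.drop(['a'], axis=1)       # new DataFrame; df keeps all columns
cols_after_drop = list(df.columns)

df.drop(['b'], axis=1, inplace=True)   # modifies df itself and returns None
cols_after_inplace = list(df.columns)

print(cols_after_drop, cols_after_inplace)
```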

Common DataFrame Functions

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">#unique()
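# df['column_name'].unique() returns the distinct values of that column

# Sorting by values: sort_values()

df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])
print(df1)

print(df1.sort_values(['a'], ascending = True))  # ascending

print(df1.sort_values(['a','c'], ascending=False))  # descending; the default is ascending

# Sorting by index: sort_index()

df1.sort_index(ascending=True, inplace=True)

# As the examples show, ascending sets the sort direction and inplace decides whether the
# original data is modified; with inplace=False a new DataFrame is returned

# value_counts() counts how many times each distinct value occurs

df1['a'].value_counts()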
</pre>
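With random data the effect of these functions is hard to verify, so here is a sketch on a tiny fixed DataFrame (values chosen only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 1, 2, 1], 'c': [9, 8, 7, 6]})

ascending = df.sort_values(['a'])                          # ascending by default
descending = df.sort_values(['a', 'c'], ascending=False)   # by a, then by c
counts = df['a'].value_counts()                            # occurrences per value
distinct = df['a'].unique()                                # in order of first appearance

print(descending)
print(counts)
```

Sorting by two columns uses the second column only to break ties in the first, which is visible in the two rows where `a` equals 1.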

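Timestamps

The main things to master are datetime, Timestamp, DatetimeIndex, Period, and the indexing and resampling of time series. In e-commerce and finance, much of the data is indexed by time, so a solid grasp of timestamps is essential.

For datetime, focus on datetime.date(), datetime.datetime(), and datetime.timedelta().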

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import datetime
print(datetime.date.today())  # today's date
print(datetime.date(2019, 1, 2))  # a specific date

# datetime.datetime

t1 = datetime.datetime(2018, 3, 5)
t2 = datetime.datetime(2017, 2, 14, 15, 13, 45)
print(t1, t2)

# Unlike datetime.date, datetime.datetime also carries hours, minutes, and seconds

# datetime.timedelta(): a time difference

today = datetime.date(2016, 2, 5)
yesterday = today - datetime.timedelta(1)
print(today, yesterday)

# Date parsing: convert a string into a datetime

from dateutil.parser import parse
t = '2018/2/23'
date = parse(t)
print(date)
</pre>
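A timedelta is also what you get back when subtracting two datetimes. The sketch below uses only the standard library and the same dates as above; the variable names are illustrative.

```python
import datetime

today = datetime.date(2016, 2, 5)
yesterday = today - datetime.timedelta(days=1)

start = datetime.datetime(2017, 2, 14, 15, 13, 45)
end = datetime.datetime(2018, 3, 5)
gap = end - start   # subtracting two datetimes yields a timedelta

print(yesterday)
print(gap.days, gap.seconds)
```

A timedelta stores whole days and the remaining seconds separately, which is why both `gap.days` and `gap.seconds` are printed.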

Timestamp

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># pd.Timestamp can also be precise to minutes and seconds
t1 = pd.Timestamp('2017-3-2')
t2 = pd.Timestamp('1919-2-4')
print(t1, t2)

# pd.to_datetime

date = pd.to_datetime('2014-1-2')
print(type(date))
date_index = pd.to_datetime(['2012/2/2', '2013/2/3'])
print(type(date_index))

# Note: a single date gives a Timestamp; a list of two or more dates gives a DatetimeIndex

# When you meet a new data type for the first time, printing its type helps you understand it

date = ['2017-2-1','2017-2-2','2017-2-3','hello world!','2017-2-5','2017-2-6']
t1 = pd.to_datetime(date, errors='ignore')  # errors='ignore' is deprecated since pandas 2.2
t2 = pd.to_datetime(date, errors='coerce')
print(type(t1))  # the input comes back unparsed (an ndarray or an object Index, depending on the pandas version)
print(type(t2))  # DatetimeIndex

# errors='ignore' skips the parsing error, so the original data is returned as it was;

# errors='coerce' forces the conversion to a DatetimeIndex, and entries that cannot be parsed become NaT
</pre>
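Since `errors='coerce'` is the option that keeps working across pandas versions, here is a small sketch of how the unparseable entry shows up and how it could be dropped afterwards. The list is a shortened version of the one above, and `dropna` is added only for illustration.

```python
import pandas as pd

raw = ['2017-02-01', '2017-02-02', 'hello world!', '2017-02-05']

parsed = pd.to_datetime(raw, errors='coerce')   # bad entries become NaT
clean = parsed.dropna()                         # keep only the valid dates
single = pd.to_datetime('2014-01-02')           # one string gives a Timestamp

print(type(parsed).__name__, type(single).__name__)
print(clean)
```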

DatetimeIndex

Using time as the index

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">rng = pd.DatetimeIndex(['12/1/2017','12/2/2017','12/3/2017','12/4/2017','12/5/2017'])
data=pd.Series(np.random.rand(5),index=rng)
</pre>

pd.date_range: generating a date range

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Generate a date range
'''
pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, **kwargs)
start: start time
end: end time
periods: number of periods to generate
freq: frequency
tz: time zone
normalize: normalize start/end to midnight
closed: which endpoints to include; both by default, or closed='left' / 'right'
        (replaced by the inclusive parameter in pandas 1.4 and removed in pandas 2.0)
'''
t1 = pd.date_range('2017/2/1', '2017/3/2')

print(t1)  # the dates from 2/1 to 3/2; the default frequency is daily

t2 = pd.date_range(start='2017/2/1', periods=10)
t3 = pd.date_range(end='2017/2/1', periods=10)
t4 = pd.date_range(start='2017/2', periods=10, freq='M')  # month end ('ME' in pandas 2.2+); output:
'''
['2017-02-28', '2017-03-31', '2017-04-30', '2017-05-31',
'2017-06-30', '2017-07-31', '2017-08-31', '2017-09-30',
'2017-10-31', '2017-11-30'],
'''
</pre>
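How `start`, `end`, and `periods` interact is easiest to see by checking the first and last dates. A minimal sketch using only the default daily frequency:

```python
import pandas as pd

forward = pd.date_range(start='2017-02-01', periods=10)   # 10 days starting Feb 1
backward = pd.date_range(end='2017-02-01', periods=10)    # 10 days ending Feb 1
span = pd.date_range('2017-02-01', '2017-03-02')          # both endpoints included

print(forward[-1], backward[0], len(span))
```

Any two of `start`, `end`, and `periods` are enough; the third follows from the frequency.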

The following is especially useful in finance: the different values of the freq parameter and how to use them.

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pd.date_range('2015/2/5', '2015/2/6', freq='D')
'''
freq defaults to daily ('D')
Other values include:
'M': month
'Y': year
'T' or 'MIN': minute
'S': second
'L': millisecond
'U': microsecond
(pandas 2.2+ prefers 'ME', 'YE', 'min', 's', 'ms', 'us' for these)
'''

# Other freq values

'''
'W-MON': weekly, anchored on the given weekday (here every Monday)
'WOM-2MON': a given week of each month (here the second Monday of each month)
'''

# Examples

print(pd.date_range('2018/1/1', '2018/2/1', freq='W-MON'))  # one-week interval
print(pd.date_range('2018/1/1', '2018/5/1', freq='WOM-2MON'))  # one-month interval
'''
M: the last calendar day of each month
Q-DEC: Q-month; the given month ends a quarter, and each date is the last calendar day of a quarter's final month
quarter-end months can be, e.g., 1-4-7-10 or 2-5-8-11
A-DEC: the last calendar day of the given month, every year
'''
print(pd.date_range('2017', '2018', freq = 'M'))
print(pd.date_range('2017', '2020', freq = 'Q-DEC'))
print(pd.date_range('2017', '2018', freq = 'A-DEC'))

# Monthly, quarterly, and yearly intervals, respectively
</pre>

''' print(pd.date_range('2017','2018', freq = 'BMS'))

print(pd.date_range('2017','2020', freq = 'BQS-DEC'))

print(pd.date_range('2017','2020', freq = 'BAS-DEC')) # prints the business days at each of these frequencies '''

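Summary

These freq aliases are built from three parts: B, one of (M, Q, A), and S. B means business day; M, Q, and A mean monthly, quarterly, and annual frequency; S means the start of the period rather than its end. BMS, for example, is the first business day of each month.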
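To make the aliases concrete, the sketch below generates a few ranges whose dates can be checked against a calendar. It deliberately sticks to aliases that have kept the same spelling across pandas versions ('W-MON', 'WOM-2MON', 'MS', 'BMS'); the date ranges themselves are chosen for illustration.

```python
import pandas as pd

mondays = pd.date_range('2018-01-01', '2018-02-01', freq='W-MON')
second_mondays = pd.date_range('2018-01-01', '2018-05-01', freq='WOM-2MON')
month_starts = pd.date_range('2017-01-01', '2017-06-30', freq='MS')
business_month_starts = pd.date_range('2017-01-01', '2017-06-30', freq='BMS')

print(mondays)
print(second_mondays)
print(business_month_starts)
```

1 January 2017 and 1 April 2017 fall on a weekend, so 'BMS' moves those two entries to the following Monday while 'MS' keeps the first of the month.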

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Frequency conversion
# asfreq() is a method of Series and DataFrame (there is no pd.asfreq function)
rng = pd.date_range('2018', '2019')
ts = pd.Series(np.arange(len(rng)), index=rng)
r = ts.asfreq('4H', method='ffill')  # 4-hour steps, forward-filled ('4h' in pandas 2.2+)

# asfreq: frequency conversion for a Period

p = pd.Period('2017', 'A-DEC')
print(p)
print(p.asfreq('M', how = 'start'))  # how = 's' also works
print(p.asfreq('D', how = 'end'))  # how = 'e' also works

# Convert to another frequency with .asfreq(freq, method=None, how=None)
</pre>
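For a Period, `how` decides which end of the original span the converted period is taken from. A short sketch, assuming the 'Y' alias for an annual period (used here instead of 'A-DEC' because newer pandas versions prefer it):

```python
import pandas as pd

year = pd.Period('2017', freq='Y')

first_month = year.asfreq('M', how='start')   # first month inside 2017
last_day = year.asfreq('D', how='end')        # last day inside 2017

print(first_month, last_day)
```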

pandas Period

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">p=pd.Period('2017',freq='M')
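p + 1  # move forward by one period
pd.date_range('2017', '2018', freq='M')
pd.period_range('2017', periods=10, freq='M')

# Timestamp ranges versus period ranges: a period range is coarser, since each element is a span rather than an instant

# DatetimeIndex.to_period() and PeriodIndex.to_timestamp()

p = pd.date_range('2017', '2018', freq='M')
p1 = pd.period_range('2019', periods=6, freq='M')
print(p)
print(p.to_period())
print(p1.to_timestamp())

# These convert to the period and timestamp types; values that already have the target type cannot be converted again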
</pre>
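The round trip between the two index types can be sketched as follows. Only the period alias 'M' is used, and the variable names are illustrative.

```python
import pandas as pd

periods = pd.period_range('2019-01', periods=6, freq='M')

stamps = periods.to_timestamp()    # DatetimeIndex at the start of each month
back = stamps.to_period('M')       # and back to monthly periods
next_period = pd.Period('2017-01', freq='M') + 1

print(stamps)
print(back)
print(next_period)
```

By default `to_timestamp` picks the start of each period, so converting back with the same frequency recovers the original periods.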

Slicing and Indexing

Slicing and indexing on time-indexed data work much like they do on a list.
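As a sketch of what that looks like in practice: positions work as in a list, and date strings can additionally be used as labels. The series below is invented for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', periods=12)
ts = pd.Series(np.arange(12), index=rng)

first_three = ts.iloc[:3]                        # positional, like a list
by_label = ts.loc['2017-01-03':'2017-01-05']     # date labels; both ends included
one_day = ts.loc['2017-01-10']                   # a single date label

print(first_three.tolist(), by_label.tolist(), one_day)
```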

Resampling

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.arange(12), index = rng)
re_ts = ts.resample('5D').sum()
print(ts.resample('5D').mean(), '→ mean\n')
print(ts.resample('5D').max(), '→ maximum\n')
print(ts.resample('5D').min(), '→ minimum\n')
print(ts.resample('5D').median(), '→ median\n')
print(ts.resample('5D').first(), '→ first value\n')
print(ts.resample('5D').last(), '→ last value\n')
print(ts.resample('5D').ohlc(), '→ OHLC resampling\n')

# OHLC: a time-series aggregation used in finance → open, high (maximum), low (minimum), close

# Resampling amounts to changing the frequency and then applying an aggregation function, much like groupby
# Parameters of resample(closed, label)
print(ts.resample('5D', closed='left').sum())  # days grouped as [1,2,3,4,5],[6,7,8,9,10],[11,12]
print(ts.resample('5D', closed='right').sum())

# closed='right' makes the right edge of each interval the closed end → [1],[2,3,4,5,6],[7,8,9,10,11],[12]

# Computing the growth rate
df = pd.DataFrame(np.arange(0,16).reshape(4,4), index=pd.date_range('2017/1/1','2017/1/04'), columns=list('ABCD'))
ts = df.shift(-1)  # shift the values up by one row
print(df / ts)
per = df / df.shift(1) - 1
print(per.dropna())
</pre>
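The effect of `closed` on the bins can be verified by summing each bin. This sketch reuses the 12-day series from above, with values 0 to 11 standing for days 1 to 12:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', periods=12)
ts = pd.Series(np.arange(12), index=rng)

left = ts.resample('5D', closed='left').sum()     # days [1-5], [6-10], [11-12]
right = ts.resample('5D', closed='right').sum()   # days [1], [2-6], [7-11], [12]

print(left)
print(right)
```

With `closed='right'` the first day sits alone in a bin whose interval starts before the data does, which is why there are four bins instead of three.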


Article link: https://www.haomeiwen.com/subject/jhluxctx.html
