Data Analysis with Python: pandas

Author: 14e61d025165 | Published: 2019-06-06 15:35

Creating a Series

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import numpy as np
import pandas as pd

# Create from a dict: the keys become the index
a = {'a': 1, 'b': 2, 'c': 3}
S = pd.Series(a)

# Create from an array

S1 = pd.Series(np.random.randn(4))

# Create from a scalar

S2 = pd.Series(10, index=range(4))

# The scalar is repeated once per index label, so the length of index sets the length of the Series
</pre>

Indexing a Series


<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">s = pd.Series(np.random.rand(5)*100, index = ['a','b','c','d','e'])
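print(s[['a', 'c', 'e']])  # select specific labels
print(s[1:3])              # positional slice: start included, end excluded
print(s['a':'c'])          # label slice: both endpoints included
print(s.iloc[-1])          # last element by position; plain s[-1] relies on a positional fallback that newer pandas versions deprecate
print(s[::2])              # step of 2

# Boolean indexing

bs1 = s > 50
bs2 = s.isnull()
bs3 = s.notnull()
S2 = s[s.notnull()]  # keeps only the values of s that are not null

# A comparison or isnull/notnull returns a boolean Series; passing that mask to s[...] filters the values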
</pre>
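To make the slicing and masking rules concrete, here is a minimal sketch on a Series with fixed values (the numbers 10 to 50 are made up for illustration, so the results are reproducible). It uses `.iloc` and `.loc` to keep the positional and label-based cases explicit.

```python
import pandas as pd

# Hypothetical fixed values so the results are predictable
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[1:3]   # positions 1 and 2; the end is excluded
by_label = s.loc['a':'c']   # labels a, b, c; both endpoints are included
mask = s > 25               # boolean Series with the same index as s
filtered = s[mask]          # keeps only the entries where mask is True

print(by_position.tolist())     # [20, 30]
print(by_label.tolist())        # [10, 20, 30]
print(filtered.index.tolist())  # ['c', 'd', 'e']
```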

Common Series functions

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">s = pd.Series(np.random.rand(3), index = ['a','b','c'])
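s.head()  # first five rows by default (here all three)
s.tail()  # last five rows by default
s1 = s.reindex(['b', 'a', 'c', 'd'], fill_value=0)  # labels that reindex adds have no value; fill_value=0 fills them with 0 instead of NaN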
</pre>
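The effect of `fill_value` is easiest to see side by side. The sketch below uses made-up values and compares `reindex` with and without it.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

with_nan = s.reindex(['b', 'a', 'c', 'd'])                 # 'd' is new, so it becomes NaN
with_zero = s.reindex(['b', 'a', 'c', 'd'], fill_value=0)  # 'd' is filled with 0 instead

print(with_nan.isnull().tolist())  # [False, False, False, True]
print(with_zero.tolist())          # [2.0, 1.0, 3.0, 0.0]
```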

Automatic alignment of Series

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">s1 = pd.Series(np.random.rand(3), index = ['Jack','Marry','Tom'])
s2 = pd.Series(np.random.rand(3), index = ['Wang','Jack','Marry'])
print(s1+s2)
</pre>


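pandas automatically aligns the two Series by label before adding: values with the same label are summed, and any label that is missing from one side, or whose value is null, gives NaN in the result.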
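A small sketch of this alignment with fixed, made-up values. It also shows `Series.add` with `fill_value=0`, which is one way to treat a missing label as 0 rather than getting NaN; that variant is an addition for illustration, not something the example above uses.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['Jack', 'Marry', 'Tom'])
s2 = pd.Series([10, 20, 30], index=['Wang', 'Jack', 'Marry'])

total = s1 + s2                    # labels are matched first; unmatched labels give NaN
filled = s1.add(s2, fill_value=0)  # a label missing on one side counts as 0

print(total['Jack'], total['Marry'])  # 21.0 32.0
print(int(total.isnull().sum()))      # 2 (Tom and Wang)
print(filled['Tom'], filled['Wang'])  # 3.0 10.0
```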

Adding, deleting, and modifying Series elements

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">s1 = pd.Series(np.random.rand(5))
s2 = pd.Series(np.random.rand(5), index = list('ngjur'))

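# Add: assigning to a label that does not exist appends a new element
s2['a'] = 100
s1[5] = 30

# Delete: drop returns a new Series and leaves the original unchanged
s1.drop(1)
s2.drop('n')

# Modify: assigning to an existing label overwrites its value
s2['a'] = 1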
</pre>
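The three operations can be checked on a tiny Series. This is a minimal sketch with made-up labels and values; note in particular that `drop` leaves the original untouched.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

s['w'] = 4             # add: 'w' did not exist, so it is appended
s['x'] = 100           # modify: 'x' exists, so its value is overwritten
shorter = s.drop('y')  # delete: returns a new Series

print(s.index.tolist())        # ['x', 'y', 'z', 'w']
print(shorter.index.tolist())  # ['x', 'z', 'w']
```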

Five ways to create a DataFrame

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Method 1: from a dict of lists (or arrays)
data = {'a': [1, 2, 3],
        'b': np.random.rand(3)
        }
df = pd.DataFrame(data, index=['one', 'two', 'three'])

# Note: the length of index must equal the number of rows. columns may list any names; a name that is not in the data becomes an all-NaN column

# Method 2: from a dict of Series

data = {'one': pd.Series(np.random.rand(3)),
        'two': pd.Series(np.random.rand(2))
        }
df = pd.DataFrame(data, index=[1, 2, 3])  # rows are aligned on the index; labels missing from a Series become NaN

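# Method 3: from a two-dimensional array

data = np.random.rand(9).reshape(3, 3)
df = pd.DataFrame(data)

# Method 4: from a list of dicts

data = [{'one': 1, 'two': 2}, {'one': 5, 'two': 10, 'three': 20}]
df1 = pd.DataFrame(data)

# Method 5: from a dict of dicts

data = {'key1': {'math': 45, 'eng': 56, 'art': 65},
        'key2': {'math': 23, 'eng': 12, 'art': 87}
        }
df1 = pd.DataFrame(data)

# The outer keys (key1, key2) become the columns; the keys of the inner dicts become the row index

df1.index    # row index of the DataFrame
df.columns   # column index of the DataFrame
df.values    # the elements as an array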
</pre>
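How the different inputs map to rows and columns can be verified on small fixed data. The sketch below covers three of the five methods; the variable names `by_series`, `by_rows`, and `by_nested` are chosen for this example only.

```python
import pandas as pd

# Dict of Series with different lengths: rows are aligned on the index
by_series = pd.DataFrame({
    'one': pd.Series([1, 2, 3]),
    'two': pd.Series([10, 20]),
})

# List of dicts: each dict is one row; keys missing from a row become NaN
by_rows = pd.DataFrame([{'one': 1, 'two': 2}, {'one': 5, 'two': 10, 'three': 20}])

# Dict of dicts: outer keys are columns, inner keys are the row index
by_nested = pd.DataFrame({'key1': {'math': 45, 'eng': 56},
                          'key2': {'math': 23, 'eng': 12}})

print(by_series.shape)                    # (3, 2)
print(by_series['two'].isnull().tolist()) # [False, False, True]
print(by_rows['three'].isnull().tolist()) # [True, False]
print(by_nested.loc['math', 'key2'])      # 23
```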

Row and column selection, slicing, and boolean indexing in pandas

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">df=pd.DataFrame(np.random.rand(16).reshape(4,4),
columns=list('ABCD'),index=['one','two','three','four'])

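# Select columns

df['A']          # index directly by column name
df[['A', 'C']]
# df['A':'C']    # not valid for columns: a slice inside [] selects rows; use df.loc[:, 'A':'C'] instead

# A single column name returns a Series; a list of names returns a DataFrame

# Select rows

df.loc['one':'three', 'A':'B']         # by label; both endpoints included
df.iloc[1:3, 0]                        # by integer position only
df.iloc[1:3, df.columns.get_loc('B')]  # df.ix mixed positions and labels, but it was removed in pandas 1.0

# Boolean indexing (the values here lie in [0, 1), so compare against 0.5)

print(df > 0.5)
print(df[df > 0.5])              # values that fail the test become NaN
print(df[df > 0.5][['A', 'B']])  # chained indexing: columns A and B of the already-masked DataFrame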
</pre>

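The difference between ix, loc, and iloc: iloc selects by integer position only and does not accept labels; loc selects by label; ix accepted both positions and labels, but it has been deprecated and then removed from pandas, so use loc or iloc instead. Of the three, iloc is generally the fastest.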

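A useful mental model: a DataFrame is made up of many Series, one per column, so much of its usage mirrors Series. Each column is itself a Series, and selecting two columns is written df[['A','B']]. It follows that adding, deleting, and modifying a DataFrame work much as they do for a Series.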
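The selection rules can be summarized in a short sketch on a DataFrame with known values (`np.arange(16)` is used here instead of random numbers so the results are predictable).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'),
                  index=['one', 'two', 'three', 'four'])

col = df['A']                               # one column name -> Series
cols = df[['A', 'C']]                       # list of names -> DataFrame
by_label = df.loc['one':'three', 'A':'B']   # label slices include both ends
by_pos = df.iloc[1:3, 0]                    # positional slices exclude the end
big = df[df['A'] > 4]                       # boolean mask selects rows

print(type(col).__name__, type(cols).__name__)  # Series DataFrame
print(by_label.shape)                            # (3, 2)
print(by_pos.tolist())                           # [4, 8]
print(big.index.tolist())                        # ['three', 'four']
```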

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])

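df['e'] = 10   # add a column
df.loc[4] = 1  # add a row
print(df)

df[['a', 'c']] = 100  # modify columns
df.iloc[3] = 1        # modify a row

# Delete

del df['a']  # modifies df in place
print(df.drop(['b', 'c'], axis=1))
print(df.drop(0))
print(df.drop([1, 2]))

# Note: a successful drop does not change the original df; it returns a new DataFrame. To change df itself, pass inplace=True, for example:

df.drop(['b'], axis=1, inplace=True)  # column a was already removed by del above, so b is dropped here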
</pre>
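The difference between the default behavior of `drop` and `inplace=True` can be confirmed with a minimal sketch (the column names and values are made up for the example).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

copy_without_b = df.drop(['b'], axis=1)  # returns a new DataFrame; df still has column b
df.drop(['c'], axis=1, inplace=True)     # modifies df itself and returns None

print(copy_without_b.columns.tolist())  # ['a', 'c']
print(df.columns.tolist())              # ['a', 'b']
```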

Common DataFrame functions

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">#unique()
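df['e'].unique()  # df['column_name'].unique() returns the distinct values of that column

# Sorting with sort_values()

df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
columns = ['a','b','c','d'])
print(df1)

print(df1.sort_values(['a'], ascending = True)) # ascending

print(df1.sort_values(['a','c'], ascending=False))  # descending; the default is ascending

# Sorting by index with sort_index

df1.sort_index(ascending=True, inplace=True)

# As these examples show, ascending sets the sort direction and inplace decides whether to operate on the original data; with inplace=False a new DataFrame is returned

# value_counts() counts how many times each value occurs

df1['a'].value_counts()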
</pre>
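With random data the effect of sorting and counting is hard to check by eye, so here is a minimal sketch on a small fixed DataFrame (the values are made up) where the results can be verified.

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 1, 2, 3], 'c': [9, 8, 7, 6]})

asc = df.sort_values(['a'])                         # ascending by default
desc = df.sort_values(['a', 'c'], ascending=False)  # ties in a are broken by c
counts = df['a'].value_counts()                     # occurrences of each distinct value

print(asc['a'].tolist())         # [1, 2, 2, 3]
print(desc['c'].tolist())        # [6, 9, 7, 8]
print(df['a'].unique().tolist()) # [2, 1, 3]
print(counts.loc[2])             # 2
```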

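Timestamps

The main topics are datetime, Timestamp, DatetimeIndex, Period, and the indexing and resampling of time series. In e-commerce and finance, much of the data is indexed by time, so a solid grasp of timestamps is essential.

For datetime, the key pieces are datetime.date(), datetime.datetime(), and datetime.timedelta()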

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import datetime
print(datetime.date.today())       # today's date
print(datetime.date(2019, 1, 2))   # a specific date

# datetime.datetime

t1 = datetime.datetime(2018, 3, 5)
t2 = datetime.datetime(2017, 2, 14, 15, 13, 45)
print(t1, t2)

# Unlike datetime.date, datetime.datetime also carries hours, minutes, and seconds

# datetime.timedelta(): a time difference

today = datetime.date(2016, 2, 5)
yesterday = today - datetime.timedelta(1)
print(today, yesterday)

# Date parsing: convert a string into a date

from dateutil.parser import parse
t = '2018/2/23'
date = parse(t)
print(date)
</pre>
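The pieces above fit together as follows. This is a small sketch that reuses the same dates; it assumes the `dateutil` package is available, which is normally the case wherever pandas is installed.

```python
import datetime
from dateutil.parser import parse

d = datetime.date(2016, 2, 5)
previous_day = d - datetime.timedelta(days=1)  # date minus timedelta gives a date

t1 = datetime.datetime(2018, 3, 5)
t2 = datetime.datetime(2017, 2, 14, 15, 13, 45)
gap = t1 - t2                                  # datetime minus datetime gives a timedelta

parsed = parse('2018/2/23')                    # string -> datetime.datetime

print(previous_day)  # 2016-02-04
print(gap.days)      # 383
print(parsed)        # 2018-02-23 00:00:00
```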

timestamp

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># pd.Timestamp can also be precise to minutes and seconds
t1 = pd.Timestamp('2017-3-2')
t2 = pd.Timestamp('1919-2-4')
print(t1, t2)

# pd.to_datetime

date = pd.to_datetime('2014-1-2')
print(type(date))
date_index = pd.to_datetime(['2012/2/2', '2013/2/3'])
print(type(date_index))

# Note: a single value gives a Timestamp; a list of values gives a DatetimeIndex

# When you meet a new data type for the first time, printing its type helps you understand it

date = ['2017-2-1', '2017-2-2', '2017-2-3', 'hello world!', '2017-2-5', '2017-2-6']
t1 = pd.to_datetime(date, errors='ignore')  # errors='ignore' is deprecated in recent pandas versions
t2 = pd.to_datetime(date, errors='coerce')
print(type(t1))  # the input comes back unconverted (ndarray in the original post; the exact type depends on the pandas version)
print(type(t2))  # DatetimeIndex

# errors='ignore' skips the parsing error, so the original data is returned unchanged
# errors='coerce' forces the conversion to DatetimeIndex; entries that cannot be parsed become NaT
</pre>
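The return types of `pd.to_datetime` and the effect of `errors='coerce'` can be checked directly. The sketch below sticks to `errors='coerce'` and uses a shortened version of the list above.

```python
import pandas as pd

single = pd.to_datetime('2014-1-2')                 # one value -> Timestamp
several = pd.to_datetime(['2012/2/2', '2013/2/3'])  # list -> DatetimeIndex

raw = ['2017-2-1', '2017-2-2', 'hello world!', '2017-2-5']
coerced = pd.to_datetime(raw, errors='coerce')      # unparseable entries become NaT

print(type(single).__name__)     # Timestamp
print(type(several).__name__)    # DatetimeIndex
print(coerced.isnull().tolist()) # [False, False, True, False]
```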

DatetimeIndex

Using time as the index

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">rng = pd.DatetimeIndex(['12/1/2017','12/2/2017','12/3/2017','12/4/2017','12/5/2017'])
data=pd.Series(np.random.rand(5),index=rng)
</pre>

Generating date ranges with pd.date_range

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Generate a date range
'''
pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, **kwargs)
start: start time
end: end time
periods: number of timestamps to generate
freq: frequency
tz: time zone
normalize: normalize start/end to midnight before generating the range
closed: which endpoints to include; both by default, or closed='left' / closed='right' (newer pandas versions replace it with inclusive)
'''
t1 = pd.date_range('2017/2/1', '2017/3/2')

print(t1)  # every day from 2/1 to 3/2; the default frequency is daily

t2 = pd.date_range(start='2017/2/1', periods=10)
t3 = pd.date_range(end='2017/2/1', periods=10)
t4 = pd.date_range(start='2017/2', periods=10, freq='M')  # month ends ('ME' in pandas 2.2 and later); output:
'''
'2017-02-28', '2017-03-31', '2017-04-30', '2017-05-31',
'2017-06-30', '2017-07-31', '2017-08-31', '2017-09-30',
'2017-10-31', '2017-11-30'
'''
</pre>
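A quick way to see how `start`, `end`, and `periods` interact is to inspect the endpoints. The sketch below uses the default daily frequency only.

```python
import pandas as pd

span = pd.date_range('2017/2/1', '2017/3/2')           # start and end given: daily by default
forward = pd.date_range(start='2017/2/1', periods=10)  # 10 days counted forward from Feb 1
backward = pd.date_range(end='2017/2/1', periods=10)   # 10 days ending on Feb 1

print(len(span))           # 30
print(forward[-1].date())  # 2017-02-10
print(backward[0].date())  # 2017-01-23
```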

The following is especially useful in finance: the various values of freq and how to use them

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pd.date_range('2015/2/5', '2015/2/6', freq='D')
'''
freq defaults to daily ('D')
Other values include (pandas 2.2 and later rename several of them, e.g. 'ME', 'YE', 'min', 's', 'ms', 'us'):
'M': month end
'Y': year end
'T' or 'min': minute
'S': second
'L': millisecond
'U': microsecond
'''

# Other freq values

'''
'W-MON': weekly, anchored on the given weekday (here every Monday)
'WOM-2MON': a given week of each month (here the second Monday)
'''

# Examples

print(pd.date_range('2018/1/1', '2018/2/1', freq='W-MON'))     # one-week intervals
print(pd.date_range('2018/1/1', '2018/5/1', freq='WOM-2MON'))  # one-month intervals
'''
M: last calendar day of each month
Q-DEC: Q-month, where the given month ends the year's last quarter; last calendar day of the last month of each quarter
(months 1-4-7-10 for Q-JAN, 2-5-8-11 for Q-FEB, 3-6-9-12 for Q-DEC)
A-DEC: last calendar day of the given month each year
'''
print(pd.date_range('2017', '2018', freq='M'))
print(pd.date_range('2017', '2020', freq='Q-DEC'))
print(pd.date_range('2017', '2018', freq='A-DEC'))

# Monthly, quarterly, and yearly intervals respectively
</pre>

''' print(pd.date_range('2017','2018', freq = 'BMS'))

print(pd.date_range('2017','2020', freq = 'BQS-DEC'))

print(pd.date_range('2017','2020', freq = 'BAS-DEC')) # each call prints the business days at the given frequency '''

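Summary

These freq values are built from three parts: B, one of (M, Q, A), and S. B stands for business day; M, Q, and A set a monthly, quarterly, or yearly frequency; S anchors to the start of the period instead of the end. BMS, for example, is the first business day of each month.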
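The anchored aliases can be checked against a calendar. The sketch below uses only `W-MON`, `WOM-2MON`, and `BMS`, chosen here because, to my knowledge, their spelling has stayed the same across pandas versions. Note that business-day aliases skip weekends but know nothing about public holidays.

```python
import pandas as pd

mondays = pd.date_range('2018/1/1', '2018/2/1', freq='W-MON')            # every Monday
second_mondays = pd.date_range('2018/1/1', '2018/5/1', freq='WOM-2MON')  # second Monday of each month
first_business_days = pd.date_range('2017', '2018', freq='BMS')          # first weekday of each month

print([d.day for d in mondays])                     # [1, 8, 15, 22, 29]
print([str(d.date()) for d in second_mondays])      # ['2018-01-08', '2018-02-12', '2018-03-12', '2018-04-09']
print(first_business_days[0].date())                # 2017-01-02 (Jan 1, 2017 was a Sunday)
print(all(d.weekday() < 5 for d in first_business_days))  # True
```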

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Frequency conversion
# asfreq is a method of Series/DataFrame (there is no pd.asfreq function)
rng = pd.date_range('2018', '2019')
ts = pd.Series(np.random.rand(len(rng)), index=rng)
r = ts.asfreq('4H', method='ffill')  # upsample to every 4 hours, forward-filling ('4h' in pandas 2.2 and later)

# asfreq on a Period: frequency conversion

p = pd.Period('2017', 'A-DEC')
print(p)
print(p.asfreq('M', how = 'start')) # how = 's' also works
print(p.asfreq('D', how = 'end')) # how = 'e' also works

# .asfreq(freq, method=None, how=None) converts to another frequency
</pre>
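As an illustration of both uses of `asfreq`, here is a small sketch with made-up data. It passes `pd.offsets.Hour(4)` rather than a string alias, purely as a choice for this example, because the spelling of the hourly alias differs between pandas versions.

```python
import pandas as pd

daily = pd.Series([1, 2, 3], index=pd.date_range('2018/1/1', periods=3))

# Upsample from daily to every 4 hours; ffill carries the last known value forward
every_4h = daily.asfreq(pd.offsets.Hour(4), method='ffill')

print(len(every_4h))     # 13 (two days in 4-hour steps, both ends included)
print(every_4h.iloc[5])  # 1 (20:00 on Jan 1 still carries the Jan 1 value)
print(every_4h.iloc[6])  # 2 (00:00 on Jan 2)

# A Period can be converted to a finer frequency, anchored at its start or end
p = pd.Period('2017', freq='M')         # January 2017
first_day = p.asfreq('D', how='start')
last_day = p.asfreq('D', how='end')

print(first_day, last_day)  # 2017-01-01 2017-01-31
```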

pandas Period

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">p=pd.Period('2017',freq='M')
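p + 1  # move forward by one unit of the frequency (here one month)
pd.date_range('2017', '2018', freq='M')
pd.period_range('2017', periods=10, freq='M')

# Date ranges versus period ranges: a period range is less precise, since each element is a span of time rather than an instant

# to_period() and to_timestamp()

p = pd.date_range('2017', '2018', freq='M')
p1 = pd.period_range('2019', periods=6, freq='M')
print(p)
print(p.to_period())
print(p1.to_timestamp())

# These convert to the Period and Timestamp types respectively; types of the same precision cannot be converted into each other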
</pre>
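The round trip between timestamps and periods can be sketched as follows. The example uses the month-start alias `MS` for the timestamps and `M` for the periods, a choice made here to avoid aliases whose spelling changed between pandas versions.

```python
import pandas as pd

stamps = pd.date_range('2017/1/1', periods=3, freq='MS')  # instants: the first day of each month
periods = stamps.to_period('M')                           # spans: whole months
back = periods.to_timestamp()                             # start of each period by default

print(periods[0])               # 2017-01
print((periods + 1)[0])         # 2017-02 (shifted by one month)
print(back.tolist() == stamps.tolist())  # True
```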

Slicing and indexing

Slicing and indexing time-indexed data works much like slicing and indexing a list
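A minimal sketch of what that looks like on a daily Series (the data is made up). Positional access behaves like a list, while date strings act as labels, so a label slice includes its end point.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(10), index=pd.date_range('2017/1/1', periods=10))

first = ts.iloc[0]                                 # by position, like a list
third = ts.loc['2017-01-03']                       # by date label
by_position = ts.iloc[2:4]                         # positions 2 and 3; end excluded
by_dates = ts.loc['2017-01-03':'2017-01-05']       # Jan 3 to Jan 5; end included

print(first, third)          # 0 2
print(by_position.tolist())  # [2, 3]
print(by_dates.tolist())     # [2, 3, 4]
```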

Resampling

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.arange(12), index = rng)
re_ts=ts.resample('5D').sum()
print(ts.resample('5D').mean(), '→ mean\n')
print(ts.resample('5D').max(), '→ maximum\n')
print(ts.resample('5D').min(), '→ minimum\n')
print(ts.resample('5D').median(), '→ median\n')
print(ts.resample('5D').first(), '→ first value\n')
print(ts.resample('5D').last(), '→ last value\n')
print(ts.resample('5D').ohlc(), '→ OHLC resampling\n')

# OHLC: a time-series aggregation used in finance → open, high, low, close

# Resampling changes the frequency and then applies an aggregation function, much like groupby
# The closed and label parameters of resample
print(ts.resample('5D', closed='left').sum())   # bins: [1,2,3,4,5],[6,7,8,9,10],[11,12]
print(ts.resample('5D', closed='right').sum())

# closed='right' makes the right edge of each interval the closed end → [1],[2,3,4,5,6],[7,8,9,10,11],[12]

# Compute growth rates
df = pd.DataFrame(np.arange(0, 16).reshape(4, 4), index=pd.date_range('2017/1/1', '2017/1/04'), columns=list('ABCD'))
ts = df.shift(-1)  # shift every row up by one
print(df / ts)
per = df / df.shift(1) - 1
print(per.dropna())
</pre>
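To see how `closed` changes the bins, and to check the growth-rate formula, here is a short sketch. The price-like values 100, 110, 121 are made up, and `pct_change` is shown only as a built-in equivalent of `df / df.shift(1) - 1`.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12), index=pd.date_range('20170101', periods=12))

left = ts.resample('5D').sum()                   # default: each bin includes its left edge
right = ts.resample('5D', closed='right').sum()  # each bin includes its right edge instead

print(left.tolist())   # [10, 35, 21]
print(right.tolist())  # [0, 15, 40, 11]

# Growth rate: each row divided by the previous row, minus 1
df = pd.DataFrame({'A': [100.0, 110.0, 121.0]},
                  index=pd.date_range('2017/1/1', periods=3))
growth = (df / df.shift(1) - 1).dropna()         # the first row has no predecessor and is dropped
same = df.pct_change().dropna()                  # built-in equivalent

print(growth['A'].round(2).tolist())  # [0.1, 0.1]
```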


    Article link: https://www.haomeiwen.com/subject/jhluxctx.html