Python Data Analysis Notes - 06

Author: 杨大菲 | Published: 2018-02-10 23:18


1. Elements of a Series object

1) Viewing the elements

2) Counting how often each element occurs

3) Checking whether an element is in the object

>>> import pandas as pd  # import pandas under the alias pd

>>> s=pd.Series([1,1,2,2,3,3,4,4,5],index=['a','b','b','c','d','d','e','f','f'])  # define a Series object s

>>> s

a 1

b 1

b 2

c 2

d 3

d 3

e 4

f 4

f 5

dtype: int64

>>> s.unique()  # unique() returns the array of distinct values in s

array([1, 2, 3, 4, 5], dtype=int64)

>>> s.value_counts()  # value_counts() returns each distinct value in s with the number of times it occurs

4 2

3 2

2 2

1 2

5 1

dtype: int64

>>> s.isin([1,3])  # isin() tests, element by element, whether each value of s is in the given list

a True

b True

b False

c False

d True

d True

e False

f False

f False

dtype: bool

>>> s(s.isin([1,3]))  # mistake: parentheses try to call the Series; filtering needs square brackets

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

TypeError: 'Series' object is not callable

>>> s[s.isin([1,3])]

a 1

b 1

d 3

d 3

dtype: int64
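
The three operations above can be collected into one short script. This is a minimal sketch rather than the author's own code; the variable names distinct, counts, mask and selected are chosen here purely for illustration.

```python
import pandas as pd

# Same data as in the session above: nine values with a repeated index.
s = pd.Series([1, 1, 2, 2, 3, 3, 4, 4, 5],
              index=['a', 'b', 'b', 'c', 'd', 'd', 'e', 'f', 'f'])

# unique() returns the distinct values in order of first appearance.
distinct = s.unique()

# value_counts() maps each distinct value to the number of times it occurs.
counts = s.value_counts()

# isin() returns a boolean Series with the same index as s.
mask = s.isin([1, 3])

# Indexing with the boolean mask (square brackets) keeps only the True rows.
selected = s[mask]

print(distinct.tolist())
print(counts.sort_index().to_dict())
print(selected.tolist())
```

Keeping the mask in its own variable makes it clear that the filter is an ordinary boolean Series, which is why it must be passed with square brackets rather than called with parentheses.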

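2. NaN

1) pandas lets you define this kind of value and add it to data structures such as Series. When creating a data structure, enter NumPy's nan value (np.nan; the older np.NaN spelling was removed in NumPy 2.0) for any item that is missing from the array.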

>>> import pandas as pd

>>> import numpy as np

>>> s=pd.Series([1,2,np.nan])

>>> s

0 1.0

1 2.0

2 NaN

dtype: float64

2) The isnull() and notnull() functions identify which elements of a Series are NaN. Their results can also be used as a filter on the Series.

>>> import pandas as pd

>>> import numpy as np

>>> s=pd.Series([1,2,np.nan])

>>> s

0 1.0

1 2.0

2 NaN

dtype: float64

>>> s.isnull()

0 False

1 False

2 True

dtype: bool

>>> s.notnull()

0 True

1 True

2 False

dtype: bool

>>> s[s.isnull()]

2 NaN

dtype: float64
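
As a small sketch of the filtering described in 2), the masks returned by isnull() and notnull() can split a Series into its missing and its valid entries. The names missing, valid and found_by_equality are illustrative and do not appear in the original notes. The last check shows why these functions are needed: NaN never compares equal to anything, including itself.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])

# isnull() marks NaN entries with True; notnull() is its complement.
missing = s[s.isnull()]
valid = s[s.notnull()]

# NaN is never equal to itself, so comparing with == cannot find it.
found_by_equality = (s == np.nan).any()

print(len(missing), valid.tolist(), found_by_equality)
```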

3. Series as a dictionary

1) A Series object can be created from an existing dictionary

>>> import pandas as pd

>>> dic={'a':1,'b':2,'c':3}

>>> dic

{'b': 2, 'a': 1, 'c': 3}

>>> s=pd.Series(dic)

>>> s

a 1

b 2

c 3

dtype: int64

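2) In the example above, the index is filled with the dictionary's keys, and each element is the value stored under the corresponding key. You can also specify the index separately; pandas then matches the dictionary keys against the index labels. When a label in the specified index has no matching key in the dictionary, pandas assigns NaN as the value for that label.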

>>> import pandas as pd

>>> dic={'red':1,'white':4,'yellow':7,'blue':6}

>>> dic

{'red': 1, 'blue': 6, 'white': 4, 'yellow': 7}

>>> colors=['red','white','blue','black','pink']

>>> colors

['red', 'white', 'blue', 'black', 'pink']

>>> s=pd.Series(dic,index=colors)

>>> s

red 1.0

white 4.0

blue 6.0

black NaN

pink NaN

dtype: float64
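
The matching between dictionary keys and index labels can be checked directly. The following is a minimal sketch using the same data as the session above; the names from_keys, unmatched and dropped are introduced here for illustration only.

```python
import pandas as pd

dic = {'red': 1, 'white': 4, 'yellow': 7, 'blue': 6}
colors = ['red', 'white', 'blue', 'black', 'pink']

# Without an index, the Series takes its labels from the dictionary keys.
from_keys = pd.Series(dic)

# With an explicit index, only the listed labels are kept, in that order.
s = pd.Series(dic, index=colors)

# Labels with no matching key hold NaN ...
unmatched = s[s.isnull()].index.tolist()
# ... and keys that are not in the index are left out of the result.
dropped = [key for key in dic if key not in s.index]

print(unmatched)
print(dropped)
```

Note that the result holds floats rather than integers: NaN is a floating-point value, so a Series that contains it is stored as float64, as the session output shows.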

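4. Operations between Series objects

Arithmetic between two Series objects is aligned on the index: only elements with the same index label are added together, and any label that appears in only one of the two objects gets NaN in the result.

1) Default integer index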

>>> import pandas as pd

>>> s=pd.Series([1,2,3,4,5,6])

>>> s

0 1

1 2

2 3

3 4

4 5

5 6

dtype: int64

>>> s2=pd.Series([2,4,6,8])

>>> s2

0 2

1 4

2 6

3 8

dtype: int64

>>> s+s2

0 3.0

1 6.0

2 9.0

3 12.0

4 NaN

5 NaN

dtype: float64

2) Custom index

>>> import pandas as pd

>>> s=pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])

>>> s

a 1

b 2

c 3

d 4

e 5

dtype: int64

>>> s2=pd.Series([2,3,4,7],index=['b','a','e','f'])

>>> s2

b 2

a 3

e 4

f 7

dtype: int64

>>> s+s2

a 4.0

b 4.0

c NaN

d NaN

e 9.0

f NaN

dtype: float64
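
The alignment rule can be verified with a short script. This is a sketch under one stated assumption: the call to Series.add() with fill_value=0 is not part of the original notes and is shown only as one possible way to avoid NaN for labels that exist on a single side. The names total and total_filled are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
s2 = pd.Series([2, 3, 4, 7], index=['b', 'a', 'e', 'f'])

# The + operator aligns on labels; labels found in only one Series give NaN.
total = s + s2

# Series.add() does the same alignment, but fill_value=0 substitutes 0
# for a label that is missing from one side before adding.
total_filled = s.add(s2, fill_value=0)

print(total.isnull().sum())
print(total_filled.to_dict())
```

Whether filling with 0 is appropriate depends on the data: it treats a missing label as a zero contribution, which is not the same as an unknown value.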


Article link: https://www.haomeiwen.com/subject/lpnhtftx.html
