Learning pandas, Part 2

Author: 蓝剑狼 | Published 2018-08-12 21:19

The pandas Series data structure: indexing

Positional index / label index / slice index / boolean index

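# Positional index, similar to a sequence

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(5))
print(1,'-'*30)
print(s)
print(2,'-'*30)
print(s[0],type(s[0]),s[0].dtype)
print(3,'-'*30)
print(float(s[0]),type(float(s[0])))
#print(s[-1])
print(4,'-'*30)
print(s[-1:])
# Positions start at 0
# Each element comes back as a numpy.float64,
# which float() converts to a built-in Python float
# The two objects differ in type and in memory overhead, but both hold the same 64-bit double
# What does s[-1] give? With the default integer index, a scalar inside [] is looked up as a label,
# so s[-1] raises KeyError; the slice s[-1:] is positional and returns the last element
# Output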
1 ------------------------------
0    0.616945
1    0.827078
2    0.749206
3    0.743930
4    0.012521
dtype: float64
2 ------------------------------
0.6169452488328007 <class 'numpy.float64'> float64
3 ------------------------------
0.6169452488328007 <class 'float'>
4 ------------------------------
4    0.012521
dtype: float64
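
The question left open in the comments above (what does s[-1] return?) can be checked with a small sketch. It assumes a default integer index and uses fixed values instead of random ones so the result is reproducible; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0])  # default index: 0, 1, 2

# A scalar integer inside [] is treated as a label on an integer index.
# There is no label -1, so the lookup fails with KeyError.
try:
    s[-1]
    found = True
except KeyError:
    found = False

last = s.iloc[-1]        # .iloc is always position-based: last element
as_float = float(last)   # numpy.float64 -> built-in Python float

print(found, last, type(as_float))
```

Using .iloc for positions keeps the intent explicit and does not depend on what kind of index the Series happens to have.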

# Label index

s = pd.Series(np.random.rand(5), index = ['a','b','c','d','e'])
print(1,'-'*30)
print(s)
print(2,'-'*30)
print(s['a'],type(s['a']),s['a'].dtype)
# Works like a positional lookup: put the index label inside []; note that the labels here are strings

sci = s[['a','b','e']]
print(3,'-'*30)
print(sci,type(sci))
# To select several labels, use [[]] (a list inside the [])
# Selecting several labels returns a new Series
# Output
1 ------------------------------
a    0.446335
b    0.113276
c    0.627803
d    0.690862
e    0.523690
dtype: float64
2 ------------------------------
0.44633480504291745 <class 'numpy.float64'> float64
3 ------------------------------
a    0.446335
b    0.113276
e    0.523690
dtype: float64 <class 'pandas.core.series.Series'>
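
As a rough companion to the label lookups above, the same selections can be written with .loc, which only ever interprets its argument as labels. The values and names below are made up for illustration.

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5, 4.5], index=['a', 'b', 'c', 'd'])

one = s.loc['a']            # single label -> scalar value
many = s.loc[['a', 'c']]    # list of labels -> new Series

many.loc['a'] = 100.0       # change the selection ...
print(one, list(many.index), s.loc['a'])  # ... the original Series keeps 1.5
```

Because a list selection returns a new Series, editing the result does not write back into the original.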

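# Slice index

s1 = pd.Series(np.random.rand(5))
s2 = pd.Series(np.random.rand(5), index = ['a','b','c','d','e'])
print(1,'-'*30)
print(s1[1:4],s1[4])
print(2,'-'*30)
print(s2['a':'c'],s2['c'])
# Note: a slice by index label includes its end label
print(3,'-'*30)
print(s2[0:3],s2.iloc[3])
# A positional slice excludes its end position; .iloc[3] reads position 3 explicitly,
# because a bare s2[3] on a string index relies on a positional fallback that newer pandas deprecates
print(4,'-'*30)
print(s2[:-1])
print(5,'-'*30)
print(s2[::2])
# Positional slices are written the same way as list slices
# Output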
1 ------------------------------
1    0.377927
2    0.021156
3    0.114238
dtype: float64 0.7374167699045282
2 ------------------------------
a    0.235294
b    0.793877
c    0.389438
dtype: float64 0.38943809032191234
3 ------------------------------
a    0.235294
b    0.793877
c    0.389438
dtype: float64 0.09342834499831487
4 ------------------------------
a    0.235294
b    0.793877
c    0.389438
d    0.093428
dtype: float64
5 ------------------------------
a    0.235294
c    0.389438
e    0.437527
dtype: float64
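
The difference between the two kinds of slice is easy to miss in the output above, so here is a minimal sketch with fixed values. It uses .loc and .iloc to make explicit which kind of slice is meant; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['a':'c']    # label slice: the end label 'c' is included
by_pos = s.iloc[0:3]         # positional slice: position 3 is excluded
every_other = s.iloc[::2]    # a step works the same way as with a list

print(list(by_label.index))     # ['a', 'b', 'c']
print(list(by_pos.index))       # ['a', 'b', 'c']
print(list(every_other.index))  # ['a', 'c', 'e']
```

Both slices return three elements here, but for different reasons: 'a':'c' stops at and includes 'c', while 0:3 stops before position 3.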

# Boolean index

s = pd.Series(np.random.rand(3)*100)
s[4] = None  # add a missing value
print("1".center(40,'*'))
print(s)
bs1 = s > 50
bs2 = s.isnull()
bs3 = s.notnull()
print("2".center(40,'*'))
print(bs1, type(bs1), bs1.dtype)
print("3".center(40,'*'))
print(bs2, type(bs2), bs2.dtype)
print("4".center(40,'*'))
print(bs3, type(bs3), bs3.dtype)
# A comparison on a Series returns a new Series of boolean values
# .isnull() / .notnull() test for missing values (None is Python's null object, NaN is the floating-point "not a number" value; both count as missing)
print("5".center(40,'*'))
print(s[s > 50])
print("6".center(40,'*'))
print(s[bs3])
# Boolean indexing: write s[condition], where the condition is either a comparison expression or a boolean Series
# Output
*******************1********************
0    71.6283
1     28.596
2    92.4212
4       None
dtype: object
*******************2********************
0     True
1    False
2     True
4    False
dtype: bool <class 'pandas.core.series.Series'> bool
*******************3********************
0    False
1    False
2    False
4     True
dtype: bool <class 'pandas.core.series.Series'> bool
*******************4********************
0     True
1     True
2     True
4    False
dtype: bool <class 'pandas.core.series.Series'> bool
*******************5********************
0    71.6283
2    92.4212
dtype: object
*******************6********************
0    71.6283
1     28.596
2    92.4212
dtype: object