Learn_for_Pandas

Author: 进击的STE | Published 2018-07-13 22:54

1.Series

A Series is similar to a one-dimensional array, except that it carries both values and an index. For example:

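import pandas as pd
obj = pd.Series([4, 7, -5, 3])
# obj =
# 0    4
# 1    7
# 2   -5
# 3    3
# dtype: int64
obj.values  # returns the values of the Series
obj.index   # returns the index
# Elements can be selected by index label
obj[x] or obj[[x1, x2, ..., xn]]  # x may be an integer or a string label
# Boolean mask
obj[obj > value]  # returns a new Series holding only the entries greater than value; obj itself is unchanged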
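The placeholders above (x, value) can be made concrete. The following is a minimal sketch; the string labels 'a' to 'd' and the threshold 0 are illustrative choices, not part of the original notes.

```python
import pandas as pd

# Illustrative data: the same values as above, but with string labels
obj = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

single = obj['b']            # one label -> a scalar
several = obj[['a', 'c']]    # a list of labels -> a new Series
positive = obj[obj > 0]      # boolean mask -> a new, filtered Series

print(single)
print(several.tolist())
print(positive.tolist())
print(len(obj))              # obj still holds all four values
```

The mask obj > 0 is itself a Series of booleans, which is why it can be passed straight back into the square brackets.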

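# A Python dict can be converted directly into a Series
data = {key1: value1, key2: value2, ...}
obj = pd.Series(data)  # the dict keys become the index, the dict values become the values
pd.isnull(obj) and pd.notnull(obj) detect missing (NaN) values
# Both a Series and its index have a name attribute
obj.name = xxx
obj.index.name = xxx
# The index of a Series can be reassigned
obj.index = [str1/int1, ...]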
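As a runnable sketch of the dict conversion, missing-value detection, and naming described above: the state names and numbers below are made-up sample data, and passing an explicit index to pd.Series is one assumed way to produce a NaN entry.

```python
import pandas as pd

# Made-up sample data
data = {'Ohio': 35000, 'Texas': 71000, 'Utah': 5000}
obj = pd.Series(data)                 # keys -> index, values -> values

# Requesting a label that is not a key of the dict yields NaN for that label
obj2 = pd.Series(data, index=['Ohio', 'Texas', 'California'])
missing = pd.isnull(obj2)             # True where the value is NaN
present = pd.notnull(obj2)            # the element-wise opposite

obj2.name = 'population'              # name of the Series itself
obj2.index.name = 'state'             # name of its index

obj.index = ['a', 'b', 'c']           # replace the whole index by assignment
print(obj2)
```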

Creating a Series

# 1: from a list
import pandas as pd
countries = ['China', 'United States', 'Australia']
countries_s = pd.Series(countries)
# 2: from a dict
country_dicts = {'CH': 'China',
                 'US': 'United States',
                 'AU': 'Australia'}
country_dict_s = pd.Series(country_dicts)
# Name the index
country_dict_s.index.name = 'Code'
# Name the data
country_dict_s.name = 'Country'

Handling missing data
A missing value in string data is stored as None; a missing value in numeric data is stored as NaN.
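A small sketch of the difference, with made-up values. It passes dtype=object for the string Series as an explicit assumption, so that None is kept as-is regardless of how a given pandas version infers string types.

```python
import numpy as np
import pandas as pd

# dtype=object is set explicitly (an assumption) so that None is preserved
strings = pd.Series(['Tiger', 'Bear', None], dtype=object)
# In numeric data, None is converted to NaN and the dtype becomes float64
numbers = pd.Series([1, 2, None])

print(strings.iloc[2] is None)
print(numbers.dtype)
print(np.isnan(numbers.iloc[2]))

# NaN never compares equal to itself, so test with isnull() rather than ==
print(numbers.isnull().tolist())
```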

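Series indexing
Use in to test whether a label exists in the index. iloc selects by integer position, counting from 0, e.g. iloc[0]; loc selects by label, e.g. loc['key'], and plain ['key'] works as well. Both accept a list to select several elements at once, e.g. xxx.iloc[[0, 2, 5]] or xxx.loc[['a', 'b']].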
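The three access styles can be compared side by side. This is a minimal sketch on a made-up Series; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

has_b = 'b' in s                 # in checks the index labels
has_20 = 20 in s                 # 20 is a value, not a label

first = s.iloc[0]                # by position
by_label = s.loc['c']            # by label
same = s['c']                    # shorthand for label lookup

pos_multi = s.iloc[[0, 2]].tolist()      # several positions at once
lab_multi = s.loc[['a', 'b']].tolist()   # several labels at once
print(has_b, has_20, first, by_label, pos_multi, lab_multi)
```

Note that in tests membership among the labels, not among the values; to test values, something like 20 in s.values would be needed.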

Vectorized operations

2.DataFrame

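A DataFrame can be thought of simply as a matrix that has both a row index and a column index.
When the requested index (rows or columns) contains more labels than the data provides, the corresponding entries are filled with NaN;
when the data contains more entries than the requested index, the index determines the final shape of the DataFrame.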
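One way to see both cases is to build a DataFrame from a dict while passing an explicit columns list. This is a sketch under that assumption; the data and the extra column name 'debt' are made up.

```python
import pandas as pd

# Made-up sample data
data = {'state': ['Ohio', 'Texas'], 'year': [2000, 2001], 'pop': [1.5, 2.4]}

# 'debt' is requested but absent from data -> that column is filled with NaN
wider = pd.DataFrame(data, columns=['state', 'year', 'pop', 'debt'])

# Only two columns are requested -> 'year' is left out of the result
narrower = pd.DataFrame(data, columns=['state', 'pop'])

print(wider)
print(narrower)
```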

Column and row indexing

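Column indexing: df.columns_index or df['columns_index']
Row indexing: df.loc['rows_index'] or df.iloc[int_num]
Mixed indexing, column first and then row: df.columns_index.loc['rows_index'] / df.columns_index.iloc[int_num] / df.columns_index['rows_index'] (df['columns_index'] can be used in place of df.columns_index)
Mixed indexing, row first and then column: df.loc['rows_index'].columns_index / df.loc['rows_index']['columns_index'] (df.iloc[int_num] can be used in place of df.loc['rows_index'])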
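The equivalences listed above can be checked on a small frame. This is a sketch with made-up column and row labels; the last line, df.loc[row, column], is an addition not mentioned in the notes and performs the same lookup in a single step.

```python
import pandas as pd

df = pd.DataFrame({'price': [1.5, 2.0, 3.25], 'qty': [10, 20, 30]},
                  index=['x', 'y', 'z'])

col_attr = df.price          # column via attribute access
col_key = df['price']        # column via key access
row_label = df.loc['y']      # row via label
row_pos = df.iloc[1]         # row via integer position

a = df['price'].loc['y']     # column first, then row label
b = df.price.iloc[1]         # column first, then row position
c = df.loc['y']['price']     # row first, then column
d = df.loc['y', 'price']     # single-step lookup (not in the original notes)
print(a, b, c, d)
```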

del: deleting a column the same way a key is deleted from a dict

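A new column cannot be created with the attribute form df.columns_index = xx; only df['columns_index'] = xx works.
Likewise, a column can only be deleted with del df['columns_index'].
Note: creating or deleting columns takes effect on the original df immediately, so it is advisable to make a backup with copy() before modifying df.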
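A short sketch of these three points on a made-up frame. The attribute assignment is wrapped in a warnings filter because pandas emits a UserWarning for that pattern; the column names are illustrative.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
backup = df.copy()               # independent copy taken before modifying df

df['c'] = df['a'] + df['b']      # key assignment creates a real column

with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # pandas warns about the next line
    df.d = [0, 0]                # sets a plain Python attribute, not a column

del df['b']                      # removes column 'b' from df in place

print(list(df.columns))
print(list(backup.columns))      # the backup is unaffected
```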
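How DataFrame handles data that is a dict of dicts
dicts = {key1:{sub_key1:value1, sub1_key1:value2}, key2:{sub_key2:value1, sub1_key2:value2, sub2_key2:value3}}
When pandas processes data like this, the outer keys become the columns and the inner keys become the rows; any position whose label has no corresponding value is filled with NaN by default.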
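With concrete keys in place of the placeholders, the behaviour looks like this. The state names and year keys are made-up sample data.

```python
import pandas as pd

# Outer keys -> columns, inner keys -> row labels
dicts = {'Nevada': {2001: 2.4, 2002: 2.9},
         'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
df = pd.DataFrame(dicts)

print(df)
# 'Nevada' has no entry for 2000, so that cell is NaN
print(pd.isnull(df.loc[2000, 'Nevada']))
```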
A DataFrame can have names set on both its index and its columns
df.index.name = str1
df.columns.name = str2
Unlike a Python set, a pandas Index may contain duplicate labels.
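A minimal sketch of axis names and duplicate labels together, using made-up labels. Selecting a duplicated label returns every matching row, while a unique label returns a single row as a Series.

```python
import pandas as pd

# The row label 'a' appears twice on purpose
df = pd.DataFrame({'v': [1, 2, 3]}, index=['a', 'a', 'b'])
df.index.name = 'key'
df.columns.name = 'field'

dup = df.loc['a']        # duplicated label -> DataFrame with two rows
single = df.loc['b']     # unique label -> Series

print(df.index.is_unique)
print(dup)
print(single)
```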
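Some Index methods
Method|Description
append|index1.append(index2), appends another Index to the existing one and returns a new Index
difference|index1.difference(index2), returns the elements of index1 that are not in index2
intersection|index1.intersection(index2), returns the intersection of index1 and index2
union|index1.union(index2), returns the union of index1 and index2
isin|index1.isin(index2), returns a boolean array that is True at each position where the element of index1 is present in index2, and False otherwise
delete|index.delete(loc), returns a new Index with the element at position loc removed
drop|index.drop(labels), returns a new Index with the given labels removed
insert|index.insert(loc, value), returns a new Index with value inserted at position loc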
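is_monotonic|index.is_monotonic, True if the elements are monotonically increasing (removed in pandas 2.0; use index.is_monotonic_increasing instead)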
is_unique|index.is_unique, True if all elements are unique, otherwise False
unique|index.unique(), returns the unique elements, similar to set
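The methods in the table can be tried on two small indexes. This is a sketch with made-up labels; since an Index is immutable, every call returns a new object and index1 itself is left unchanged.

```python
import pandas as pd

index1 = pd.Index(['a', 'b', 'c'])
index2 = pd.Index(['b', 'c', 'd'])

diff = index1.difference(index2)      # in index1 but not in index2
inter = index1.intersection(index2)   # in both
union = index1.union(index2)          # in either
mask = index1.isin(index2)            # boolean array aligned with index1

deleted = index1.delete(0)            # drop the element at position 0
dropped = index2.drop(['d'])          # drop by label
inserted = index2.insert(1, 'x')      # insert 'x' at position 1
appended = index1.append(index2)      # concatenation, duplicates kept

print(list(appended), appended.is_unique)
print(list(appended.unique()))
print(index1.is_monotonic_increasing)
```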

Article link: https://www.haomeiwen.com/subject/ldxepftx.html
