Learning Pandas (1)

Author: 末央酒 | Published: 2017-11-12 14:26

Learning Pandas

Basic commands

Importing the package

import pandas as pd  # every example below refers to pandas as pd

Importing data

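pd.read_csv(filename) # import data from a CSV file
pd.read_table(filename) # import data from a delimited text file (tab-separated by default, e.g. TSV)
pd.read_excel(filename) # import data from an Excel file
pd.read_sql(query, connection_object) # import data from a SQL table or query
pd.read_json(json_string) # import data from a JSON string, URL, or file (newer pandas versions expect a literal JSON string to be wrapped in io.StringIO)
pd.read_html(url) # parse the HTML tables found at a URL; returns a list of DataFrames
pd.read_clipboard() # import data from the system clipboard
pd.DataFrame(dict)  # build a DataFrame from a Python dict: keys become the column names, values become the column contents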
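A minimal sketch of the first and last entries above that runs without any external file. The dict and CSV text are made-up sample data, and io.StringIO stands in for a real filename.

```python
import io

import pandas as pd

# pd.DataFrame(dict): each key becomes a column label, each value list becomes that column
df_from_dict = pd.DataFrame({"city": ["Beijing", "Shanghai"], "visits": [3, 5]})

# pd.read_csv accepts a path or any file-like object; StringIO plays the role of a CSV file
csv_text = "city,visits\nBeijing,3\nShanghai,5\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv)
# equals() checks that labels, values and dtypes all match
print(df_from_csv.equals(df_from_dict))
```

read_excel and read_html additionally rely on optional parser packages (for example openpyxl for .xlsx files and lxml for HTML), so they may need an extra install.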

Inspecting and viewing data

View the first few rows of the DataFrame (the examples use chipo, a DataFrame of food orders)
```
chipo.head(10)  # first 10 rows; chipo.head() with no argument shows 5
```
View the last few rows of the DataFrame
```
chipo.tail(10)  # last 10 rows; the default is also 5
```
View the index, the column data types, and memory usage
```
chipo.info()
```
View the number of rows and columns
```
chipo.shape     # a tuple: (number of rows, number of columns)
chipo.shape[1]  # number of columns; chipo.shape[0] is the number of rows
```
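A sketch of head, tail, info and shape on a small hypothetical stand-in for chipo; the rows below are invented for illustration and are not the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for chipo
chipo = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3, 4],
    "item_name": ["Chips", "Izze", "Chips", "Salad", "Chips", "Izze", "Bowl"],
    "quantity": [1, 1, 2, 1, 1, 1, 1],
})

first_rows = chipo.head()     # first 5 rows (default n=5)
last_two = chipo.tail(2)      # last 2 rows
n_rows, n_cols = chipo.shape  # shape unpacks into (rows, columns)

chipo.info()                  # prints the index range, column dtypes and memory usage
print(n_rows, n_cols)
```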
View all column names; for chipo the result is Index(['order_id', 'quantity', 'item_name', 'choice_description', 'item_price'], dtype='object')
```
chipo.columns
```
View the row index; for chipo the result is RangeIndex(start=0, stop=4622, step=1)
```
chipo.index
```
Count how many times each distinct value occurs
```
chipo.item_name.value_counts()  # number of rows in which each item appears, most frequent first
```
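The same calls on a hypothetical miniature orders table. It also shows that value_counts counts rows, not units sold; summing the quantity column per item (a groupby, covered next) gives the latter.

```python
import pandas as pd

# Hypothetical toy orders; the values are invented
chipo = pd.DataFrame({
    "item_name": ["Chips", "Izze", "Chips", "Bowl", "Chips", "Izze"],
    "quantity": [1, 2, 1, 1, 3, 1],
})

print(chipo.columns)  # the column labels, as an Index
print(chipo.index)    # default integer row labels, as a RangeIndex

# Rows per distinct item, sorted from most to least frequent
counts = chipo.item_name.value_counts()

# Units per item: add up quantity within each item instead of counting rows
units = chipo.groupby("item_name").quantity.sum()
print(counts)
print(units)
```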

Grouping data

groupby

drinks.groupby('continent').beer_servings.mean()  # group the rows by continent and compute the average beer_servings of each continent
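A runnable sketch of that line. The drinks table here is a hypothetical five-row sample with made-up values, not real consumption statistics.

```python
import pandas as pd

# Hypothetical sample shaped like the drinks table (one row per country)
drinks = pd.DataFrame({
    "country": ["France", "Germany", "Japan", "China", "Kenya"],
    "continent": ["EU", "EU", "AS", "AS", "AF"],
    "beer_servings": [127, 346, 77, 79, 58],
})

# Split rows by continent, select beer_servings, then average within each group
mean_beer = drinks.groupby("continent").beer_servings.mean()
print(mean_beer)
```

The result is a Series indexed by the group keys, which groupby sorts by default.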

Combining data
- concat()
```
all_data_col = pd.concat([data1, data2], axis=1)  # axis=1 joins the frames side by side (along columns); the default axis=0 stacks them row-wise
```
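A sketch contrasting the two axes; data1 and data2 here are hypothetical two-row frames.

```python
import pandas as pd

data1 = pd.DataFrame({"subject_id": [1, 2], "first_name": ["Alex", "Amy"]})
data2 = pd.DataFrame({"subject_id": [3, 4], "first_name": ["Billy", "Brian"]})

# Default axis=0: stack rows; ignore_index=True renumbers the result 0..n-1
all_data = pd.concat([data1, data2], ignore_index=True)

# axis=1: place the frames side by side, aligning rows on their index labels
all_data_col = pd.concat([data1, data2], axis=1)

print(all_data)
print(all_data_col)
```

Without ignore_index=True, the row-wise result would keep the original labels 0, 1, 0, 1; with axis=1 the column names are simply repeated.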
- merge()
```
pd.merge(all_data, data3, on='subject_id')  # join on the subject_id column present in both frames
```
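A sketch of the merge call above with hypothetical contents for all_data and data3. It uses the default how='inner', so only subject_id values present in both frames survive, and a key that appears twice on the right produces two result rows.

```python
import pandas as pd

all_data = pd.DataFrame({"subject_id": [1, 2, 3], "first_name": ["Alex", "Amy", "Billy"]})
data3 = pd.DataFrame({"subject_id": [1, 3, 3, 5], "test_id": [51, 15, 1, 61]})

# Inner join on the shared subject_id column
merged = pd.merge(all_data, data3, on="subject_id")
print(merged)
```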
Commonly used parameters, adapted from the official documentation:

- how: {'left', 'right', 'outer', 'inner'}, default 'inner'
  - left: use only keys from the left frame, similar to a SQL left outer join; preserve key order
  - right: use only keys from the right frame, similar to a SQL right outer join; preserve key order
  - outer: use the union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically
  - inner: use the intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys
- on: label or list
  Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
- left_on: label or list, or array-like
  Field names to join on in the left DataFrame. Can be a vector or list of vectors of the length of the DataFrame, to use a particular vector as the join key instead of columns.
- right_on: label or list, or array-like
  Field names to join on in the right DataFrame, or vector/list of vectors as described for left_on.
- left_index: bool, default False
  Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels.
- right_index: bool, default False
  Use the index from the right DataFrame as the join key. Same caveats as left_index.
- sort: bool, default False
  Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (the how keyword).
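
A sketch comparing the four how options on two hypothetical frames whose key columns have different names, which is the case left_on/right_on exist for. It also uses indicator=True, a real merge parameter not listed above, which records where each row came from.

```python
import pandas as pd

# Hypothetical frames: keys a, b, c on the left and b, c, d on the right
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"id": ["b", "c", "d"], "rval": [20, 30, 40]})

# The key columns differ in name, so left_on/right_on replace on=
sizes = {
    how: len(pd.merge(left, right, how=how, left_on="key", right_on="id"))
    for how in ["inner", "left", "right", "outer"]
}
print(sizes)

# indicator=True adds a categorical _merge column: left_only, right_only or both
outer = pd.merge(left, right, how="outer", left_on="key", right_on="id", indicator=True)
print(outer)
```

Rows without a match get NaN in the other frame's columns, which is also why integer columns such as rval become floats in the left, right and outer results.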

apply: applying functions

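apply is one of the most important pandas methods. It is often used together with groupby, and it can also be called directly on DataFrame and Series objects. It makes it easy to run built-in or custom functions over the data: on a Series it calls the function once per element, on a DataFrame once per column (or once per row with axis=1), and after groupby once per group.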
```
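import datetime
import pandas as pd

def fix_century(x):
    # Two-digit years (e.g. 61) may have been parsed into the 2000s; move years after 1989 back one century
    year = x.year - 100 if x.year > 1989 else x.year
    return datetime.date(year, x.month, x.day)

# Replace every Yr_Mo_Dy value with fix_century(value), then convert the dates back to datetime64
data['Yr_Mo_Dy'] = pd.to_datetime(data['Yr_Mo_Dy'].apply(fix_century))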
```
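To illustrate the three uses of apply named above (per element, per row, per group), here is a sketch on a hypothetical score table.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"team": ["x", "x", "y"], "points": [3, 5, 10], "assists": [1, 4, 2]})

# Series.apply: the function is called once per element
doubled = df["points"].apply(lambda v: v * 2)

# DataFrame.apply with axis=1: the function is called once per row (each row is a Series)
total = df.apply(lambda row: row["points"] + row["assists"], axis=1)

# groupby + apply: the function is called once per group (each group's points as a Series)
spread = df.groupby("team")["points"].apply(lambda s: s.max() - s.min())

print(doubled, total, spread, sep="\n\n")
```

For plain arithmetic like the row sum, the vectorized df["points"] + df["assists"] gives the same result and is much faster; apply pays off when the per-element or per-group logic cannot be expressed with built-in vectorized operations.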
resample: resampling time series
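resample is a convenient method for frequency conversion and resampling of regular time series. It regroups the observations into new time bins (for example, daily data into weekly bins), after which an aggregation such as mean() or sum() is applied to each bin. It needs a datetime-like index, or a datetime column passed through the on parameter.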
Common parameters

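rule: an offset string or DateOffset giving the target frequency, i.e. the length of each new time bin
For example: '30s' (30 seconds), '3T' (3 minutes), 'BM' (business month end). Recent pandas versions prefer '3min' and 'BME' over the older '3T' and 'BM'.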

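A minimal sketch with a made-up daily series. It uses 'W' (weekly bins ending on Sunday) and '3D' (three-day bins) as the rule, because those suit daily data; the aliases listed above plug in the same way for finer-grained data.

```python
import pandas as pd

# Hypothetical data: 14 daily values 0..13 from Monday 2017-01-02 to Sunday 2017-01-15
idx = pd.date_range("2017-01-02", periods=14, freq="D")
s = pd.Series(range(14), index=idx)

# rule="W": weekly bins ending on Sunday, each labelled with its end date
weekly_sum = s.resample("W").sum()
weekly_mean = s.resample("W").mean()

# rule="3D": consecutive three-day bins counted from the first day
three_day_sum = s.resample("3D").sum()

print(weekly_sum)
print(three_day_sum)
```

The last three-day bin only covers two days because the data ends on 2017-01-15; resample does not pad the input.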
Original link: https://www.haomeiwen.com/subject/bnnnmxtx.html