pandas, second pass (2)

Author: 山猪打不过家猪 | Published 2023-02-12 08:45

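1.8 String processing

Usage: access the str attribute of a Series, then call a method on that attribute.
It works only on string columns, not on numeric columns.
A DataFrame has no str attribute and no string methods of its own.
Series.str is not the built-in Python str; it is pandas' own set of methods, although most of them closely mirror the built-in ones.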
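
The rules above can be checked on a tiny frame. This is a minimal sketch with made-up sample data; the column names ymd and tianqi follow this post, while wendu (temperature) is a hypothetical numeric column added for illustration.

```python
import pandas as pd

# Hypothetical sample data, not the dataset used in the post.
df = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-03-15"],
    "tianqi": ["晴", "多云~晴"],
    "wendu": [3, 12],  # assumed numeric column
})

# .str on a string column returns a new Series, one result per row.
lengths = df["tianqi"].str.len()

# On a numeric column the accessor raises AttributeError.
try:
    df["wendu"].str.len()
    numeric_ok = True
except AttributeError:
    numeric_ok = False

# The DataFrame itself exposes no .str accessor.
frame_has_str = hasattr(df, "str")
```

To use string methods on a numeric column, one option is to convert it first with astype(str).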

1.8.1 startswith: matching a prefix

Select the rows whose string starts with 2018-03, which is the same as selecting the data for March 2018.

condition = df["ymd"].str.startswith("2018-03")
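
As a small illustration of how the boolean condition is then used to filter rows, here is a sketch on made-up data (the four rows below are assumptions, not the post's dataset):

```python
import pandas as pd

# Hypothetical sample rows spanning three months.
df = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02", "2018-04-01"],
    "tianqi": ["晴", "多云", "小雨", "晴"],
})

# startswith returns a boolean Series aligned with df.
condition = df["ymd"].str.startswith("2018-03")

# Indexing with that Series keeps only the rows where it is True.
march = df[condition]
```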

1.8.2 Cleaning strings with chained calls

Convert the string 2018-01-02 to 201801.

df["ymd"].str.replace("-", "").str[0:6]
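
Each .str call returns a Series, so .str has to be accessed again before the next string operation. A minimal sketch on a standalone Series (the sample values are assumptions); regex=False is passed explicitly so the hyphen is treated as a literal regardless of pandas version:

```python
import pandas as pd

# Hypothetical date strings.
ymd = pd.Series(["2018-01-02", "2018-12-31"])

# Step 1: drop the hyphens. Step 2: slice the first six characters (year + month).
ym = ymd.str.replace("-", "", regex=False).str[0:6]
```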

1.8.3 Regular expressions

contains

Find all rows whose weather includes 晴 (sunny):

df[df['tianqi'].str.contains("晴")]
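
One practical detail: if the column contains missing values, contains yields a missing value for those rows and the boolean indexing can fail. A hedged sketch on made-up data, using the na parameter to treat missing values as non-matches:

```python
import pandas as pd

# Hypothetical weather column with one missing value.
df = pd.DataFrame({"tianqi": ["晴", "多云~晴", "小雨", None]})

# na=False makes missing entries count as "does not contain".
mask = df["tianqi"].str.contains("晴", na=False)
sunny = df[mask]
```

By default contains interprets the pattern as a regular expression; passing regex=False treats it as a plain substring.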

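Regular expressions with re

Strip the hyphens so that only the digits remain:

import re
# regex=True is needed for a compiled pattern, since regex defaults to False from pandas 2.0 on
df['ymd'] = df['ymd'].str.replace(re.compile("-"), "", regex=True)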
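
A more general way to keep only the digits is to remove every non-digit character with the pattern \D. The sketch below uses made-up date strings with two different separators and shows both a pattern string and a compiled pattern:

```python
import re
import pandas as pd

# Hypothetical dates written with different separators.
ymd = pd.Series(["2018-01-02", "2018/03/15"])

# Pattern string: \D matches any non-digit character.
digits = ymd.str.replace(r"\D", "", regex=True)

# Compiled pattern: removes hyphens and slashes only.
compiled = ymd.str.replace(re.compile(r"[-/]"), "", regex=True)
```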

1.10 Merge Syntax

merge is the pandas counterpart of a SQL JOIN: it combines tables into one by matching rows on a key column.

Table 1：

df_ratings = pd.read_csv(
    "./datas/movielens-1m/ratings.dat", 
    sep="::",
    engine='python', 
    names="UserID::MovieID::Rating::Timestamp".split("::")
)

Table 2：

df_users = pd.read_csv(
    "./datas/movielens-1m/users.dat", 
    sep="::",
    engine='python', 
    names="UserID::Gender::Age::Occupation::Zip-code".split("::")
)

Table 3：

df_movies = pd.read_csv(
    "./datas/movielens-1m/movies.dat", 
    sep="::",
    engine='python', 
    names="MovieID::Title::Genres".split("::")
)

1.10.1 inner join

Only rows whose key appears in both tables are kept; rows without a match on the other side are dropped.

df_ratings_users = pd.merge(
   df_ratings, df_users, left_on="UserID", right_on="UserID", how="inner"
)

Equivalent SQL:

select * from df_ratings a inner join df_users b on a.UserID = b.UserID
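
Because the MovieLens files may not be at hand, here is a small in-memory sketch of an inner join. The two frames are made-up stand-ins for the ratings and users tables; on="UserID" is shorthand for giving the same name to left_on and right_on.

```python
import pandas as pd

# Hypothetical miniature versions of the ratings and users tables.
ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 4],
    "MovieID": [10, 20, 10, 30],
    "Rating": [5, 3, 4, 2],
})
users = pd.DataFrame({
    "UserID": [1, 2, 3],
    "Gender": ["F", "M", "M"],
})

# UserID 4 (ratings only) and UserID 3 (users only) have no partner,
# so neither appears in the inner join.
inner = pd.merge(ratings, users, on="UserID", how="inner")
```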

1.10.2 left join

All entries from the left side appear in the result; where the right side has no match, its columns are filled with null (NaN).

df_ratings_users = pd.merge(
   df_ratings, df_users, left_on="UserID", right_on="UserID", how="left"
)

Equivalent SQL:

select * from df_ratings a left join df_users b on a.UserID = b.UserID
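
The same idea on made-up data: a sketch of a left join in which one left-side row has no partner. The frames are hypothetical stand-ins for the ratings and users tables; indicator=True adds a _merge column that records where each row came from.

```python
import pandas as pd

# Hypothetical miniature versions of the ratings and users tables.
ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 4],
    "Rating": [5, 3, 4, 2],
})
users = pd.DataFrame({
    "UserID": [1, 2, 3],
    "Gender": ["F", "M", "M"],
})

# Every ratings row is kept; UserID 4 has no user record, so Gender is NaN there.
left = pd.merge(ratings, users, on="UserID", how="left", indicator=True)
```

Filtering on _merge == "left_only" is a quick way to list the keys that failed to match.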

1.11 Concat

concat combines DataFrames that share the same layout, for example several Excel sheets with identical columns.

[Figures: table 1 and table 2, the two input tables]

1.11.1 Use pandas.concat to combine data

With the default parameters, each table keeps its original row index:

pd.concat([table1, table2])

Use ignore_index=True to discard the original indexes and renumber the rows from 0:

pd.concat([table1, table2], ignore_index=True)
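
The difference between the two calls shows up in the resulting index. A minimal sketch with two made-up tables (the column names name and score are assumptions):

```python
import pandas as pd

# Two hypothetical tables with the same columns.
table1 = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})
table2 = pd.DataFrame({"name": ["c", "d"], "score": [3, 4]})

# Default: each table keeps its own index, so labels repeat.
stacked = pd.concat([table1, table2])

# ignore_index=True: a fresh 0..n-1 index is assigned.
renumbered = pd.concat([table1, table2], ignore_index=True)
```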

1.12 Batch merging and splitting Excel files

1.12.1 Split one file into several equal-sized Excel files

import os

work_dir = "./course_datas/c15_excel_split_merge"
splits_dir = f"{work_dir}/splits"

# Create the output directory for the split files if it does not exist yet
os.makedirs(splits_dir, exist_ok=True)
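
The post stops after creating the output directory. One possible way to do the actual splitting is sketched below; it is not the author's code. The helper split_frame, the sample data, and the file name pattern part_{idx}.xlsx are all hypothetical, and the write step is left commented out because to_excel needs an Excel engine such as openpyxl installed.

```python
import math
import pandas as pd

def split_frame(df, n_parts):
    """Split df into consecutive chunks of (almost) equal size.

    Hypothetical helper: the last chunk is shorter when the row
    count is not divisible by n_parts.
    """
    chunk_size = math.ceil(len(df) / n_parts)
    return [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]

# Made-up data standing in for the Excel sheet to be split.
df = pd.DataFrame({"id": range(10), "value": range(100, 110)})
parts = split_frame(df, 3)

# Writing each chunk to its own file (assumed naming scheme):
# for idx, part in enumerate(parts):
#     part.to_excel(f"{splits_dir}/part_{idx}.xlsx", index=False)
```

Concatenating the parts again with pd.concat should reproduce the original frame, which is a convenient sanity check after a split.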

Article link: https://www.haomeiwen.com/subject/ubtdkdtx.html
