美文网首页
pandas二刷(2)

pandas二刷(2)

作者: 山猪打不过家猪 | 来源:发表于2023-02-12 08:45 被阅读0次

    1.8字符串处理

    1. 使用方法:先获取Series的str属性,然后在属性上调用函数;
    2. 只能在字符串列上使用,不能数字列上使用;
    3. Dataframe上没有str属性和处理方法
    4. Series.str并不是Python原生字符串,而是自己的一套方法,不过大部分和原生str很相似;
    1.8.1 startwith开头

    查询字符串以2018-03开头的,等同于查询月份数据

    condition = df["ymd"].str.startswith("2018-03")
    
    1.8.2 链式调用清洗字符串

    将字符串2018-01-02转换成为201801

    df["ymd"].str.replace("-", "").str[0:6]
    
    1.8.3 正则表达式
    contains
    • 找出有晴天的所有数据
    df[df['tianqi'].str.contains("晴")]
    
    re表达式
    • 提取所有数字
    df['ymd'] = df['ymd'].str.replace(re.compile("-"), "")
    
    image.png

    1.10 Merge Syntax

    Merge相当于sql的Join语法,将不同的Key关联到一个表

    • Table 1:
    df_ratings = pd.read_csv(
        "./datas/movielens-1m/ratings.dat", 
        sep="::",
        engine='python', 
        names="UserID::MovieID::Rating::Timestamp".split("::")
    )
    
    image.png
    • Table 2:
    df_users = pd.read_csv(
        "./datas/movielens-1m/users.dat", 
        sep="::",
        engine='python', 
        names="UserID::Gender::Age::Occupation::Zip-code".split("::")
    )
    
    image.png
    • Table 3:
    df_movies = pd.read_csv(
        "./datas/movielens-1m/movies.dat", 
        sep="::",
        engine='python', 
        names="MovieID::Title::Genres".split("::")
    )
    
    image.png
    1.10.1 inner join
    • All entries from the left side will appear in the result, and if there's no match from the right side, it will be shown as Null.
    df_ratings_users = pd.merge(
       df_ratings, df_users, left_on="UserID", right_on="UserID", how="inner"
    )
    
    • equal to sql
    select * from df_ratings a inner join df_users  b on a.UserID= b.UserID
    
    image.png
    1.10.2 right join
    • All entries from the left side will appear in the result, and if there's no match from the right side, it will be shown as Null
    df_ratings_users = pd.merge(
       df_ratings, df_users, left_on="UserID", right_on="UserID", how="left "
    )
    
    • equal to sql
    select * from df_ratings a left join df_users  b on a.UserID= b.UserID
    

    1.11 Concat

    concat is used to combine excel with the same format

    • table 1


      image.png
    • table 2


      image.png
    1.11.1 use pandas.concat to combine data
    • use default param
    pd.concat([table1,table2])
    
    image.png
    image.png
    • user ignore_index =True to neglect the primary index
    pd.concat([df1,df2], ignore_index=True)
    
    image.png

    1.12 Batch merge and split excel

    1.12.1 split to multiple equal excel
    work_dir="./course_datas/c15_excel_split_merge"
    splits_dir=f"{work_dir}/splits"
    
    import os
    if not os.path.exists(splits_dir):
        os.mkdir(splits_dir)
    

    相关文章

      网友评论

          本文标题:pandas二刷(2)

          本文链接:https://www.haomeiwen.com/subject/ubtdkdtx.html