Text Data Processing with pandas in Python

Author: Brendansmisle | Published: 2020-03-31 14:25
    1. Import the module
    >>> import pandas as pd
    
    2. Load the table data
    >>> titanic = pd.read_csv(r"C:\Users\Administrator\Desktop\titanic.csv")
    >>> titanic
         PassengerId  Survived  Pclass                                                 Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
    0              1         0       3                              Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S
    1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)  female  38.0      1      0          PC 17599  71.2833   C85        C
    2              3         1       3                               Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
    3              4         1       1         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
    4              5         0       3                             Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S
    ..           ...       ...     ...                                                  ...     ...   ...    ...    ...               ...      ...   ...      ...
    886          887         0       2                                Montvila, Rev. Juozas    male  27.0      0      0            211536  13.0000   NaN        S
    887          888         1       1                         Graham, Miss. Margaret Edith  female  19.0      0      0            112053  30.0000   B42        S
    888          889         0       3             Johnston, Miss. Catherine Helen "Carrie"  female   NaN      1      2        W./C. 6607  23.4500   NaN        S
    889          890         1       1                                Behr, Mr. Karl Howell    male  26.0      0      0            111369  30.0000  C148        C
    890          891         0       3                                  Dooley, Mr. Patrick    male  32.0      0      0            370376   7.7500   NaN        Q
    
    [891 rows x 12 columns]
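
For readers who do not have titanic.csv at that path, here is a minimal sketch that builds a small stand-in from an in-memory CSV string. The rows are copied from the output above and only six of the twelve columns are kept; the names `csv_text` and `sample` are illustrative and do not appear in the original.

```python
import io
import pandas as pd

# Five rows copied from the output above, restricted to six columns.
# Names contain commas, so they are quoted in the CSV text.
csv_text = '''PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,"Braund, Mr. Owen Harris",male,22
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38
3,1,3,"Heikkinen, Miss. Laina",female,26
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35
5,0,3,"Allen, Mr. William Henry",male,35
'''

# read_csv accepts any file-like object, so StringIO stands in for the file on disk.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(sample["Name"])
```

The string methods in the following steps can be tried on `sample` in the same way as on the full table.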
    
    3. Convert every name in the Name column to lowercase
    >>> titanic["Name"].str.lower()
    0                                  braund, mr. owen harris
    1      cumings, mrs. john bradley (florence briggs thayer)
    2                                   heikkinen, miss. laina
    3             futrelle, mrs. jacques heath (lily may peel)
    4                                 allen, mr. william henry
                                  ...                         
    886                                  montvila, rev. juozas
    887                           graham, miss. margaret edith
    888               johnston, miss. catherine helen "carrie"
    889                                  behr, mr. karl howell
    890                                    dooley, mr. patrick
    Name: Name, Length: 891, dtype: object
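
Two details are easy to miss here: the call returns a new Series rather than changing the column, and missing values pass through untouched. A small sketch on a hand-made Series (the `names` data below is an assumption for illustration, with one missing entry added on purpose):

```python
import pandas as pd

names = pd.Series(["Braund, Mr. Owen Harris", None, "Heikkinen, Miss. Laina"])

lowered = names.str.lower()   # returns a new Series; `names` itself is unchanged
lengths = names.str.len()     # element-wise length; the missing entry stays missing

print(lowered)
print(lengths)
```

To keep the lowercase version, assign the result back, for example to a new column.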
    
    4. Split each name on the comma, turning every row into a list
    >>> titanic["Name"].str.split(",")
    0                                  [Braund,  Mr. Owen Harris]
    1      [Cumings,  Mrs. John Bradley (Florence Briggs Thayer)]
    2                                   [Heikkinen,  Miss. Laina]
    3             [Futrelle,  Mrs. Jacques Heath (Lily May Peel)]
    4                                 [Allen,  Mr. William Henry]
                                    ...                          
    886                                  [Montvila,  Rev. Juozas]
    887                           [Graham,  Miss. Margaret Edith]
    888               [Johnston,  Miss. Catherine Helen "Carrie"]
    889                                  [Behr,  Mr. Karl Howell]
    890                                    [Dooley,  Mr. Patrick]
    Name: Name, Length: 891, dtype: object
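
Note that the second element of each list above keeps the space that followed the comma. As a possible variation (not the author's step), `expand=True` returns the pieces as DataFrame columns instead of lists, and `str.strip()` removes the leading space. The column names `surname` and `rest` are made up for this sketch.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
])

# n=1 splits only on the first comma; expand=True returns a DataFrame
# with one column per piece instead of a Series of lists.
parts = names.str.split(",", n=1, expand=True)
parts.columns = ["surname", "rest"]

# The piece after the comma keeps its leading space, so strip it.
parts["rest"] = parts["rest"].str.strip()
print(parts)
```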
    
    5. Extract the first element of each name list (the surname)
    >>> titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)
    >>> titanic["Surname"]
    0         Braund
    1        Cumings
    2      Heikkinen
    3       Futrelle
    4          Allen
             ...    
    886     Montvila
    887       Graham
    888     Johnston
    889         Behr
    890       Dooley
    Name: Surname, Length: 891, dtype: object
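
The same pattern can reach other positions in the list. The sketch below shows that `.str[0]` is a shorthand for `.str.get(0)`, and chains a second split to pull out the title. The title extraction is an extra illustration, not something the original does, and it assumes every name has the form "Surname, Title. Given names".

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
])

pieces = names.str.split(",")

surname_get = pieces.str.get(0)   # explicit method call
surname_idx = pieces.str[0]       # indexing shorthand for the same lookup

# Take the part after the comma, split it on the period, keep the first
# piece, and strip the leading space left over from the first split.
title = pieces.str.get(1).str.split(".").str.get(0).str.strip()
print(title.tolist())
```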
    
    6. Find the passengers whose name contains a given string
    >>> titanic[titanic["Name"].str.contains("Countess")]
         PassengerId  Survived  Pclass                                                      Name     Sex   Age  SibSp  Parch  Ticket  Fare Cabin Embarked Surname
    759          760         1       1  Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)  female  33.0      0      0  110152  86.5   B77        S  Rothes
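
`str.contains` returns a boolean mask, which is what makes the row filter above work. Its defaults are case-sensitive, treat the pattern as a regular expression, and propagate missing values into the mask. The sketch below, on made-up data with one missing name, shows the keyword arguments that change each of those defaults.

```python
import pandas as pd

names = pd.Series([
    "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
    "Braund, Mr. Owen Harris",
    None,
])

# case=False ignores letter case; na=False counts missing names as non-matches,
# which keeps the mask purely boolean; regex=False matches the text literally.
mask = names.str.contains("countess", case=False, na=False, regex=False)
print(names[mask])
```

Passing `na=False` matters when the column has missing values, because a mask containing missing entries cannot be used directly for boolean indexing.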
    
    7. Replace the value "male" with M and the value "female" with F
    >>> titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})
    >>> titanic
         PassengerId  Survived  Pclass                                                 Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked    Surname Sex_short
    0              1         0       3                              Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S     Braund         M
    1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)  female  38.0      1      0          PC 17599  71.2833   C85        C    Cumings         F
    2              3         1       3                               Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S  Heikkinen         F
    3              4         1       1         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S   Futrelle         F
    4              5         0       3                             Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S      Allen         M
    ..           ...       ...     ...                                                  ...     ...   ...    ...    ...               ...      ...   ...      ...        ...       ...
    886          887         0       2                                Montvila, Rev. Juozas    male  27.0      0      0            211536  13.0000   NaN        S   Montvila         M
    887          888         1       1                         Graham, Miss. Margaret Edith  female  19.0      0      0            112053  30.0000   B42        S     Graham         F
    888          889         0       3             Johnston, Miss. Catherine Helen "Carrie"  female   NaN      1      2        W./C. 6607  23.4500   NaN        S   Johnston         F
    889          890         1       1                                Behr, Mr. Karl Howell    male  26.0      0      0            111369  30.0000  C148        C       Behr         M
    890          891         0       3                                  Dooley, Mr. Patrick    male  32.0      0      0            370376   7.7500   NaN        Q     Dooley         M
    
    [891 rows x 14 columns]
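
A related method, `Series.map`, also accepts a dictionary but treats unlisted values differently. The following sketch compares the two on a made-up Series that contains one value ("unknown") missing from the dictionary; the Titanic Sex column itself only shows male and female in the output above.

```python
import pandas as pd

sex = pd.Series(["male", "female", "unknown"])
codes = {"male": "M", "female": "F"}

replaced = sex.replace(codes)  # values missing from the dict are kept as they are
mapped = sex.map(codes)        # values missing from the dict become NaN

print(replaced.tolist())
print(mapped.tolist())
```

`replace` is the safer choice when the dictionary may not cover every value; `map` is useful when an unmapped value should show up as missing.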
    
    8. Summary

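    The str accessor makes Python-style string methods available on every element of a column, and the replace method is a convenient way to convert values according to a given dictionary.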
