String handling in pandas

Author: 何同尘 | Published: 2019-01-27 20:46 | Views: 4

String operations in pandas

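pandas provides string methods through the str accessor of a Series, so a string operation can be applied to every element at once without writing a loop.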

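import pandas as pd

data = ['peter', 'Paul', 'MARY', 'gUIDO']
# Plain Python: capitalize each string with a list comprehension
[s.capitalize() for s in data]

# pandas: the same operation through the str accessor of a Series
names = pd.Series(data)
names
names.str.capitalize()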

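The str accessor applies string methods element by element and skips missing values (None or NaN) instead of raising an error; a missing element simply stays missing in the result.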
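To make the missing-value behaviour concrete, here is a minimal sketch. The sample list with a None entry is a made-up variation of the data above, not something from the original article.

```python
import pandas as pd

# Hypothetical sample data: the list from above with one missing value added
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']

# The plain list comprehension fails because None has no capitalize() method
try:
    [s.capitalize() for s in data]
    list_comprehension_failed = False
except AttributeError:
    list_comprehension_failed = True

# The str accessor skips the missing entry and leaves it missing in the result
names = pd.Series(data)
capitalized = names.str.capitalize()
print(list_comprehension_failed)
print(capitalized)
```

The list comprehension stops at the first missing value, while the Series version processes the remaining strings and keeps the gap in place.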

String methods available on a pandas Series

len() lower() translate() islower()
ljust() upper() startswith() isupper()
rjust() find() endswith() isnumeric()
center() rfind() isalnum() isdecimal()
zfill() index() isalpha() split()
strip() rindex() isdigit() rsplit()
rstrip() capitalize() isspace() partition()
lstrip() swapcase() istitle() rpartition()

These mirror Python's built-in string methods, so the names are self-explanatory.
Usage:

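# monte is assumed to be a Series of strings; these sample values are illustrative
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam'])
monte.str.lower()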
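The methods in the table return different kinds of results depending on what the underlying string method returns. The sketch below illustrates this with a small made-up Series named monte; its contents are an assumption, since the article never shows how monte is built.

```python
import pandas as pd

# Hypothetical example Series; the article does not show how monte is defined
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam'])

lower = monte.str.lower()             # one string per element
lengths = monte.str.len()             # one integer per element
starts_t = monte.str.startswith('T')  # one boolean per element
parts = monte.str.split()             # one list of words per element

print(lower)
print(lengths)
print(starts_t)
print(parts)
```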

Regular expressions

Method Description
match() Call re.match() on each element, returning a boolean.
extract() Call re.search() on each element, returning the capture groups of the first match as strings.
findall() Call re.findall() on each element
replace() Replace occurrences of pattern with some other string
contains() Call re.search() on each element, returning a boolean
count() Count occurrences of pattern
split() Equivalent to str.split(), but accepts regexps
rsplit() Equivalent to str.rsplit(); splits from the right on a plain string (no regexp support)

The patterns passed to these methods can use regular-expression syntax such as . * ? ^ $
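As a rough illustration of the regular-expression methods, the sketch below runs four of them on a small made-up Series. The sample names and the patterns are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical example Series
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam'])

# extract(): the first capture group; here the leading run of letters
first_names = monte.str.extract('([A-Za-z]+)', expand=False)

# findall(): every match per element; ^ and $ anchor the pattern to the whole
# string, so only names that start and end with a consonant produce a match
consonant_names = monte.str.findall(r'^[^AEIOU].*[^aeiou]$')

# contains(): boolean mask, True where the pattern occurs anywhere
has_double_l = monte.str.contains(r'l{2}')

# count(): number of pattern matches per element
e_counts = monte.str.count('e')

print(first_names)
print(consonant_names)
print(has_double_l)
print(e_counts)
```

Passing expand=False to extract() with a single group returns a Series rather than a one-column DataFrame, which keeps the example compact.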

Other methods

Method Description
get() Index each element
slice() Slice each element
slice_replace() Replace slice in each element with passed value
cat() Concatenate strings
repeat() Repeat values
normalize() Return Unicode form of string
pad() Add whitespace to left, right, or both sides of strings
wrap() Split long strings into lines with length less than a given width
join() Join strings in each element of the Series with passed separator
get_dummies() Extract dummy variables as a DataFrame
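A few of these methods are easier to follow with a small example. The sketch below uses made-up data; both monte and the pipe-separated codes in info are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical example Series
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam'])

# slice(): take the first three characters of each element
first_three = monte.str.slice(0, 3)

# get(): index into each element; here the last word after splitting
last_names = monte.str.split().str.get(-1)

# cat(): with no other Series given, concatenate all elements into one string
initials = monte.str.get(0).str.cat(sep='')

# get_dummies(): split coded values on a separator into indicator columns
info = pd.Series(['B|C|D', 'B|D', 'A|C'])  # hypothetical coded column
dummies = info.str.get_dummies('|')

print(first_three)
print(last_names)
print(initials)
print(dummies)
```

Each distinct code in info becomes its own column in dummies, with 1 marking the rows where that code appears.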

Reading data from other sources

Real-world data often arrives in formats such as JSON or XML.

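import io

with open('recipeitems-latest.json') as f:
    line = f.readline()  # read only the first line of the file
# Wrap the text in StringIO: recent pandas versions deprecate passing a raw JSON string
pd.read_json(io.StringIO(line)).shape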
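Reading a single line suggests the file holds one JSON object per line, although the article does not say so. Under that assumption, a minimal sketch of loading every record at once with the lines=True option of pd.read_json could look like this; the two in-memory records are made up and stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical two-record sample standing in for a line-delimited JSON file
raw = (
    '{"name": "Soup", "servings": 4}\n'
    '{"name": "Bread", "servings": 8}\n'
)

# lines=True tells read_json to parse each line as a separate JSON object
recipes = pd.read_json(io.StringIO(raw), lines=True)
print(recipes.shape)
```

With a real file, the path can be passed to pd.read_json directly in place of the StringIO object.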
