str.extract()

Author: 弦好想断 | Published 2020-04-30 13:24


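We start with str.extract(), which uses a regular expression to pull matching data out of string data. Only the first match in each string is returned.
Note that the regular expression must contain at least one capture group, because only the content of the groups is returned. If a group is named, that name becomes the column name in the result.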
Series.str.extract(pat, flags=0, expand=None)
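Parameters:
pat : string, a regular expression pattern with capture groups
flags : int, flags from the re module such as re.IGNORECASE (default 0, meaning no flags)
expand : bool, whether to return a DataFrame. If True, the result is always a DataFrame with one column per group. Recent pandas versions default to True rather than the None shown above.
Returns:
DataFrame, or a Series/Index when expand=False and the pattern has a single group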
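
The behaviour described above can be sketched as follows. This is a minimal illustration, not taken from the original post: the sample Series and the group names `letter` and `digit` are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: some strings hold several letter+digit pairs,
# one holds none at all.
s = pd.Series(["a1 b2", "c3", "no match"])

# Named groups become the column names of the result.
# Only the first match in each string is used, so "b2" is ignored.
named = s.str.extract(r"(?P<letter>[a-z])(?P<digit>\d)")
print(named)

# With a single group, expand=False returns a Series instead of a DataFrame.
digits = s.str.extract(r"(\d)", expand=False)
print(digits)
```

Strings that do not match the pattern produce missing values (NaN) in every group column.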

Series.str.extractall(pat, flags=0)
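Returns all matches in each string, not only the first.
Parameters:
pat : string, a regular expression pattern with capture groups
flags : int, flags from the re module (default 0, meaning no flags)
Returns:
DataFrame with one row per match and one column per group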
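
For comparison with str.extract(), here is a small sketch of extractall() on the same kind of data. The sample Series, its index labels, and the group names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data with a string index so the result is easier to read.
s = pd.Series(["a1 b2", "c3", "no match"], index=["x", "y", "z"])

# Every match gets its own row. The result has a MultiIndex:
# the original index label, plus a "match" level counting matches per string.
all_matches = s.str.extractall(r"(?P<letter>[a-z])(?P<digit>\d)")
print(all_matches)

# Select the second match (match number 1) found in the string labelled "x".
second = all_matches.loc[("x", 1)]
print(second["letter"], second["digit"])
```

Strings without any match, such as the one labelled "z" here, do not appear in the result at all.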

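Other Series.str methods:
Series.str.capitalize() Capitalize the first character of each string in the Series/Index.
Series.str.cat([others, sep, na_rep]) Concatenate strings in the Series/Index with the given separator.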
Series.str.center(width[, fillchar]) Pad the left and right sides of strings in the Series/Index with an additional character.
Series.str.contains(pat[, case, flags, na, …]) Return boolean Series/array whether given pattern/regex is contained in each string in the Series/Index.
Series.str.count(pat[, flags]) Count occurrences of pattern in each string of the Series/Index.
Series.str.decode(encoding[, errors]) Decode character string in the Series/Index using indicated encoding.
Series.str.encode(encoding[, errors]) Encode character string in the Series/Index using indicated encoding.
Series.str.endswith(pat[, na]) Return boolean Series indicating whether each string in the Series/Index ends with passed pattern.
Series.str.extract(pat[, flags, expand]) For each subject string in the Series, extract groups from the first match of regular expression pat.
Series.str.extractall(pat[, flags]) For each subject string in the Series, extract groups from all matches of regular expression pat.
Series.str.find(sub[, start, end]) Return the lowest index in each string in the Series/Index where the substring is fully contained between [start:end].
Series.str.findall(pat[, flags]) Find all occurrences of pattern or regular expression in the Series/Index.
Series.str.get(i) Extract element from lists, tuples, or strings in each element in the Series/Index.
Series.str.index(sub[, start, end]) Return the lowest index in each string where the substring is fully contained between [start:end].
Series.str.join(sep) Join lists contained as elements in the Series/Index with passed delimiter.
Series.str.len() Compute length of each string in the Series/Index.
Series.str.ljust(width[, fillchar]) Pad the right side of strings in the Series/Index with an additional character.
Series.str.lower() Convert strings in the Series/Index to lowercase.
Series.str.lstrip([to_strip]) Strip whitespace (including newlines) from each string in the Series/Index from left side.
Series.str.match(pat[, case, flags, na, …]) Determine if each string matches a regular expression.
Series.str.normalize(form) Return the Unicode normal form for the strings in the Series/Index.
Series.str.pad(width[, side, fillchar]) Pad strings in the Series/Index with an additional character to specified side.
Series.str.partition([pat, expand]) Split the string at the first occurrence of sep, and return 3 elements containing the part before the separator, the separator itself, and the part after the separator.
Series.str.repeat(repeats) Duplicate each string in the Series/Index by indicated number of times.
Series.str.replace(pat, repl[, n, case, flags]) Replace occurrences of pattern/regex in the Series/Index with some other string.
Series.str.rfind(sub[, start, end]) Return the highest index in each string in the Series/Index where the substring is fully contained between [start:end].
Series.str.rindex(sub[, start, end]) Return the highest index in each string where the substring is fully contained between [start:end].
Series.str.rjust(width[, fillchar]) Pad the left side of strings in the Series/Index with an additional character.
Series.str.rpartition([pat, expand]) Split the string at the last occurrence of sep, and return 3 elements containing the part before the separator, the separator itself, and the part after the separator.
Series.str.rstrip([to_strip]) Strip whitespace (including newlines) from each string in the Series/Index from right side.
Series.str.slice([start, stop, step]) Slice substrings from each element in the Series/Index.
Series.str.slice_replace([start, stop, repl]) Replace a slice of each string in the Series/Index with another string.
Series.str.split([pat, n, expand]) Split each string (a la re.split) in the Series/Index by given pattern, propagating NA values.
Series.str.rsplit([pat, n, expand]) Split each string in the Series/Index by the given delimiter string, starting at the end of the string and working to the front.
Series.str.startswith(pat[, na]) Return boolean Series/array indicating whether each string in the Series/Index starts with passed pattern.
Series.str.strip([to_strip]) Strip whitespace (including newlines) from each string in the Series/Index from left and right sides.
Series.str.swapcase() Convert strings in the Series/Index to be swapcased.
Series.str.title() Convert strings in the Series/Index to titlecase.
Series.str.translate(table[, deletechars]) Map all characters in the string through the given mapping table.
Series.str.upper() Convert strings in the Series/Index to uppercase.
Series.str.wrap(width, **kwargs) Wrap long strings in the Series/Index to be formatted in paragraphs with length less than a given width.
Series.str.zfill(width) Pad the left side of strings in the Series/Index with 0.
Series.str.isalnum() Check whether all characters in each string in the Series/Index are alphanumeric.
Series.str.isalpha() Check whether all characters in each string in the Series/Index are alphabetic.
Series.str.isdigit() Check whether all characters in each string in the Series/Index are digits.
Series.str.isspace() Check whether all characters in each string in the Series/Index are whitespace.
Series.str.islower() Check whether all characters in each string in the Series/Index are lowercase.
Series.str.isupper() Check whether all characters in each string in the Series/Index are uppercase.
Series.str.istitle() Check whether all characters in each string in the Series/Index are titlecase.
Series.str.isnumeric() Check whether all characters in each string in the Series/Index are numeric.
Series.str.isdecimal() Check whether all characters in each string in the Series/Index are decimal.
Series.str.get_dummies([sep]) Split each string in the Series by sep and return a frame of dummy/indicator variables.
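
As a rough illustration of how a few of the methods listed above behave, the sketch below runs contains(), split() and get_dummies() on one small Series. The sample data and variable names are invented for this example.

```python
import pandas as pd

# Hypothetical sample data: comma-separated tags, including a missing value.
s = pd.Series(["red,green", "blue", None])

# contains(): boolean test per string; na=False treats missing values as False.
has_red = s.str.contains("red", na=False)
print(has_red)

# split() with expand=True: one column per piece, missing pieces left empty.
parts = s.str.split(",", expand=True)
print(parts)

# get_dummies(): one indicator column per distinct token.
dummies = s.str.get_dummies(sep=",")
print(dummies)
```

All three calls keep the original index, so their results can be joined back to the source data.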

Original link: https://blog.csdn.net/yj1556492839/article/details/79882488
