Reference (official documentation): Regular expression operations
re: regular expression, often abbreviated as regex
- Regular expression rules: version v2.3.5 (2017-6-12), author: deerchao; http://deerchao.net/tutorials/regex/regex.htm
-------------------------------------------------------------------------------------
- What regular expressions do: a regular expression is mainly used to search a string for the content you want by means of a specific pattern.
-------------------------------------------------------------------------------------
re
Commonly used functions:
-
re.compile(pattern, flags = 0)
Turns a regular expression pattern into a reusable regular expression object.
Compile a regular expression pattern into a regular expression object, which can be used for matching using its match(), search() and other methods, described below. The sequence
prog = re.compile(pattern)
result = prog.match(string)
is equivalent to
result = re.match(pattern, string)
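The equivalence above can be checked with a small sketch. The pattern and the sample strings here are made up for illustration; they do not come from the official documentation.

```python
import re

# Compile once, then reuse the pattern object on several strings.
# The pattern (one or more digits) and the samples are illustrative only.
digits_re = re.compile(r"\d+")

samples = ["42 apples", "apples 42", "no digits"]

# prog.match() only tries to match at the start of each string.
compiled_results = [bool(digits_re.match(s)) for s in samples]

# The module-level function gives the same answers.
direct_results = [bool(re.match(r"\d+", s)) for s in samples]

print(compiled_results)  # [True, False, False]
print(direct_results)    # [True, False, False]
```

Compiling is mostly a convenience when the same pattern is used many times, since the compiled object can be passed around and reused.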
-------------------------------------------------------------------------------------
-
re.search(pattern, string, flags = 0)
Finds the first place in string where pattern matches.
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
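The difference between "no match" and "a zero-length match" is easy to miss, so here is a minimal sketch. The input string and both patterns are chosen only for illustration.

```python
import re

text = "abc"

# \d+ needs at least one digit; no position in "abc" matches, so the result is None.
no_match = re.search(r"\d+", text)

# \d* may match zero digits, so it succeeds at position 0 with an empty match.
empty_match = re.search(r"\d*", text)

print(no_match)              # None
print(empty_match.group(0))  # prints an empty line (the match is '')
print(empty_match.span())    # (0, 0)
```

A zero-length match is still a match object, so a test such as `if re.search(...)` treats it as a success.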
-------------------------------------------------------------------------------------
-
re.match(pattern, string, flags = 0)
Matches pattern only at the beginning of string; unlike search(), it cannot match at an arbitrary position.
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.
If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).
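The MULTILINE note can be illustrated with a short sketch. The two-line string below is a made-up example.

```python
import re

text = "first line\nsecond line"

# Even with re.MULTILINE, match() only tries position 0 of the whole string.
at_start = re.match(r"^second", text, flags=re.MULTILINE)

# search() scans the string, and with re.MULTILINE "^" also matches after a newline.
anywhere = re.search(r"^second", text, flags=re.MULTILINE)

print(at_start)          # None
print(anywhere.start())  # 11
```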
-------------------------------------------------------------------------------------
-
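re.split(pattern, string, maxsplit = 0, flags = 0)
Splits string at each occurrence of pattern. If pattern contains capturing parentheses, the text matched by the groups is also returned in the result list.
In the examples below, '\W+' matches one or more characters that are not letters, digits, or underscores, so it matches each comma together with the space that follows it, and the string is split at those points. The second example adds parentheses, so the matched commas and spaces are returned as well.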
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', maxsplit=1)
['Words', 'words, words.']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']
If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string. In other words, a match at the very beginning or the very end of the string adds an extra empty string to the result:
>>> re.split(r'(\W+)', '...words, words...')
['', '...', 'words', ', ', 'words', '...', '']
-------------------------------------------------------------------------------------
-
re.sub(pattern, repl, string, count = 0, flags = 0)
Replaces the non-overlapping matches of pattern in string with repl:
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \& are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. For example:
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
... r'static PyObject*\npy_\1(void)\n{',
... 'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'
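Here the whole of 'def myfunc():' is matched, but ([a-zA-Z_][a-zA-Z_0-9]*) is wrapped in parentheses, so the myfunc it matches becomes group 1. repl then replaces the matched text without overlap; because the entire string was matched, the entire string is replaced, and the text of group 1 is substituted at the \1 in repl.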
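A smaller sketch of the same group-and-backreference mechanism may help. The names and the pattern below are hypothetical and only meant to show how \1 and \2 in repl pick up the text of groups 1 and 2.

```python
import re

# Hypothetical input, chosen only to illustrate backreferences in repl.
text = "Smith, John; Doe, Jane"

# Group 1 captures the surname, group 2 the given name.
pattern = r"(\w+), (\w+)"

# \2 and \1 in repl are replaced by the text matched by groups 2 and 1.
swapped = re.sub(pattern, r"\2 \1", text)

print(swapped)  # John Smith; Jane Doe
```

Only the matched parts are rewritten; the "; " between the two names is not part of any match, so it is left as it is.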
When repl is a function:
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
...
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
'Baked Beans & Spam'