Assertions in Regular Expressions
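1. What is an assertion?
Broadly speaking, and taken literally, an assertion is a judgment of yes or no. In the world of regular expressions, that means match or no match. Any regular expression you write produces either a match or no match, so in this sense every regular expression can be called an assertion.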
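You will also often come across the term zero-width assertion. An ordinary assertion such as \d+ (which matches one or more digits) matches content that has a length, whereas assertions such as ^ and $ (which match the start and the end of a line, respectively) match only a position, so the content they match can be thought of as having length 0. Assertions of this kind are therefore called zero-width assertions.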
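The difference between a match with length and a zero-width match can be checked directly. The following is a minimal sketch using Python's standard `re` module; the sample string and variable names are assumptions chosen for illustration and are not from the original article.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "abc 123"

# \d+ consumes characters: the match has a non-zero length.
digits = re.search(r"\d+", text)
print(digits.group(), digits.span())       # 123 (4, 7)

# ^ and $ match positions only: each match is an empty string.
start = re.search(r"^", text)
end = re.search(r"$", text)
print(repr(start.group()), start.span())   # '' (0, 0)
print(repr(end.group()), end.span())       # '' (7, 7)
```

In each zero-width match the start and end offsets of the span are equal, which is what "length 0" means here.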
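In practice, however, the word "assertion" usually refers to a zero-width assertion (Regular Expressions Explained). (A simple way to look at it: the other assertions are straightforward, so there is little to say about them.) That is why you will sometimes see definitions like the following: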
An assertion is a test on the characters following or preceding the current matching point that does not actually consume any characters.
From: php Assertions
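In other words:
An assertion tests whether the text before or after the current position matches, without consuming any characters.
Here is another explanation of assertions: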
Actually matches characters, but then gives up the match, returning only the result: match or no match. They do not consume characters in the string, but only assert whether a match is possible or not.
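To see what "does not consume characters" means in practice, here is a small sketch with Python's `re` module. It uses a lookahead, one of the lookaround constructs listed in section 2.2; the sample string and variable names are assumptions made up for this illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "price: 100USD"

# Without an assertion, "USD" is part of the match and is consumed.
consumed = re.search(r"\d+USD", text)
print(consumed.group(), consumed.span())   # 100USD (7, 13)

# With a lookahead, "USD" must follow the digits,
# but it is only tested and stays out of the match.
asserted = re.search(r"\d+(?=USD)", text)
print(asserted.group(), asserted.span())   # 100 (7, 10)
```

Both patterns require the same text to be present, but the second match ends at offset 10, before "USD", because the lookahead only reports match or no match.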
2. Types of assertions
Regular expressions have two kinds of assertions: Anchors and Lookarounds.
2.1 Anchors
Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters. The metacharacters listed in the following table are anchors.
Assertion | Description | Pattern | Matches |
---|---|---|---|
^ | The match must start at the beginning of the string or line. | ^\d{3} | 901 in 901-333- |
$ | The match must occur at the end of the string or before \n at the end of the line or string. | -\d{3}$ | -333 in -901-333 |
\A | The match must occur at the start of the string. | \A\d{3} | 901 in 901-333- |
\Z | The match must occur at the end of the string or before \n at the end of the string. | -\d{3}\Z | -333 in -901-333 |
\z | The match must occur at the end of the string. | -\d{3}\z | -333 in -901-333 |
\G | The match must occur at the point where the previous match ended. | \G\(\d\) | (1), (3), (5) in (1)(3)(5)[7](9) |
\b | The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character. | \b\w+\s\w+\b | them theme, them them in them theme them them |
\B | The match must not occur on a \b boundary. | \Bend\w*\b | ends, ender in end sends endure lender |
From: Anchors in Regular Expressions
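Several rows of the table can be reproduced with Python's `re` module, as sketched below. Anchor support varies by regex flavor: Python's `re` has no \G, and its \Z matches only at the very end of the string. The sketch therefore sticks to ^, $, \A, \b and \B. The variable names and the two-line string are assumptions added for illustration.

```python
import re

# ^ : match at the start of the string.
caret = re.findall(r"^\d{3}", "901-333-")
print(caret)             # ['901']

# $ : match at the end of the string.
dollar = re.findall(r"-\d{3}$", "-901-333")
print(dollar)            # ['-333']

# With re.MULTILINE, ^ also matches at the start of each line,
# while \A still matches only at the start of the whole string.
two_lines = "901-\n333-"  # hypothetical two-line input
caret_multiline = re.findall(r"^\d{3}", two_lines, re.MULTILINE)
string_start = re.findall(r"\A\d{3}", two_lines, re.MULTILINE)
print(caret_multiline)   # ['901', '333']
print(string_start)      # ['901']

# \b : the match must begin and end on a word boundary.
word_pairs = re.findall(r"\b\w+\s\w+\b", "them theme them them")
print(word_pairs)        # ['them theme', 'them them']

# \B : "end" must not start at a word boundary.
inner_end = re.findall(r"\Bend\w*\b", "end sends endure lender")
print(inner_end)         # ['ends', 'ender']
```

The re.MULTILINE pair shows the practical difference between ^ and \A: only ^ changes meaning when line mode is switched on.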
2.2 Lookarounds
Example | Lookaround Name | What it Does |
---|---|---|
(?=foo) | Lookahead | Asserts that what immediately follows the current position in the string is foo. |
(?<=foo) | Lookbehind | Asserts that what immediately precedes the current position in the string is foo. |
(?!foo) | Negative Lookahead | Asserts that what immediately follows the current position in the string is not foo. |
(?<!foo) | Negative Lookbehind | Asserts that what immediately precedes the current position in the string is not foo. |
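The four lookarounds can be compared side by side on one input. This is a minimal sketch with Python's `re` module; the sample string and the `spans` helper are hypothetical names introduced only for this illustration. Note that Python's `re` requires the pattern inside a lookbehind to have a fixed length, which holds for foo.

```python
import re

# Hypothetical sample text: "bar" occurs twice, at offsets 3 and 7.
text = "foobar barfoo"

def spans(pattern: str, s: str) -> list[tuple[int, int]]:
    """Return the (start, end) offsets of every match of pattern in s."""
    return [m.span() for m in re.finditer(pattern, s)]

# Lookahead: "bar" only where "foo" immediately follows.
lookahead = spans(r"bar(?=foo)", text)
print(lookahead)             # [(7, 10)]

# Negative lookahead: "bar" only where "foo" does not follow.
negative_lookahead = spans(r"bar(?!foo)", text)
print(negative_lookahead)    # [(3, 6)]

# Lookbehind: "bar" only where "foo" immediately precedes.
lookbehind = spans(r"(?<=foo)bar", text)
print(lookbehind)            # [(3, 6)]

# Negative lookbehind: "bar" only where "foo" does not precede.
negative_lookbehind = spans(r"(?<!foo)bar", text)
print(negative_lookbehind)   # [(7, 10)]
```

Every reported span covers only the three characters of bar: the surrounding foo decides whether a match is allowed but never becomes part of it.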