An Introduction to Quantifiers in Regular Expressions

Author: c6a092a61698 | Published 2017-04-03 06:15 | Read 91 times


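Quantifiers in a regular expression specify how many times a pattern may occur in a match. This article describes three kinds: greedy, reluctant, and possessive quantifiers. At first glance the quantifiers X? (greedy), X?? (reluctant), and X?+ (possessive) seem to do the same thing, since each matches "X" once or not at all. There are subtle differences between them, which are explained in the last part of this article.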


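Let's start by using the greedy quantifiers to build three different regular expressions, a?, a*, and a+, and see what happens when each is tested against the empty input string "".

First, here is the test harness (compile and run it directly from a terminal):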

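import java.io.Console;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Interactive harness: reads a regex and an input string, then prints every match found.
public class RegexTestHarness {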
    public static void main(String[] args){
        Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }
        while (true) {

            Pattern pattern =
                    Pattern.compile(console.readLine("%nEnter your regex: "));

            Matcher matcher =
                    pattern.matcher(console.readLine("Enter input string to search: "));

            boolean found = false;
            while (matcher.find()) {
                console.format("I found the text" +
                                " \"%s\" starting at " +
                                "index %d and ending at index %d.%n",
                        matcher.group(),
                        matcher.start(),
                        matcher.end());
                found = true;
            }
            if(!found){
                console.format("No match found.%n");
            }
        }
    }
}

Enter your regex: a?
Enter input string to search:
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a*
Enter input string to search:
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a+
Enter input string to search:
No match found.

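Zero-length matches

In the example above, the first two cases match because the expressions a? and a* both allow zero occurrences of the letter "a". Notice that the start and end indices are both 0. The empty input string "" has no length, so the regex simply matches nothing at the start position, index 0. Matches of this kind are called zero-length matches. A zero-length match can occur in the following four cases:
1. In an empty input string.
2. At the beginning of the input string, that is, at index 0 (the position before the first character is an empty string).
3. After the last character of the input string (the position after the last character is an empty string).
4. Between any two characters; in "bc", for example, there is an empty string "" between b and c.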

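Take the string "foo" as an example. Its characters sit between index positions as follows: index 0 is before "f", index 1 is between "f" and the first "o", index 2 is between the two "o"s, and index 3 is after the last "o".

So the beginning of the string is index 0 and the end is index 3, and zero-length matches at the beginning and end of "foo" are reported at those two indices.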

Zero-length matches are easy to spot because they always start and end at the same index.
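The rule that a zero-length match starts and ends at the same index can be checked in code. The following is only a minimal sketch: the class name ZeroLengthDemo and the helper zeroLengthMatchStarts are made up for this illustration and are not part of the original tutorial. It uses the standard java.util.regex classes to collect the positions of all zero-length matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only for this illustration.
class ZeroLengthDemo {
    // Returns the index of every zero-length match of regex in input.
    static List<Integer> zeroLengthMatchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // A zero-length match starts and ends at the same index.
            if (matcher.start() == matcher.end()) {
                starts.add(matcher.start());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(zeroLengthMatchStarts("a?", "foo")); // [0, 1, 2, 3]
        System.out.println(zeroLengthMatchStarts("a?", ""));    // [0]
        System.out.println(zeroLengthMatchStarts("a+", "foo")); // []
    }
}
```

With a? on "foo", every position qualifies, including those between characters, because "a" never occurs. With a+, at least one "a" is required, so there are no matches at all.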

Let's look at a few more examples, this time with the single letter "a" as input.
Enter your regex: a?
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a*
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a+
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

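All three quantifiers find the letter "a", but the first two also find a zero-length match at index 1, that is, after the last character of the input. Remember that the matcher sees the character "a" as sitting between index 0 and index 1, and that the test harness keeps looping until no more matches can be found.

Now enter "ababaaaab" and see what is printed: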
Enter your regex: a?
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "a" starting at index 5 and ending at index 6.
I found the text "a" starting at index 6 and ending at index 7.
I found the text "a" starting at index 7 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a*
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a+
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.

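Readers can work out for themselves why the output looks like this. As a hint: even though the letter "b" sits at indices 1, 3, and 8, a? and a* report zero-length matches there, because these expressions do not look for "b" at all. They only check whether "a" is present, and zero occurrences is an acceptable answer.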
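The interactive harness relies on System.console(), which returns null when no console is attached (for example under many IDEs), so it can be convenient to replay these runs without typing. The sketch below assumes a made-up class QuantifierReplay with a helper findAll; neither name comes from the original tutorial. It records each match in the order Matcher.find() reports it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical non-interactive variant of the test harness.
class QuantifierReplay {
    // Collects every match as "text"[start,end) in the order find() reports them.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add("\"" + matcher.group() + "\"["
                    + matcher.start() + "," + matcher.end() + ")");
        }
        return matches;
    }

    public static void main(String[] args) {
        // Replays the three greedy quantifiers against the same input.
        for (String regex : new String[] {"a?", "a*", "a+"}) {
            System.out.println(regex + " -> " + findAll(regex, "ababaaaab"));
        }
    }
}
```

For "ababaaaab" this yields 10 matches for a?, 7 for a*, and 3 for a+, the same counts as in the transcripts above.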

To require a pattern to occur an exact number of times, put the count inside braces "{}". For example, a{3} matches "aaa":
Enter your regex: a{3}
Enter input string to search: aa
No match found.

Enter your regex: a{3}
Enter input string to search: aaa
I found the text "aaa" starting at index 0 and ending at index 3.

Enter your regex: a{3}
Enter input string to search: aaaa
I found the text "aaa" starting at index 0 and ending at index 3.

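In the third example, note that once the first three a's have matched, the rest of the input has nothing to do with them. The regex goes on trying to match the content after "aaa"; here only one "a" is left, which is too few for a second match.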
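To see the matcher carry on after each group of three, a longer input helps. This is a small sketch under the same assumptions as the harness; the class ExactCountDemo and the helper matchStarts are invented names for illustration. It lists where each a{3} match begins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only for this illustration.
class ExactCountDemo {
    // Returns the start index of each successive match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("a{3}", "aa"));        // [] (too few a's)
        System.out.println(matchStarts("a{3}", "aaaa"));      // [0] (one a left over)
        System.out.println(matchStarts("a{3}", "aaaaaaaaa")); // [0, 3, 6]
    }
}
```

With nine a's the matches do not overlap: each new search begins where the previous match ended.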

A quantifier can also be applied to a subexpression, for example:
Enter your regex: (dog){3}
Enter input string to search: dogdogdogdogdogdog
I found the text "dogdogdog" starting at index 0 and ending at index 9.
I found the text "dogdogdog" starting at index 9 and ending at index 18.

Enter your regex: dog{3}
Enter input string to search: dogdogdogdogdogdog
No match found.

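In the second example the quantifier applies only to the letter "g", so the regex means "do" followed by three "g"s, and nothing in the input matches.

One more example: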
Enter your regex: [abc]{3}
Enter input string to search: abccabaaaccbbbc
I found the text "abc" starting at index 0 and ending at index 3.
I found the text "cab" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.
I found the text "ccb" starting at index 9 and ending at index 12.
I found the text "bbc" starting at index 12 and ending at index 15.

Enter your regex: abc{3}
Enter input string to search: abccabaaaccbbbc
No match found.
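The two "No match found." results above come from the quantifier binding only to the element immediately before it. The sketch below shows what dog{3} and abc{3} do accept. The class name GroupingDemo is made up for this illustration. Note that Pattern.matches requires the whole input to match, unlike the find() loop used by the harness.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, introduced only for this illustration.
class GroupingDemo {
    // True when the entire input matches the regex (not just a part of it).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Without a group, {3} applies only to the last character.
        System.out.println(matchesWhole("dog{3}", "doggg"));       // true
        System.out.println(matchesWhole("dog{3}", "dogdogdog"));   // false
        // With a group, {3} applies to the whole subexpression.
        System.out.println(matchesWhole("(dog){3}", "dogdogdog")); // true
        // With a character class, {3} means any three of a, b, or c.
        System.out.println(matchesWhole("[abc]{3}", "cab"));       // true
        System.out.println(matchesWhole("abc{3}", "abccc"));       // true
    }
}
```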

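Differences among greedy, reluctant, and possessive quantifiers
A greedy quantifier is called greedy because it first matches as much of the input as it can. If the rest of the expression then fails to match, it backtracks, giving back one character at a time, until the overall match succeeds or every possibility has been exhausted.
Consider the following examples: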
Enter your regex: .*foo // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
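No match found.

The first example uses the greedy quantifier. The .* part first matches the entire input "xfooxxxxxxfoo"; the foo part of the regex is then tried against what remains, the empty string "", and fails. Backtracking begins: .* gives up one character and matches "xfooxxxxxxfo", foo is tried against the remaining "o", and fails again. This repeats until .* holds "xfooxxxxxx" and foo matches the final "foo". The match succeeds with the whole input consumed, so the search ends.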

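The second example uses the reluctant (non-greedy) quantifier, which works the opposite way: it starts by matching as little as possible. Here .*? first matches the empty string "" at the beginning of the input, and the foo part is tried against the first three characters "xfo", which fails. .*? then expands to take the first character "x", and foo matches the following "foo". That is the first overall match, "xfoo". The search then continues with the rest of the input, "xxxxxxfoo", under the same rules, and the second match is "xxxxxxfoo". Matching stops only when the whole input has been consumed.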

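The third example uses the possessive quantifier. A possessive quantifier consumes as much as it can in a single attempt and never backtracks. Here .*+ consumes the entire input "xfooxxxxxxfoo", leaving only the empty string "" for foo, so the match fails. Because nothing is given back, no match is found.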
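The three behaviours can be compared side by side. The following is a minimal sketch, assuming a made-up class QuantifierModes with a helper findAll (neither name is from the original tutorial). It also runs the X?, X??, and X?+ forms mentioned in the introduction, each followed by a literal "a", which is one simple way to make their difference visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, introduced only for this illustration.
class QuantifierModes {
    // Returns the text of every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(findAll(".*foo", input));  // greedy:     [xfooxxxxxxfoo]
        System.out.println(findAll(".*?foo", input)); // reluctant:  [xfoo, xxxxxxfoo]
        System.out.println(findAll(".*+foo", input)); // possessive: []

        // Optional "a" followed by a required "a", tested against a single "a".
        System.out.println(findAll("a?a", "a"));  // [a]  greedy a? gives its "a" back
        System.out.println(findAll("a??a", "a")); // [a]  reluctant a?? takes nothing first
        System.out.println(findAll("a?+a", "a")); // []   possessive a?+ keeps the only "a"
    }
}
```

In the last three lines, the greedy and reluctant forms reach the same result by different routes, while the possessive form fails because it refuses to give back the one "a" that the rest of the expression needs.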

Most of the above is translated from the regular expressions lesson in The Java™ Tutorials.
