Regular Expression

Author: MrDecoder | Published: 2023-10-30 20:06


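#1. What Is a Regular Expression

A regular expression (regex for short) is a string used to match and manipulate text. Regular expressions are written in the regular expression language, which, like any programming language, has its own syntax and instructions.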



#2. Matching Single Characters

2.1 Matching Literal Text


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "Hello, my name is Ben. Please visit my website at http://www.forta.com/.";
    regex expression("Ben");  // plain text matches itself
    smatch matches;
    if (regex_search(text, matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
    }
    return 0;
}

// Results:
// matches for 'Hello, my name is Ben. Please visit my website at http://www.forta.com/.'
// Prefix: 'Hello, my name is '
// 0: Ben
// Suffix: '. Please visit my website at http://www.forta.com/.'


2.2 Matching Any Character


2.3 Matching Special Characters


#3. Matching Sets of Characters


3.1 Matching One of Several Characters


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "sales1.xls orders3.xls sales2.xls na1.xls sa1.xls";
    // [ns] matches either n or s; \\. is a literal dot, a bare . is any character
    regex expression("[ns]a.\\.xls");

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }

    return 0;
}

// matches for 'sales1.xls orders3.xls sales2.xls na1.xls sa1.xls'
// Prefix: 'sales1.xls orders3.xls sales2.xls '
// 0: na1.xls
// Suffix: ' sa1.xls'

// matches for 'sales1.xls orders3.xls sales2.xls na1.xls sa1.xls'
// Prefix: ' '
// 0: sa1.xls
// Suffix: ''


3.2 Using Character Set Ranges


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "na1.xls sa1.xls";
    // [0-9] is a range: any single digit
    regex expression("[ns]a[0-9]\\.xls");

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }

    return 0;
}

// matches for 'na1.xls sa1.xls'
// Prefix: ''
// 0: na1.xls
// Suffix: ' sa1.xls'

// matches for 'na1.xls sa1.xls'
// Prefix: ' '
// 0: sa1.xls
// Suffix: ''



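  • A-Z matches every uppercase letter from A to Z.
  • a-z matches every lowercase letter from a to z.
  • A-F matches every uppercase letter from A to F.
  • A-z matches every character from ASCII A through ASCII z. This range is rarely used, because it also includes characters such as [ and ^ that sit between Z and a in the ASCII table.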
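
The A-z pitfall can be checked directly. The following is a minimal sketch rather than part of the original examples; the helper name inClass is hypothetical. It tests a one-character string against a character class with std::regex_match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if the one-character string `ch`
// is matched by the character class `cls`, for example "[A-z]".
bool inClass(const std::string& cls, const std::string& ch)
{
    return std::regex_match(ch, std::regex(cls));
}
```

With this helper, inClass("[A-z]", "^") is true while inClass("[A-Za-z]", "^") is false, because ^ lies between Z and a in ASCII. Writing the two ranges separately is usually what is intended.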

3.3 Negated Matching


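#4. Using Metacharacters

4.1 Escaping Special Characters



4.2 Matching Whitespace Characters


Metacharacter  Description
[\b]           Backspace (moves back and deletes one character)
\f             Form feed
\n             Line feed
\r             Carriage return
\t             Tab
\v             Vertical tab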
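
As an illustration of these metacharacters, the sketch below looks for an empty line, that is, two consecutive line breaks. The function name hasBlankLine and the sample inputs in the note are assumptions made for this example.

```cpp
#include <regex>
#include <string>

// Returns true if `text` contains an empty line, i.e. two consecutive
// line breaks. \r? makes the carriage return optional, so both \r\n and
// plain \n line endings are handled.
bool hasBlankLine(const std::string& text)
{
    static const std::regex blank("\\r?\\n\\r?\\n");
    return std::regex_search(text, blank);
}
```

For example, hasBlankLine("a\n\nb") is true and hasBlankLine("a\nb") is false. Note that each backslash is doubled in the C++ string literal so that the regex engine, not the compiler, interprets \r and \n.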

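4.3 Matching Specific Character Classes


1. Matching digits (and non-digits)


Metacharacter  Description
\d             Any single digit (equivalent to [0-9])
\D             Any single non-digit (equivalent to [^0-9])

#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "myArray0";
    regex expression("myArray[\\d]");  // \\d in the literal reaches the engine as \d
    if (regex_match(text, expression))
        cout << "Digit Matched." << endl;
    return 0;
}

2. Matching alphanumeric characters (and non-alphanumeric characters)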


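Metacharacter  Description
\w             Any single alphanumeric character (either case) or underscore (equivalent to [a-zA-Z0-9_])
\W             Any single character that is neither alphanumeric nor an underscore (equivalent to [^a-zA-Z0-9_])

#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "Hello World";
    regex expression("([\\w]+)[ ]([\\w]+)");  // two words separated by one space
    if (regex_match(text, expression))
        cout << "Word Matched." << endl;
    return 0;
}

3. Matching whitespace (and non-whitespace)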


Metacharacter  Description
\s             Any single whitespace character (equivalent to [ \f\n\r\t\v])
\S             Any single non-whitespace character (equivalent to [^ \f\n\r\t\v])
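
A short sketch of \s in use: the hypothetical helper collapseWhitespace below replaces every run of whitespace with a single space using std::regex_replace. It is an illustration under that naming assumption, not code from the original article.

```cpp
#include <regex>
#include <string>

// Replaces each run of one or more whitespace characters
// (spaces, tabs, line breaks, ...) with a single space.
std::string collapseWhitespace(const std::string& text)
{
    static const std::regex ws("\\s+");
    return std::regex_replace(text, ws, " ");
}
```

For example, collapseWhitespace("a \t\n b") yields "a b"; text without whitespace is returned unchanged.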

#5. Repeating Matches

5.1 How Many Matches

1. Matching one or more characters


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "henry.hu@nextlabs.com";
    // + means one or more; \\. is a literal dot
    regex expression("[\\w.]+@[\\w]+\\.[\\w]+");
    if (regex_match(text, expression))
        cout << "Email Matched." << endl;
    return 0;
}


2. Matching zero or more characters


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "Hello .henry.hu@nextlabs.com is my email address.";
    // \\w+ forces the address to start with a word character;
    // [\\w.]* then allows zero or more word characters or dots
    regex expression("\\w+[\\w.]*@[\\w]+\\.\\w+");

    smatch matches;
    if (regex_search(text, matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
    }
    return 0;
}

// matches for 'Hello .henry.hu@nextlabs.com is my email address.'
// Prefix: 'Hello .'
// 0: henry.hu@nextlabs.com
// Suffix: ' is my email address.'

3. Matching zero or one character


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "The URL is http://www.forta.com/, to connect securely use https://www.forta.com/ instead.";
    regex expression("https?://[\\w./]+");  // s? makes the s optional

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }
    return 0;
}

// matches for 'The URL is http://www.forta.com/, to connect securely use https://www.forta.com/ instead.'
// Prefix: 'The URL is '
// 0: http://www.forta.com/
// Suffix: ', to connect securely use https://www.forta.com/ instead.'

// matches for 'The URL is http://www.forta.com/, to connect securely use https://www.forta.com/ instead.'
// Prefix: ', to connect securely use '
// 0: https://www.forta.com/
// Suffix: ' instead.'

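5.2 Matching a Specific Number of Repetitions


  • + and * place no upper limit on the number of characters matched; we cannot set a maximum for them.
  • + and * match at least one or zero characters respectively; we cannot set a different minimum for them.
  • Using only + and *, we cannot make them match an exact number of characters.


1. Setting an exact repetition count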


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "#336633 TEXT=#FFFFFF";
    regex expression("#[a-fA-F0-9]{6}");  // {6}: exactly six hex digits

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }
    return 0;
}

// matches for '#336633 TEXT=#FFFFFF'
// Prefix: ''
// 0: #336633
// Suffix: ' TEXT=#FFFFFF'

// matches for '#336633 TEXT=#FFFFFF'
// Prefix: ' TEXT='
// 0: #FFFFFF
// Suffix: ''

2. Setting a range for the repetition count


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "10/30/2023 10-30-2023 2/2/2 01-01-01";
    // {1,2}: one or two digits; {2,4}: two to four digits.
    // / needs no escaping inside a C++ string literal or a character class.
    regex expression("\\d{1,2}[-/]\\d{1,2}[-/]\\d{2,4}");

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }
    return 0;
}

// matches for '10/30/2023 10-30-2023 2/2/2 01-01-01'
// Prefix: ''
// 0: 10/30/2023
// Suffix: ' 10-30-2023 2/2/2 01-01-01'

// matches for '10/30/2023 10-30-2023 2/2/2 01-01-01'
// Prefix: ' '
// 0: 10-30-2023
// Suffix: ' 2/2/2 01-01-01'

// matches for '10/30/2023 10-30-2023 2/2/2 01-01-01'
// Prefix: ' 2/2/2 '
// 0: 01-01-01
// Suffix: ''


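3. Matching "at least this many times"

The last use of the {} syntax is to give a minimum number of repetitions without giving a maximum. It looks like the range form of {} with the maximum simply omitted. For example, {3,} means "repeat at least 3 times".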

#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "1001: $896.80, 1002: $1290.69, 1003: $26.43";
    // \\d{3,}: at least three digits before the decimal point
    regex expression("\\d+: [$]\\d{3,}\\.\\d{2}");

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }

    return 0;
}

// matches for '1001: $896.80, 1002: $1290.69, 1003: $26.43'
// Prefix: ''
// 0: 1001: $896.80
// Suffix: ', 1002: $1290.69, 1003: $26.43'

// matches for '1001: $896.80, 1002: $1290.69, 1003: $26.43'
// Prefix: ', '
// 0: 1002: $1290.69
// Suffix: ', 1003: $26.43'


5.3 Preventing Over-Matching


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;
    string text = "This offer is not available to customers living in <B>AK</B> and <B>HI</B>.";
    regex expression("<[Bb]>.*</[Bb]>");  // .* is greedy
    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }

    return 0;
}

// matches for 'This offer is not available to customers living in <B>AK</B> and <B>HI</B>.'
// Prefix: 'This offer is not available to customers living in '
// 0: <B>AK</B> and <B>HI</B>
// Suffix: '.'

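<[Bb]> matches the <B> tag (in either case), and </[Bb]> matches the </B> tag (in either case). The pattern found only one match rather than the expected two: the actual match was <B>AK</B> and <B>HI</B>. The reason is that * and + are so-called "greedy" metacharacters: they take as much as they can rather than stopping as soon as they can. They match from the start of a candidate all the way to the last possible ending, instead of stopping at the first ending they meet.


Greedy  Lazy
*       *?
+       +?
{n,}    {n,}?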


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;
    string text = "This offer is not available to customers living in <B>AK</B> and <B>HI</B>.";
    regex expression("<[Bb]>.*?</[Bb]>");  // .*? is lazy: stops at the first </B>

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }

    return 0;
}

// matches for 'This offer is not available to customers living in <B>AK</B> and <B>HI</B>.'
// Prefix: 'This offer is not available to customers living in '
// 0: <B>AK</B>
// Suffix: ' and <B>HI</B>.'

// matches for 'This offer is not available to customers living in <B>AK</B> and <B>HI</B>.'
// Prefix: ' and '
// 0: <B>HI</B>
// Suffix: '.'

#6. Position Matching


6.1 Boundaries


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "This cat scattered his food all over the room.";
    regex expression("cat");  // no boundaries: also matches inside "scattered"

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }
    return 0;
}


6.2 Word Boundaries


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "This cat scattered his food all over the room.";
    regex expression("\\bcat\\b");  // \b marks a word boundary on each side

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }

    return 0;
}

// matches for 'This cat scattered his food all over the room.'
// Prefix: 'This '
// 0: cat
// Suffix: ' scattered his food all over the room.'


6.3 String Boundaries


#7. Subexpressions

7.1 What Is a Subexpression


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    // The IP address between the brackets is missing from the source text;
    // the results below come from the original, complete string.
    string text = "Pinging hog.forta.com[]";
    // (\d{1,3}\.) is a subexpression; {3} repeats the whole group three times
    regex expression("(\\d{1,3}\\.){3}\\d{1,3}");

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    if (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
    }

    return 0;
}

// matches for 'Pinging hog.forta.com[]'
// Prefix: 'Pinging hog.forta.com['
// 0:
// 1: 46.
// Suffix: ']'


#8. Backreferences: Matching Consistently


8.1 Matching with Backreferences


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "This is a block of of text, several words here are are repeated, and and they should not be.";
    regex expression("[ ]+(\\w+)[ ]+\\1");  // \1 must repeat what (\w+) captured

    smatch matches;
    string::const_iterator searchStart(text.cbegin());
    while (regex_search(searchStart, text.cend(), matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
        searchStart = matches.suffix().first;  // resume after this match
    }
    return 0;
}

// matches for 'This is a block of of text, several words here are are repeated, and and they should not be.'
// Prefix: 'This is a block'
// 0:  of of
// 1: of
// Suffix: ' text, several words here are are repeated, and and they should not be.'

// matches for 'This is a block of of text, several words here are are repeated, and and they should not be.'
// Prefix: ' text, several words here'
// 0:  are are
// 1: are
// Suffix: ' repeated, and and they should not be.'

// matches for 'This is a block of of text, several words here are are repeated, and and they should not be.'
// Prefix: ' repeated,'
// 0:  and and
// 1: and
// Suffix: ' they should not be.'

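This pattern finds what we want. [ ]+ matches one or more spaces, \w+ matches one or more alphanumeric characters, and the second [ ]+ matches the spaces that follow. Note that \w+ is enclosed in parentheses, which makes it a subexpression. This subexpression is not used for repetition; it only marks off part of the pattern so that it can be referred to later. The last part of the pattern is \1, a backreference to that subexpression: when (\w+) matches the word of, \1 also matches of; when (\w+) matches the word and, \1 also matches and. \1 refers to the first subexpression in the pattern, \2 to the second, \3 to the third, and so on.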
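
A captured subexpression can also be referenced in a replacement string. The sketch below is an assumption-laden illustration rather than part of the original article: the helper name removeRepeatedWords is hypothetical, and the pattern adds \b on both sides so that only whole words are compared. In the replacement text, std::regex_replace uses $1 (not \1) to refer to the first subexpression.

```cpp
#include <regex>
#include <string>

// Collapses an immediately repeated word ("of of") into one ("of").
// \b(\w+) captures a whole word, [ ]+ matches the spaces after it, and
// \1\b requires the same word again, ending at a word boundary.
std::string removeRepeatedWords(const std::string& text)
{
    static const std::regex repeated("\\b(\\w+)[ ]+\\1\\b");
    return std::regex_replace(text, repeated, "$1");
}
```

For example, removeRepeatedWords("a block of of text") yields "a block of text", while "and android" is left unchanged because the trailing \b rejects a partial-word repeat.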

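#9. Lookaround

The regular expressions so far have all matched text, but sometimes we need a regular expression to mark the position of the text to match rather than just the text itself. This leads to the concept of lookaround: checking what comes before or after a given position.

9.1 Looking Ahead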


#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;
    string text = "http://www.forta.com/";
    // (?=:) requires a colon to follow but does not include it in the match
    regex expression(".+(?=:)");

    smatch matches;
    if (regex_search(text, matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
    }
    return 0;
}

// matches for 'http://www.forta.com/'
// Prefix: ''
// 0: http
// Suffix: '://www.forta.com/'


9.2 Looking Behind



#include <regex>
#include <iostream>
#include <string>

int main()
{
    using namespace std;

    string text = "ABC01: $23.45";
    // std::regex (ECMAScript grammar) has no lookbehind, so (?<=[$]) is not
    // available; this pattern matches the price including the leading $.
    regex expression("[$][0-9.]+");

    smatch matches;
    if (regex_search(text, matches, expression))
    {
        cout << "matches for '" << text << "'\n";
        cout << "Prefix: '" << matches.prefix() << "'\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << i << ": " << matches[i] << '\n';
        cout << "Suffix: '" << matches.suffix() << "'\n\n";
    }
    return 0;
}
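
Because the default ECMAScript grammar of std::regex offers lookahead but not lookbehind, one common workaround is to match the leading text normally and read the part of interest from a subexpression. The following is a minimal sketch of that idea; the helper name amountAfterDollar is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the first amount that follows a '$', without the '$' itself,
// or an empty string if there is none. The '$' is matched but kept
// outside the parentheses, so group 1 holds only the digits and dots.
std::string amountAfterDollar(const std::string& text)
{
    static const std::regex price("[$]([0-9.]+)");
    std::smatch m;
    if (std::regex_search(text, m, price))
        return m[1].str();
    return "";
}
```

For example, amountAfterDollar("ABC01: $23.45") yields "23.45". Unlike a true lookbehind, the $ is still consumed as part of the overall match (m[0]); only the captured group excludes it.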

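9.3 Negating Lookaround

Lookahead and lookbehind are usually used to match text, with the goal of pinning down the position of the text to be returned as the match (by specifying what text must come before or after it). This usage is called positive lookahead and positive lookbehind.

A less common variant is negative lookaround. Negative lookahead looks ahead for text that does not match the given pattern, and negative lookbehind looks behind for text that does not match the given pattern.

Operator  Description
(?=)      Positive lookahead
(?!)      Negative lookahead
(?<=)     Positive lookbehind
(?<!)     Negative lookbehind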
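
Of the four operators, std::regex with its default ECMAScript grammar accepts the two lookahead forms. The sketch below illustrates negative lookahead; the helper name numbersWithoutPercent and the sample sentence are assumptions made for this example. It collects whole numbers that are not immediately followed by a percent sign.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every whole number in `text` that is not followed by '%'.
// \b\d+\b matches a complete run of digits; (?!%) then rejects the
// match if the next character is a percent sign.
std::vector<std::string> numbersWithoutPercent(const std::string& text)
{
    static const std::regex pattern("\\b\\d+\\b(?!%)");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it)
    {
        found.push_back(it->str());
    }
    return found;
}
```

For the input "10% of 250 items, 30% off" this returns a single element, "250". The \b after \d+ matters: without it the engine could backtrack and accept "1" from "10%", since "1" is followed by "0" rather than "%".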

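#10. Embedded Conditions


  • ? matches the preceding character or expression, if it exists.
  • ?= and ?<= match the text ahead or behind, if it exists.


  • Conditional processing based on a backreference.
  • Conditional processing based on a lookaround.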
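
The ECMAScript grammar used by std::regex has no embedded-condition syntax, so in C++ a condition usually has to be rewritten as an alternation. The sketch below assumes a hypothetical task chosen only for illustration: accept a phone number written either as (123)456-7890 or as 123-456-7890, where a closing parenthesis is required exactly when an opening one is present. The helper name isPhoneNumber is also an assumption.

```cpp
#include <regex>
#include <string>

// Accepts "(123)456-7890" or "123-456-7890", but not mixed forms.
// Instead of "if group 1 matched, expect ')', otherwise expect '-'",
// both complete alternatives are spelled out inside one group.
bool isPhoneNumber(const std::string& text)
{
    static const std::regex phone("(\\(\\d{3}\\)|\\d{3}-)\\d{3}-\\d{4}");
    return std::regex_match(text, phone);
}
```

With this pattern, isPhoneNumber("(123)456-7890") and isPhoneNumber("123-456-7890") are true, while "(123-456-7890" is rejected. Engines that do support embedded conditions can express the same rule without repeating \d{3}.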


