Go regular expressions: the precedence of repetition and alternation


Author: CodingCode | Published: 2021-09-12 02:19

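    Repetition and alternation have different precedence, and the difference is easy to get wrong.

    In short:

    1. The repetition operators *, +, ? and {n,m} apply only to the single preceding item (one character, one bracket expression, or one parenthesized group), whereas
    2. the alternation operator | chooses between whole sequences of characters.

    That is: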

    ab+     ==   a(b+)      !=   (ab)+
    ab|cd   ==   (ab)|(cd)  !=   a(b|c)d
    

    So, consider a few examples:

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        text := "XXXabbYYYababZZZ" // repetition: + binds only to the preceding b
        fmt.Printf("%q\n", regexp.MustCompile(`ab+`).FindAllString(text, -1))   // ["abb" "ab" "ab"]
        fmt.Printf("%q\n", regexp.MustCompile(`a(b+)`).FindAllString(text, -1)) // ["abb" "ab" "ab"]
        fmt.Printf("%q\n", regexp.MustCompile(`(ab)+`).FindAllString(text, -1)) // ["ab" "abab"]

        text = "XXXababYYYacdZZZ" // alternation: | splits the pattern into ab and cd
        fmt.Printf("%q\n", regexp.MustCompile(`ab|cd`).FindAllString(text, -1))     // ["ab" "ab" "cd"]
        fmt.Printf("%q\n", regexp.MustCompile(`(ab)|(cd)`).FindAllString(text, -1)) // ["ab" "ab" "cd"]
        fmt.Printf("%q\n", regexp.MustCompile(`a(b|c)d`).FindAllString(text, -1))   // ["acd"]
    }
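
The examples above only exercise `+`. As a small supplementary sketch, the same rule can be checked for the counted form `{n}` and for `?`; the helper `findAll` is a made-up name used here for brevity, not something from the original article.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll (a hypothetical helper) returns every non-overlapping match
// of pattern in text.
func findAll(pattern, text string) []string {
	return regexp.MustCompile(pattern).FindAllString(text, -1)
}

func main() {
	text := "XXXabbYYYababZZZ"
	// {2} repeats only the preceding b, so this looks for "abb".
	fmt.Printf("%q\n", findAll(`ab{2}`, text)) // ["abb"]
	// With a group, {2} repeats the whole "ab".
	fmt.Printf("%q\n", findAll(`(ab){2}`, text)) // ["abab"]
	// ? makes only the b optional; the a is always required.
	fmt.Printf("%q\n", findAll(`ab?`, text)) // ["ab" "ab" "ab"]
}
```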
    

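    The key here is regular-expression operator precedence.
    Go's regexp package accepts RE2 syntax rather than POSIX syntax, but its documentation gives the same ordering for these operators: alternation binds weakest, then concatenation, then repetition.
    For reference, POSIX defines the precedence of regular-expression operators as follows: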
    Basic Regular Expressions Precedence

    +---+----------------------------------------------------------+
    |   |             BRE Precedence (from high to low)            |
    +---+----------------------------------------------------------+
    | 1 | Collation-related bracket symbols | [==] [::] [..]       |
    | 2 | Escaped characters                | \<special character> |
    | 3 | Bracket expression                | []                   |
    | 4 | Subexpressions/back-references    | \(\) \n              |
    | 5 | Single-character-BRE duplication  | * \{m,n\}            |
    | 6 | Concatenation                     |                      |
    | 7 | Anchoring                         | ^ $                  |
    +---+-----------------------------------+----------------------+
    

    Extended Regular Expressions Precedence

    +---+----------------------------------------------------------+
    |   |             ERE Precedence (from high to low)            |
    +---+----------------------------------------------------------+
    | 1 | Collation-related bracket symbols | [==] [::] [..]       |
    | 2 | Escaped characters                | \<special character> |
    | 3 | Bracket expression                | []                   |
    | 4 | Grouping                          | ()                   |
    | 5 | Single-character-ERE duplication  | * + ? {m,n}          |
    | 6 | Concatenation                     |                      |
    | 7 | Anchoring                         | ^ $                  |
    | 8 | Alternation                       | |                    |
    +---+-----------------------------------+----------------------+
    

    The tables show that alternation '|' has the lowest precedence.
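
One practical consequence of alternation sitting below anchoring in the table is that `^ab|cd$` groups as `(^ab)|(cd$)`, not as `^(ab|cd)$`. The following is a minimal sketch of that pitfall, assuming Go's regexp package; the helper `matches` is a made-up name for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches (a hypothetical helper) reports whether pattern matches
// anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// ^ab|cd$ means "starts with ab" OR "ends with cd".
	fmt.Println(matches(`^ab|cd$`, "abXXX")) // true
	fmt.Println(matches(`^ab|cd$`, "XXXcd")) // true
	// Grouping the alternation makes both anchors apply to each branch.
	fmt.Println(matches(`^(ab|cd)$`, "abXXX")) // false
	fmt.Println(matches(`^(ab|cd)$`, "cd"))    // true
}
```

When a capture group is not wanted, the non-capturing form `^(?:ab|cd)$` should give the same grouping.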


    Article link: https://www.haomeiwen.com/subject/ifszwltx.html