Leetcode高级挑战：正则表达式匹配

作者: 我是罗比 | 来源:发表于2018-09-05 10:08 被阅读0次

Leetcode高级挑战：正则表达式匹配
LeetCode #10: Regular Expression
1.回溯问题（一）
LeetCode 10. 正则表达式匹配 | Python
Python 学习笔记 060
03-web服务器v3.1--01
Python 高级 8正则表达式
《剑指 Offer》（第 2 版）第 19 题：正则表达式匹配
字符串
Nginx 匹配规则

《正则表达式匹配》在Leetcode属于难度为Hard的问题，我自己花了半天，使用递归的方式完成了题目，但是程序效率很低。后来看了“标准答案”，花了一些时间研究才完全理解，这篇文章我尝试解释“标准答案”使用的算法。

原题链接 https://leetcode.com/problems/regular-expression-matching/description/

问题

给出一个字符串(s)和一个表达式(p)，实现一个支持'.'和''正则表达式的匹配。

'.' 匹配任意单个字符
'*' 匹配0个或多个前一个字符

表达式必须匹配整个字符串而不是其中一部分

Note：

s 可以为空或非空字符串，字符串中字符的取值范围为 a-z
p 可以为空或非空字符串，字符串中字符的取值范围为 a-z, 还可以包含两个特殊字符"."和"*”

例1：

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

例2：

Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' means zero or more of the precedeng element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

例3：

Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

例4：

Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore it matches "aab".

例5：

Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

“标准答案”算法

构造一个二维数组dp

    boolean[][] dp = new boolean[s.length()+1][p.length()+1];

初始化dp

    dp[0][0] = true;
    for (int i = 0; i < p.length(); i++) {
        if (p.charAt(i) == '*' && dp[0][i-1]) {
            dp[0][i+1] = true;
        }
    }

遍历字符串s和字符串p中的字符

    for (int i = 0 ; i < s.length(); i++) {
        for (int j = 0; j < p.length(); j++) {
            if (p.charAt(j) == '.') {
                // 该字符匹配成功，使用去掉1个字符的s子串和p子串的匹配结果，作为当前子串的匹配结果
                dp[i+1][j+1] = dp[i][j];
            }
            if (p.charAt(j) == s.charAt(i)) {
                // 该字符匹配成功，使用去掉1个字符的s子串和p子串的匹配结果，作为当前子串的匹配结果
                dp[i+1][j+1] = dp[i][j];
            }
            if (p.charAt(j) == '*') {
                // 如果遇到 '*'
                if (p.charAt(j-1) != s.charAt(i) && p.charAt(j-1) != '.') {
                    // 该字符和p中上一个字符匹配失败，使用当前s子串和去掉2个字符的p子串的匹配结果，作为当前子串的匹配结果
                    dp[i+1][j+1] = dp[i+1][j-1];
                } else {
                    // 该字符和p中上一个字符匹配成功，使用下面三个匹配结果作与运算后的结果，作为当前子串的匹配结果
                    // 1. s子串去掉一个字符，和当前p子串的匹配结果 （in this case, a* counts as multiple a ）
                    // 2. s子串和p子串去掉1个字符的匹配结果（in this case, a* counts as single a）
                    // 3. s子串和p子串去掉2个字符的匹配结果（in this case, a* counts as empty）
                    dp[i+1][j+1] = (dp[i+1][j] || dp[i][j+1] || dp[i+1][j-1]);
                }
            }
        }
    }

返回最终结果

    return dp[s.length()][p.length()];

概述

该算法的核心是，通过二维遍历，依次计算所有s子串和p子串的匹配结果，最后的匹配结果，就是整个s串和p串的匹配结果

假设 s="xaabyc" p="xa*b.c"

1，初始化

    dp[0][0] = true;
    for (int i = 0; i < p.length(); i++) {
        if (p.charAt(i) == '*' && dp[0][i-1]) {
            dp[0][i+1] = true;
        }
    }

空串和空串匹配，结果为True，所以 dp[0][0] = true

s\p	0	x	a	*	b	.	c
0	T	F	F	F	F	F	F
x	F
a	F
a	F
b	F
y	F
c	F

2，遍历字符串s和字符串p中的字符

for (int i = 0 ; i < s.length(); i++) {
        for (int j = 0; j < p.length(); j++) {
            if (p.charAt(j) == '.') {
                // 该字符匹配成功，使用去掉1个字符的s子串和p子串的匹配结果，作为当前子串的匹配结果
                dp[i+1][j+1] = dp[i][j];
            }
            if (p.charAt(j) == s.charAt(i)) {
                // 该字符匹配成功，使用去掉1个字符的s子串和p子串的匹配结果，作为当前子串的匹配结果
                dp[i+1][j+1] = dp[i][j];
            }
            if (p.charAt(j) == '*') {
                // 如果遇到 '*'
                if (p.charAt(j-1) != s.charAt(i) && p.charAt(j-1) != '.') {
                    // 该字符和p中上一个字符匹配失败，使用当前s子串和去掉2个字符的p子串的匹配结果，作为当前子串的匹配结果
                    dp[i+1][j+1] = dp[i+1][j-1];
                } else {
                    // 该字符和p中上一个字符匹配成功，使用下面三个匹配结果作与运算后的结果，作为当前子串的匹配结果
                    // 1. s子串去掉一个字符，和当前p子串的匹配结果 （in this case, a* counts as multiple a ）
                    // 2. s子串和p子串去掉1个字符的匹配结果（in this case, a* counts as single a）
                    // 3. s子串和p子串去掉2个字符的匹配结果（in this case, a* counts as empty）
                    dp[i+1][j+1] = (dp[i+1][j] || dp[i][j+1] || dp[i+1][j-1]);
                }
            }
        }
    }