[Leetcode 10] Regular Expression Matching

Problem Description

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

‘.’ Matches any single character.
‘*’ Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Note:

  • s could be empty and contains only lowercase letters a-z.
  • p could be empty and contains only lowercase letters a-z, and characters like . or *.

Example 1:

Input:
s = “aa”
p = “a”
Output: false
Explanation: “a” does not match the entire string “aa”.

Example 2:

Input:
s = “aa”
p = “a*”
Output: true
Explanation: ‘*’ means zero or more of the preceding element, ‘a’. Therefore, by repeating ‘a’ once, it becomes “aa”.

Example 3:

Input:
s = “ab”
p = “.*”
Output: true
Explanation: “.*” means “zero or more (*) of any character (.)”.

Example 4:

Input:
s = “aab”
p = “c*a*b”
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore it matches “aab”.

Example 5:

Input:
s = “mississippi”
p = “mis*is*p*.”
Output: false
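A reference to test against is useful before writing your own matcher. This problem restricts the alphabet to lowercase letters, '.' and '*', with every '*' following a character. Under that restriction the rules above coincide with full-string matching in Python's standard `re` module, so `re.fullmatch` can act as an oracle.

The sketch below checks the five examples against it. It is an illustration added here, not part of the original solution, and `EXAMPLES` and `reference_match` are hypothetical names. Python rejects malformed patterns such as "*a" or "a**" with `re.error`, so the oracle only covers well-formed patterns.

```python
import re

# (s, p, expected) taken from Examples 1-5 above
EXAMPLES = [
    ("aa", "a", False),
    ("aa", "a*", True),
    ("ab", ".*", True),
    ("aab", "c*a*b", True),
    ("mississippi", "mis*is*p*.", False),
]


def reference_match(s, p):
    """Oracle only: re.fullmatch succeeds iff the whole of s matches p."""
    return re.fullmatch(p, s) is not None


for s, p, expected in EXAMPLES:
    assert reference_match(s, p) == expected, (s, p)
print("all examples agree")
```

Calling `re` obviously defeats the purpose of the exercise. It is only meant for checking a hand-written implementation.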

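Approach

The explanation below follows the Python version. The C++ version is a simplified, non-recursive dynamic programming solution. Readers can pick either one.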

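If you are familiar with Edit Distance, dynamic programming comes to mind fairly naturally here.
As in Edit Distance, we can build an S×P matrix to record states, where S = len(s) and P = len(p). The boundary cases i = S and j = P are handled directly as base cases rather than stored.
The value at (i, j) records whether the suffix s[i:] matches the pattern suffix p[j:]. A value of None means it has not been computed yet.
Solving the matrix can be viewed as searching for a path from (0, 0) to (S, P) that obeys a fixed set of moves. Here i = S means s is used up and j = P means p is used up, so (S, P) is the terminal state.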
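A memo table pays off because the same state (i, j) is reached along many different paths. The hypothetical helper `count_state_visits` below makes that visible. It runs the recursion over (i, j) with no table at all, using the moves spelled out in the case analysis that follows, and counts how often each state is entered. It is an illustration only, not part of the original solution.

```python
from collections import Counter


def count_state_visits(s, p):
    """Unmemoized recursion on states (i, j) = "does s[i:] match p[j:]?".

    Returns (result, visits), where visits counts how often each state is entered.
    """
    visits = Counter()

    def match(i, j):
        visits[(i, j)] += 1
        if j == len(p):                          # pattern used up
            return i == len(s)
        first = i < len(s) and p[j] in (s[i], '.')
        if j + 1 < len(p) and p[j + 1] == '*':
            # skip "x*" entirely, or consume s[i] and stay on "x*"
            return match(i, j + 2) or (first and match(i + 1, j))
        return first and match(i + 1, j + 1)     # plain char or '.'

    return match(0, 0), visits


result, visits = count_state_visits("aaaaaaaaab", "a*a*a*a*c")
print(result, max(visits.values()), len(visits))
```

With several "a*" elements in a row, some states are entered more than once, which is exactly the redundant work the matrix removes.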

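If s[i] and p[j] match (s[i] == p[j] or p[j] == '.'):

  • If p[j + 1] is *, we can move from (i, j) to (i, j + 2), which means skipping this x* element. Alternatively, we can move from (i, j) to (i + 1, j), which means using it to match s[i] while staying on it so it can match more characters.
  • If p[j + 1] is not *, we move directly from (i, j) to (i + 1, j + 1), meaning s[i] is matched by p[j].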

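If they do not match:

  • If p[j + 1] is *, we can move from (i, j) to (i, j + 2), which means skipping this x* element.
  • Otherwise the match must fail, and the state at (i, j) is False.

The terminal state is reached when both s and p are used up, i.e. at (S, P), where i = len(s) and j = len(p).
If p is used up but s still has characters left, there is clearly no match.
If s is used up but p still has characters left, a match is only possible when everything left in p consists of x* pairs.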
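The move rules above can be read literally as edges of a graph on states (i, j), which turns matching into a reachability question: is the state i = len(s), j = len(p) reachable from (0, 0)? The sketch below runs a breadth-first search over exactly those moves. It is an alternative formulation for illustration, not the author's code, and `next_states` and `is_match_bfs` are hypothetical names.

The "s used up, only x* pairs left" rule needs no special case here. Once i == len(s), the only move still available is the skip move.

```python
from collections import deque


def next_states(s, p, i, j):
    """Moves allowed from state (i, j), following the case analysis in the text."""
    S, P = len(s), len(p)
    if j == P:                          # pattern used up: no further moves
        return []
    star = j + 1 < P and p[j + 1] == '*'
    matches = i < S and (p[j] == '.' or s[i] == p[j])
    moves = []
    if star:
        moves.append((i, j + 2))        # skip "x*"
        if matches:
            moves.append((i + 1, j))    # consume s[i], stay on "x*"
    elif matches:
        moves.append((i + 1, j + 1))    # consume s[i] with p[j]
    return moves


def is_match_bfs(s, p):
    """True iff the terminal state (len(s), len(p)) is reachable from (0, 0)."""
    target = (len(s), len(p))
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        state = queue.popleft()
        if state == target:
            return True
        for nxt in next_states(s, p, *state):
            if nxt not in seen:         # each state is expanded at most once
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Because every state is expanded at most once, this version also does O(SP) work.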

Sample Code (Python)

```python
class Solution(object):
    def dfs(self, i, j, s, p, dp):
        # dp[i][j] caches whether s[i:] matches p[j:] (None = not computed yet)
        if j == len(p):
            # pattern used up: match only if s is used up too
            return i == len(s)

        if i == len(s):
            # s used up: only "x*" pairs can still be skipped
            if j < len(p) - 1 and p[j + 1] == '*':
                return self.dfs(i, j + 2, s, p, dp)
            else:
                return False

        if dp[i][j] is not None:
            return dp[i][j]

        curr_match = p[j] == '.' or s[i] == p[j]
        if j + 1 < len(p) and p[j + 1] == '*':
            # skip "x*", or consume s[i] and stay on "x*"
            dp[i][j] = self.dfs(i, j + 2, s, p, dp) \
                or curr_match and self.dfs(i + 1, j, s, p, dp)
        else:
            # plain character or '.': advance in both strings
            dp[i][j] = curr_match and self.dfs(i + 1, j + 1, s, p, dp)
        return dp[i][j]

    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        dp = [[None for i in range(len(p))] for j in range(len(s))]
        return self.dfs(0, 0, s, p, dp)
```
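The memoized DFS can also be turned into an iterative table over the same suffix states, which avoids recursion in Python too. Here dp[i][j] still means "s[i:] matches p[j:]", and the table is filled starting from the bottom-right corner. This is one possible conversion, sketched for illustration, and the name `is_match_bottom_up` is hypothetical. Unlike the C++ version below, which works on prefixes, it keeps the suffix meaning used in the explanation.

```python
def is_match_bottom_up(s, p):
    """Iterative form of the memoized DFS: dp[i][j] == (s[i:] matches p[j:])."""
    S, P = len(s), len(p)
    dp = [[False] * (P + 1) for _ in range(S + 1)]
    dp[S][P] = True                     # both strings used up
    # row i == S is included so the "s used up, only x* left" case gets filled
    for i in range(S, -1, -1):
        for j in range(P - 1, -1, -1):
            first = i < S and p[j] in (s[i], '.')
            if j + 1 < P and p[j + 1] == '*':
                # skip "x*", or consume s[i] and stay on "x*"
                dp[i][j] = dp[i][j + 2] or (first and dp[i + 1][j])
            else:
                dp[i][j] = first and dp[i + 1][j + 1]
    return dp[0][0]
```

Every cell reads only cells to its right in the same row or in the row below, so the reverse loop order guarantees they are already filled.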

Sample Code (C++)

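```cpp
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    bool isMatch(string s, string p) {
        int m = s.size(), n = p.size();
        // dp[i][j]: does the prefix s[0, i) match the prefix p[0, j)?
        vector<vector<bool>> dp(m + 1, vector<bool>(n + 1, false));
        dp[0][0] = true;  // empty pattern matches empty string
        for (int i = 0; i <= m; ++i) {
            for (int j = 1; j <= n; ++j) {
                if (p[j - 1] != '*' && p[j - 1] != '.') {
                    // ordinary letter: must equal s[i - 1]
                    if (i > 0 && s[i - 1] == p[j - 1]) {
                        dp[i][j] = dp[i - 1][j - 1];
                    }
                } else if (p[j - 1] == '.') {
                    // '.' matches any single character
                    if (i > 0) {
                        dp[i][j] = dp[i - 1][j - 1];
                    }
                } else {
                    // '*' at j == 1 has no preceding element: leave false
                    if (j == 1) {
                        continue;
                    }
                    // use "x*" zero times
                    dp[i][j] = dp[i][j - 2];
                    // or let "x*" absorb s[i - 1] when x matches it
                    if (i > 0 && (p[j - 2] == '.' || p[j - 2] == s[i - 1])) {
                        dp[i][j] = dp[i][j] || dp[i - 1][j];
                    }
                }
            }
        }
        return dp[m][n];
    }
};
```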
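The C++ version works on prefixes rather than suffixes: dp[i][j] records whether the first i characters of s match the first j characters of p. Written out, the recurrence its loops implement is shown below. Here $\mathrm{eq}(c, x)$ is true when $x$ is '.' or $x = c$. A '*' at $j = 1$ has no preceding element, so the code leaves that column false. The answer is $dp[m][n]$.

$$
dp[i][j] =
\begin{cases}
\text{true}, & i = 0,\ j = 0,\\
\text{false}, & i > 0,\ j = 0,\\
i > 0 \land \mathrm{eq}(s_{i-1}, p_{j-1}) \land dp[i-1][j-1], & j \ge 1,\ p_{j-1} \ne \texttt{*},\\
dp[i][j-2] \lor \big(i > 0 \land \mathrm{eq}(s_{i-1}, p_{j-2}) \land dp[i-1][j]\big), & j \ge 2,\ p_{j-1} = \texttt{*}.
\end{cases}
$$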

Complexity Analysis

Time complexity: O(SP), where S is the length of s and P is the length of p.
Space complexity: O(SP), with S and P as above.
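Both versions store the whole table. In the C++ formulation, however, row i only reads rows i and i − 1, so keeping just two rows cuts the extra space to O(P). The Python sketch below shows that idea, following the same prefix recurrence as the C++ code. It is not from the original post, and `is_match_two_rows` is a hypothetical name.

```python
def is_match_two_rows(s, p):
    """Prefix DP as in the C++ version, keeping only rows i - 1 and i: O(P) space."""
    m, n = len(s), len(p)
    prev = [False] * (n + 1)                # row i - 1
    for i in range(m + 1):
        curr = [False] * (n + 1)            # row i
        curr[0] = (i == 0)                  # empty pattern matches only the empty prefix
        for j in range(1, n + 1):
            if p[j - 1] == '*':
                if j == 1:
                    continue                # leading '*' has nothing to repeat
                curr[j] = curr[j - 2]       # use "x*" zero times
                if i > 0 and p[j - 2] in (s[i - 1], '.'):
                    curr[j] = curr[j] or prev[j]   # let "x*" absorb s[i - 1]
            elif i > 0 and p[j - 1] in (s[i - 1], '.'):
                curr[j] = prev[j - 1]       # letter or '.' matches s[i - 1]
        prev = curr
    return prev[n]
```

Time stays O(SP). Only the table shrinks.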

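Summary

Dynamic programming or recursion comes to mind fairly naturally for this problem. The Python version uses DFS, but because it records intermediate states it is essentially dynamic programming. A careful comparison shows that its time and space complexity are the same as well. In an interview, pay extra attention to the terminal-state checks and boundary conditions, so that no edge case is missed and no matrix coordinate is accessed out of bounds.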
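One practical way to catch such edge cases is to compare an implementation against a trusted reference on many small random inputs. The sketch below uses Python's `re.fullmatch` as the oracle. That is an assumption that holds here because the generated patterns never start with '*' and never contain '**'. `memo_match`, `random_pattern` and `cross_check` are hypothetical names, and `memo_match` is a compact restatement of the memoized DFS, not the author's exact code.

```python
import random
import re
from functools import lru_cache


def memo_match(s, p):
    """Memoized suffix DP, same recurrence as the DFS above: does s[i:] match p[j:]?"""
    @lru_cache(maxsize=None)
    def dfs(i, j):
        if j == len(p):
            return i == len(s)
        first = i < len(s) and p[j] in (s[i], '.')
        if j + 1 < len(p) and p[j + 1] == '*':
            return dfs(i, j + 2) or (first and dfs(i + 1, j))
        return first and dfs(i + 1, j + 1)
    return dfs(0, 0)


def random_pattern(rng, max_atoms=4):
    """Well-formed pattern: each atom is 'a', 'b' or '.', optionally followed by '*'."""
    parts = []
    for _ in range(rng.randint(0, max_atoms)):
        parts.append(rng.choice("ab."))
        if rng.random() < 0.5:
            parts.append('*')
    return "".join(parts)


def cross_check(trials=2000, seed=0):
    """Raise AssertionError on the first (s, p) where memo_match disagrees with re."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 5)))  # includes ""
        p = random_pattern(rng)                                          # may be ""
        expected = re.fullmatch(p, s) is not None
        assert memo_match(s, p) == expected, (s, p)
    return True
```

The small alphabet and short lengths are deliberate. They make empty strings, empty patterns and runs of "x*" appear often, which is where off-by-one mistakes tend to hide.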

------ Follow the WeChat official account: 猩猩的乐园 ------