Q792 Number of Matching Subseque

作者: 牛奶芝麻 | 来源:发表于2018-09-26 14:38 被阅读35次

Given string S and a dictionary of words words, find the number of words[i] that is a subsequence of S.

Example :

Input: 
S = "abcde"
words = ["a", "bb", "acd", "ace"]
Output: 3

Explanation: There are three words in words that are a subsequence of S: 
"a", "acd", "ace".

Note:

All words in words and S will only consists of lowercase letters.
The length of S will be in the range of [1, 50000].
The length of words will be in the range of [1, 5000].
The length of words[i] will be in the range of [1, 50].

解题思路：

思路1：将 S 与 words 中的 word 一个个匹配，计数。时间复杂度很高，超时。见Python实现部分。

思路2：把 S 中各个字符的下标位置用字典列表保存起来（之所以用字典列表，是因为 S 中可能有重复的字母）。见Python实现2部分。

例如，S = "aabcbbac"，那么得到的字典列表为：
{ "a" : [0,1,6], "b" : [2,4,5], "c" : [3,7] }。

然后对于 words 中的每一个 word，计算 word 中每个字符与字典列表中每个字符的位置关系。这里有很多个小注意点：

举例1：word = "acac"， i = 0 时，取 "a" ，0 >= i，因此 "a" 匹配；i =1 时，取 "c"，3 >= i，因此 "c" 匹配；但是当 i = 2 时，取 "a"，发现字典列表中 "a" 的前两个位置 0、1 都不行，但是第三个位置 6 可以，它满足 6 >= i ，所以也是匹配。因此在判断某个字符匹配时，这里应该是一个for循环，目的是找到能匹配的正确的位置。同理，c 的位置 7 寻找也是一样。

举例2： word = "acca"，前 3 个字符都能匹配，这个不用多说。但是当 i = 3 时，取 "a"，我们如果采取例子 1 中的方法，会发现 6 >= i，是匹配的。但是我们观察字符串，发现 "acca" 并不是 S 的子序列。因此，举例 1中还缺少限制条件。通过分析可以发现，因为在找上一个字符 "c" 时，S 的位置已经到达了 7 的位置，所以我们在找字典列表时，找的位置要大于 7，因为 6 < 7，所以不可以。因此，我们用一个变量保存上一次找到的 S 的位置，在 if 判断中加入一个限制，即当前在字典列表中寻找的字符下标要大于上一次寻找到的字符下标。如果遍历完字典列表中的 "a" 的所有位置还是不满足，那么说明不是子序列，返回 0。

Python实现：

# 超时
class Solution:
    def numMatchingSubseq(self, S, words):
        """
        :type S: str
        :type words: List[str]
        :rtype: int
        """
        count = 0
        for word in words:
            count += self.findSubSeq(word, S)
        return count

    def findSubSeq(self, word, S): 
        i = j = 0
        slen, wlen = len(S), len(word)
        while i < slen:
            # 两字符串索引都不能越界 + S 剩余长度要比 word 剩余长度长（减少运行时间） + 当前字符能匹配
            while i < slen and j < wlen and slen - i >= wlen - j and S[i] == word[j]:
                j += 1
                i += 1
            i += 1
        return 1 if j == wlen else 0

Python实现2：

# 不超时
from collections import defaultdict
class Solution2:
    def numMatchingSubseq(self, S, words):
        """
        :type S: str
        :type words: List[str]
        :rtype: int
        """
        def findSubSeq(word): 
            last_i = -1
            for i, v in enumerate(word):
                if not dic_s[v]:  # 如果word中出现S中不存在的字符
                    return 0
                for cur_i in dic_s[v]:
                    if last_i < cur_i and cur_i >= i:  
                        last_i = cur_i 
                        break
                    if cur_i == dic_s[v][-1]:
                        return 0
            return 1

        dic_s = defaultdict(list)
        # 字典列表记录每个字母出现的位置
        for i, v in enumerate(S):
            dic_s[v].append(i)
        count = 0
        for word in words:
            count += findSubSeq(word)
        return count

S = "aabcbbac"
words = ["acac", "acca", "acc"]
print(Solution().numMatchingSubseq(S, words))  # 2
print(Solution2().numMatchingSubseq(S, words))  # 2

补充知识：

1、字典列表的初始化方法：
可以采用 collections 中的 defaultdict(list) 进行初始化。

from collections import defaultdict
dic_s = defaultdict(list)
        # 字典列表记录每个字母出现的位置
        for i, v in enumerate(S):
            dic_s[v].append(i)

2、复制字典的浅拷贝与深拷贝：

如果 A 是一个字典，B = A 是传引用，A、B 会同时改变。
如果 A 是一个字典，B = A.copy() 对于一层来说（比如字典中存储的是字符串）是深拷贝，对于超过一层来说（比如字典中存储的是列表）是浅拷贝。深拷贝修改 A、B 的值相互不影响，但是浅拷贝是相互影响的，即外层不会同时改变，内层（如列表中的元素）会同时改变。所以如果想要复制字符串字典采用 copy() 可行，但复制字典列表不可行。
复制字典不受层数约束的深拷贝方法需要用到 copy 模块中的 deepcopy() 函数。此时，无论字典有几层，A、B都是相互独立的，不会因其中一方的改变而改变。用法如下：

from copy import deepcopy
B = deepcopy(A)

网友评论

本文标题：Q792 Number of Matching Subseque

本文链接：https://www.haomeiwen.com/subject/wkidoftx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！