美文网首页
187. Repeated DNA Sequences &

187. Repeated DNA Sequences &

作者: codingXue | 来源:发表于2016-05-25 20:49 被阅读26次

    问题描述

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
    For example,
    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].

    思路分析

    蠢蠢的我以为要枚举长度为10的子串并用KMP算法进行匹配,因此去复习了一波KMP,然后就超时啦(手动再见)。
    正确的做法是,由于字母只有ATCG四个,可以根据字母对一个字符串进行编码,也可以理解为将ATCG对应为0123,即将字符串转换为4进制数,然后对数字进行匹配就好了。其中还需要注意,不用每次都重算字符串的encode,因为前一个字符串的值后一个可以利用,而因为10个字母正好对应20位二进制数,因此可以通过“&0xFFFFF”来去掉10个字母之前的值,最后是注意用dict提高速度。

    AC代码

    class Solution(object):
        def findRepeatedDnaSequences(self, s):
            """
            :type s: str
            :rtype: List[str]
            """
            n = len(s)
            rst = set()
            once = set()
            code = 0
            code_map = {'A':0, 'C':1, 'G':2, 'T':3}
            for i in range(n):
                code = code * 4 + code_map[s[i]] & 0xFFFFF
                if i < 9:
                    continue
                if code in once:
                    rst.add(s[i-9 : i+1])
                else:
                    once.add(code)
            return list(rst)
    

    KMP复习

    def getNext(p):
        n = len(p)
        next = [0 for i in range(n)]
        k = 0
        for i in range(1, n):
            while k > 0 and p[k] != p[i]:
                k = next[k-1]
            if p[k] == p[i]:
                k += 1
            next[i] = k
        return next
    
    def kmp(s, p):
        i = 0
        j = 0
        n = len(s)
        m = len(p)
        next = getNext(p)
        while j < m and i < n:
            if s[i] == p[j]:
                i += 1
                j += 1
            else:
                if j == 0:
                    i += 1
                else:
                    j = next[j-1]
        if j == m:
            return True
        else:
            return False
    

    相关文章

      网友评论

          本文标题:187. Repeated DNA Sequences &

          本文链接:https://www.haomeiwen.com/subject/fivtdttx.html