美文网首页
[刷题防痴呆] 0187 - 重复的DNA序列 (Repeate

[刷题防痴呆] 0187 - 重复的DNA序列 (Repeate

作者: 西出玉门东望长安 | 来源:发表于2022-01-26 00:00 被阅读0次

题目地址

https://leetcode.com/problems/repeated-dna-sequences/description/

题目描述

187. Repeated DNA Sequences


All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

思路

  • hash加滑动窗口.

关键点

  • 注意, hashset里可以存数, 把DNA当做一个4进制的数. 这样可以节省空间.

代码

  • 语言支持:Java
// 直接string hashset
class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Set<String> set = new HashSet<>();
        Set<String> resSet = new HashSet<>();
        int len = 10;
        for (int i = 0; i + len - 1 < s.length(); i++) {
            String sub = s.substring(i, i + len);
            if (set.contains(sub)) {
                resSet.add(sub);
            } else {
                set.add(sub);
            }
        }

        return new ArrayList<>(resSet);
    } 
}
    
    // 也可以用encode来将dna转换成integer, 本质上是一个四进制数
    public List<String> findRepeatedDnaSequences(String s) {
        
        HashSet<Integer> seen = new HashSet<>();
        HashSet<String> repeated = new HashSet<>();
        int l = 10;
        for (int i = 9; i < s.length(); i++) {
            String sub = s.substring(i - 9, i + 1);
            int encoded = encode(sub);
            if (seen.contains(encoded)) {
                repeated.add(sub);
            } else {
                seen.add(encoded);
            } 
        }
        
        return new ArrayList(repeated);
    }
    
    private int encode(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == 'A') {
                n = n * 4;
            } else if (s.charAt(i) == 'C') {
                n = n * 4 + 1;
            } else if (s.charAt(i) == 'G') {
                n = n * 4 + 2;
            } else {
                n = n * 4 + 3;
            }
        }
        
        return n;
    }
}

相关文章

网友评论

      本文标题:[刷题防痴呆] 0187 - 重复的DNA序列 (Repeate

      本文链接:https://www.haomeiwen.com/subject/nhxbcrtx.html