美文网首页
[刷题防痴呆] 0187 - 重复的DNA序列 (Repeate

[刷题防痴呆] 0187 - 重复的DNA序列 (Repeate

作者: 西出玉门东望长安 | 来源:发表于2022-01-26 00:00 被阅读0次

    题目地址

    https://leetcode.com/problems/repeated-dna-sequences/description/

    题目描述

    187. Repeated DNA Sequences
    
    
    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
    
    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
    
    Example:
    
    Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
    
    Output: ["AAAAACCCCC", "CCCCCAAAAA"]
    
    

    思路

    • hash加滑动窗口.

    关键点

    • 注意, hashset里可以存数, 把DNA当做一个4进制的数. 这样可以节省空间.

    代码

    • 语言支持:Java
    // 直接string hashset
    class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            Set<String> set = new HashSet<>();
            Set<String> resSet = new HashSet<>();
            int len = 10;
            for (int i = 0; i + len - 1 < s.length(); i++) {
                String sub = s.substring(i, i + len);
                if (set.contains(sub)) {
                    resSet.add(sub);
                } else {
                    set.add(sub);
                }
            }
    
            return new ArrayList<>(resSet);
        } 
    }
        
        // 也可以用encode来将dna转换成integer, 本质上是一个四进制数
        public List<String> findRepeatedDnaSequences(String s) {
            
            HashSet<Integer> seen = new HashSet<>();
            HashSet<String> repeated = new HashSet<>();
            int l = 10;
            for (int i = 9; i < s.length(); i++) {
                String sub = s.substring(i - 9, i + 1);
                int encoded = encode(sub);
                if (seen.contains(encoded)) {
                    repeated.add(sub);
                } else {
                    seen.add(encoded);
                } 
            }
            
            return new ArrayList(repeated);
        }
        
        private int encode(String s) {
            int n = 0;
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) == 'A') {
                    n = n * 4;
                } else if (s.charAt(i) == 'C') {
                    n = n * 4 + 1;
                } else if (s.charAt(i) == 'G') {
                    n = n * 4 + 2;
                } else {
                    n = n * 4 + 3;
                }
            }
            
            return n;
        }
    }
    

    相关文章

      网友评论

          本文标题:[刷题防痴呆] 0187 - 重复的DNA序列 (Repeate

          本文链接:https://www.haomeiwen.com/subject/nhxbcrtx.html