- [刷题防痴呆] 0187 - 重复的DNA序列 (Repeate
- [刷题防痴呆] 0392 - 判断子序列 (Is Subsequ
- [刷题防痴呆] 0521 - 最长特殊序列 Ⅰ(Longest
- [刷题防痴呆] 0376 - 摆动序列 (Wiggle Subs
- 【LeetCode通关全记录】187. 重复的DNA序列
- [刷题防痴呆] 0522 - 最长特殊序列 II (Longes
- [刷题防痴呆] 0674 - 最长连续递增序列 (Longest
- [刷题防痴呆] 0128 - 最长连续序列 (Longest C
- [刷题防痴呆] 0594 - 最长和谐子序列 (Longest
- [刷题防痴呆] 0316 - 去除重复字母 (Remove Du
题目地址
https://leetcode.com/problems/repeated-dna-sequences/description/
题目描述
187. Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
思路
- hash加滑动窗口.
关键点
- 注意, hashset里可以存数, 把DNA当做一个4进制的数. 这样可以节省空间.
代码
- 语言支持:Java
// 直接string hashset
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
Set<String> set = new HashSet<>();
Set<String> resSet = new HashSet<>();
int len = 10;
for (int i = 0; i + len - 1 < s.length(); i++) {
String sub = s.substring(i, i + len);
if (set.contains(sub)) {
resSet.add(sub);
} else {
set.add(sub);
}
}
return new ArrayList<>(resSet);
}
}
// 也可以用encode来将dna转换成integer, 本质上是一个四进制数
public List<String> findRepeatedDnaSequences(String s) {
HashSet<Integer> seen = new HashSet<>();
HashSet<String> repeated = new HashSet<>();
int l = 10;
for (int i = 9; i < s.length(); i++) {
String sub = s.substring(i - 9, i + 1);
int encoded = encode(sub);
if (seen.contains(encoded)) {
repeated.add(sub);
} else {
seen.add(encoded);
}
}
return new ArrayList(repeated);
}
private int encode(String s) {
int n = 0;
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) == 'A') {
n = n * 4;
} else if (s.charAt(i) == 'C') {
n = n * 4 + 1;
} else if (s.charAt(i) == 'G') {
n = n * 4 + 2;
} else {
n = n * 4 + 3;
}
}
return n;
}
}
网友评论