LeetCode 3: Longest Substring Wi

Author: 江米条二号 | Published: 2016-04-24 22:01 | Views: 63

tags: String

Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", with the length of 3.

Given "bbbbb", the answer is "b", with the length of 1.

Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring; "pwke" is a subsequence, not a substring.

Function:

public int lengthOfLongestSubstring(String s) {
    // Your Code
}
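
To make the three examples concrete, here is a minimal brute-force sketch that tries every start index and extends the substring until a character repeats. The class name BruteForceReference is hypothetical and not part of the original post; it is meant only as a slow reference for checking faster solutions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reference class for illustration; not from the original post.
class BruteForceReference {
    // For every start index, extend the substring until a character repeats.
    static int lengthOfLongestSubstring(String s) {
        int best = 0;
        for (int start = 0; start < s.length(); start++) {
            Set<Character> seen = new HashSet<>();
            int end = start;
            // Set.add returns false when the character is already present.
            while (end < s.length() && seen.add(s.charAt(end))) {
                end++;
            }
            best = Math.max(best, end - start);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(lengthOfLongestSubstring("abcabcbb")); // 3
        System.out.println(lengthOfLongestSubstring("bbbbb"));    // 1
        System.out.println(lengthOfLongestSubstring("pwwkew"));   // 3
    }
}
```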

Problem Analysis

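This is a Medium string problem that asks for the longest substring without repeating characters. Finding it requires scanning the whole string, so the best achievable time complexity is O(n). Two algorithms follow: Algorithm 1 implements a naive idea and is relatively inefficient; Algorithm 2 runs in O(n) time and is efficient, with an implementation technique worth borrowing.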

Algorithm 1: My Own Implementation

public int lengthOfLongestSubstring(String s) {
    int result = 0;
    Queue<Character> qc = new LinkedList<>();
    int sLength = s.length();
    for (int i=0; i<sLength; i++) {
        char ch = s.charAt(i);
        if (qc.contains(ch)) {
            if (result < qc.size())
                result = qc.size();
            while (qc.contains(ch)) { // 1. contains should not be used here
                qc.remove();
            }
        }
        qc.add(ch);
    }
    if (qc.size() > result)
        result = qc.size();
    return result;
}

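The idea behind Algorithm 1 is simple: maintain a queue that holds a substring with no repeated characters and scan the input string s. If the next character is already in the queue, check whether the queue is the longest substring seen so far, then remove characters from the head of the queue until the duplicate is gone. After one pass, the longest substring has been found. As for time efficiency: LinkedList's contains method has to traverse the queue (any data structure that is not an index structure has to), and it is called at least once for every character, so this implementation is O(n^2) in the worst case. The code has one more performance problem, at the line marked 1:

Whether it is a LinkedList, an ArrayList, or any other data structure that is not an index structure (such as a hash table or a trie), contains has to traverse the contents in some order, so it should be called as rarely as possible. In the inner loop it is re-run after every single removal.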
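
One way to avoid the repeated contains scans while keeping the queue idea is to pair the queue with a HashSet holding the same characters. The sketch below is an illustration under that assumption, not the author's code; the class name QueueWithSet and the variable names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of Algorithm 1; not from the original post.
class QueueWithSet {
    static int lengthOfLongestSubstring(String s) {
        int result = 0;
        Deque<Character> window = new ArrayDeque<>(); // current substring, in order
        Set<Character> inWindow = new HashSet<>();    // same characters, for hash lookup
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // Drop characters from the head until the earlier copy of ch is gone.
            while (inWindow.contains(ch)) {
                inWindow.remove(window.removeFirst());
            }
            window.addLast(ch);
            inWindow.add(ch);
            result = Math.max(result, window.size());
        }
        return result;
    }
}
```

Each character enters and leaves the queue at most once, and the set lookup is a hash probe rather than a scan, so the whole pass takes expected linear time.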

Algorithm 2: An Efficient Algorithm

public int lengthOfLongestSubstring(String s) {
    int lastIndices[] = new int[256]; // int array standing in for a hash table; assumes every char value is below 256
    for(int i = 0; i<256; i++){
        lastIndices[i] = -1;
    }

    char[] chArray = s.toCharArray();
    int strLen = chArray.length;
    int maxLen = 0;
    int curLen = 0;
    int start = 0; // index of the first character of the current repeat-free substring
    for(int i = 0; i<strLen; i++){
        char cur = chArray[i];
        if(lastIndices[cur]  < start){
            lastIndices[cur] = i;
            curLen++;
        }
        else{
            int lastIndex = lastIndices[cur]; // position of the repeated character; move the substring start past it
            start = lastIndex+1;
            curLen = i-start+1;
            lastIndices[cur] = i;
        }

        if(curLen > maxLen){
            maxLen = curLen;
        }
    }

    return maxLen;
}

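The analysis of Algorithm 1 showed why it is slow: every time a character is added, the queue has to be traversed again to keep the substring free of repeats. Algorithm 2 addresses exactly this by replacing the queue traversal with a hash-like structure, the lastIndices array. For each new character it first checks whether the character has already appeared in the current substring by comparing lastIndices[cur] with start. If lastIndices[cur] < start, the character has not appeared, and only its lastIndices entry is updated; otherwise both start and lastIndices are updated. The maximum substring length is updated along the way. Algorithm 2 runs in O(n) time, and the submission beat 90% of accepted Java solutions.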
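
The 256-entry array only works when every character value is below 256. As a hedged sketch of the same last-index idea for arbitrary char values, the array can be swapped for a HashMap. The class name LastIndexMap is hypothetical, and the code counts UTF-16 code units, as String.charAt does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical HashMap-based variant of Algorithm 2; not from the original post.
class LastIndexMap {
    static int lengthOfLongestSubstring(String s) {
        Map<Character, Integer> lastIndex = new HashMap<>();
        int maxLen = 0;
        int start = 0; // first index of the current repeat-free substring
        for (int i = 0; i < s.length(); i++) {
            char cur = s.charAt(i);
            // put returns the previous index of cur, or null if it was never seen.
            Integer prev = lastIndex.put(cur, i);
            if (prev != null && prev >= start) {
                start = prev + 1; // the repeat lies inside the window, so skip past it
            }
            maxLen = Math.max(maxLen, i - start + 1);
        }
        return maxLen;
    }
}
```

The check prev >= start mirrors the lastIndices[cur] < start test in Algorithm 2: an earlier occurrence that lies before start is stale and does not shrink the window. For "abba", the final 'a' was last seen at index 0, which is before start = 2, so the window grows to "ba" and the result is 2.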

Summary

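Getting this problem accepted (AC) is not hard; Algorithm 1 is clearly less efficient than Algorithm 2, but it is accepted as well. Still, Algorithm 2 has a lesson worth taking away:

Wherever a traversal is needed, consider an index structure (such as a hash table) to improve efficiency.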
