3 longest substring without repe

作者: yangminz | 来源:发表于2018-07-16 00:03 被阅读17次

「每日一道算法题」Longest Substring Witho
leetcode第3题最长无重复子字符串
3、Longest Substring Without Repe
3、Longest SubString Without Repe
3 longest substring without repe
3、Longest Substring Without Repe
3 longest substring without repe
3 Longest Substring Without Repe
3.Longest Substring Without Repe
3.Longest Substring Without Repe

title: Longest Substring Without Repeating Characters
tags:
- longest-substring-without-repeating-characters
- No.3
- medium
- string
- divide-conquer
- dynamic-programming
- queue

Problem

Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", which the length is 3.

Given "bbbbb", the answer is "b", with the length of 1.

Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.

Corner Cases

""
"a"
"ab"
"xxxcabcxy"

Solution

Divide and Conquer

Divide

For string s[i : j], it can be divided into 2 independent substrings: s[i : k], s[k+1 : j]. If the longest substring without repeating characters for s[i : j] is LSWRC(s[i : j]), then we have:

LSWRC(s[i : j]) = max{ LSWRC(s[i : k]), LSWRC(s[k+1 : j]), s[x : y] }

where s[x : y] is the longest substring without repeating characters accrossing over k, k+1.

In this case, we can divide s[i : j] into 2 halves, which means k = (i + j) / 2

Conquer

Let say LSWRC(String s) gives the longest substring without repeating characters for input s. Then we directly get the results of the following for input s[i : j]:

LSWRC(s[i : k])
LSWRC(s[k+1 : j])

The main problem is s[x : y].

Starting from k and k+1, we move x and y one by one at one time untill:

x decreases to i and y increases to j
s[x - 1] or s[y + 1] is already in s[x : y]

Then we get the required s[x : y].

Finally, compare LSWRC(s[i : k]), LSWRC(s[k+1 : j]) and s[x : y].

An instance follows:

  LSWRC("abcabcbb") 
= max{ LSWRC("abca"), LSWRC("bcbb"), "cab"          }
= max{ max{ LSWRC("ab"), LSWRC("ca"), "abc" },
       max{ LSWRC("bc"), LSWRC("bb"), "cb"  },
       "cab"                                        }
= max{ max{ max{ LSWRC("a"), LSWRC("b"), "ab" },
            max{ LSWRC("c"), LSWRC("a"), "ca" },
            "abc"                                },
       max{ max{ LSWRC("b"), LSWRC("c"), "bc" },
            max{ LSWRC("b"), LSWRC("b"), "b"  },
            "cb"                                 },
       "cab"                                        }
= max{ max{ max{ "a", "b", "ab" },
            max{ "c", "a", "ca" },
            "abc"                  },
       max{ max{ "b", "c", "bc" },
            max{ "b", "b", "b"  },
            "cb"                   },
       "cab"                          }
= max{ max{ "ab", "ca", "abc" },
       max{ "bc", "b" , "cb"  },
       "cab"                     }
= max{ "abc", "cb", "cab" }
= "abc"

$T(n) = 2 \times T(\frac{n}{2}) + O(n)$ , thus running time is $O(n lg(n))$ .

However, there is a severe bug in this algorithm in the part of computing s[x : y]:

When x and y are expanded, they cannot be updated in the same time. For an example: LSWRC("xxxcabcxy") = max{ LSWRC("xxxca"), LSWRC("bcxy"), "abcxy" }.

However, in the process of getting "abcxy", if we add s[x] first:

0. "ab"
1. "cab"
2. "cabc" // stop increasing y
3. "xcab"
4. "xxcab"
5. return "xcab"

s[x : y] is locked by the right c. If we add s[y] first:

0. "ab"
1. "abc"
2. "cabc" // stop decreasing x
3. "abcx"
4. "abcxy"
5. return "abcxy"

It's rather difficult to fix this bug since index x and y are symmetrical in theory while non-symmetrical in code:

class Solution {
    private String[] s;

    public int lengthOfLongestSubstring(String s) {
        this.s = s.split("");
        return LSWRC(0, s.length()-1);
    }

    public int[] LSWRC(int i, int j) {
        // if a character
        if (i == j) {return new int[] {i, j};}

        int   k       = (i + j) / 2;
        int[] lswrc   = {0, 0};

        // divide
        int[] l_lswrc = LSWRC(i  , k);
        int[] r_lswrc = LSWRC(k+1, j);

        int   l1      = l_lswrc[1] - l_lswrc[0] + 1;
        int   l2      = r_lswrc[1] - r_lswrc[0] + 1;
        lswrc = (l1 > l2) ? l_lswrc : r_lswrc;

        // conquer
        int i = k;
        int j = k + 1;

        int x = k;
        int y = k + 1;

        if (this.s[i].equals(this.s[j])) {return lswrc;}

        HashSet<String> hs = new HashSet<String>();
        boolean      istop = false;
        boolean      jstop = false;

        for ( ; !(istop && jstop); ) {
            boolean p1 = hs.contains(this.s[i]);
            boolean p2 = (i == i);
            istop = p1 || p2;

            if (!p1) {
                hs.add(this.s[i]);
                x = i;
                i = p2 ? i : i - 1;
            }

            boolean q1 = hs.contains(this.s[j]);
            boolean q2 = (j == j);
            jstop = q1 || q2;
            if (!q1) {
                hs.add(this.s[j]);
                y = j;
                j = q2 ? j : j + 1;
            }
        }

        int   l3 = lswrc[1] - lswrc[0] + 1;
        int   l4 = y - x + 1;
        lswrc = (l3 > l4) ? lswrc : new int[] {x, y};

        return lswrc;
    }
}

Dynamic Programming

For all substring s[i : j], we have i <= j.

If s[i : j] has repeating characters, then all s[a : b] with a <= i && j <= b have repeating characters too. We can describe this relationship by table A according to Bottom-Up Dynamic Programming:

8|111100000
7|11110000
6|1111000
5|110000
4|11000
3|1100
2|110
1|10
0|0
 +---------
  012345678
  xxxcabcxy

Here, A[i][j] equals

0 : s[i : j] does not have repeating characters
1 : s[i : j] has repeating characters

Then the relationship can be visualized by a rectangle area:

8|1111
7|1111
6|1111
5|
4|
3|
2|
1|
0|
 +---------
  012345678
  xxxcabcxy

Here all substrings contains[3 : 6] = "cabc" have repeating characters.

The longest substring without repeating characters are bounded by a line with all 1 parallel to $j = i$ (single character line):

8|   1    0
7|  1    0
6| 1    0
5|1    0
4|    0
3|   0
2|  0
1| 0
0|0
 +---------
  012345678
  xxxcabcxy

How to check if s[i : j] has repeating characters or not? Suppose the set of characters of s[i : j] is B[i][j]. Since there exists:

s[i : j] = s[i : k] + s[k+1 : j]

we have:

B[i][j] = B[i][k] union B[k+1][j]

if:

B[i][k] intersect B[k+1][j] == Empty

Then s[i : j] does not have repeating characters.

What's more, if we compute the triangle from j axle:

8|11    000
7|11   000
6|11  000
5|11?000
4|11000
3|1100
2|110
1|10
0|0
 +---------
  012345678
  xxxcabcxy

s[i : j] = s[i : j-1] + s[j : j]. In this direction, B[i][j-1] of s[i : j-1] is computed. Then we only check if s[j] is in B[i][j - 1] or not.

For instance, s[2 : 5] = s[2 : 4] + s[5 : 5]. B[2][4] = {"x", "a", "c"}, s[5] = "b". Then A[2][5] = 0.

Complexity

The space complexity is huge. For A, $O(n^2)$ is needed. For B, since the elements of the table are all sets, the worst case is $O(n^3)$ . The algorithm visits all A[i][j], thus at least $O(n^2)$ time complexity. For HashSet,

B[i][j-1].contians(s[j])
B[i][j-1].add(s[j])

are all $O(1)$ . However, if we do not reduce the visiting times of 1 points like A[0][n-1], the total time complexity would be much more than $O(n^2)$ .

class Solution {
    public int lengthOfLongestSubstring(String s) {
        if (s.equals("")) { return 0; }

        String[] x = s.split("");
        int      n = s.length();

        int[]             A = new int[f(n-1, n-1) + 1];
        HashSet<String>[] B = new HashSet[f(n-1, n-1) + 1];

        for (int i=0; i<n; i++) {
            A[f(i, i)] = 0;
            B[f(i, i)] = new HashSet<String>();
            B[f(i, i)].add(x[i]);
        }

        int     c    = 1;
        boolean flag = true;
        for( ; c<n; c++) {
            flag = true;
            for (int i=0; i<n-c; i++) {
                int j = i + c;
                if (A[f(i, j)] != 1) {
                    if (B[f(i, j-1)].contains(x[j])) {
                        // to be optimized
                        for (int ii=0; ii<=i; ii++) {
                            for (int jj=n-1; j<=jj; jj--) {
                                A[f(ii, jj)] = 1;
                            }
                        }
                    }
                    else {
                        A[f(i, j)] = 0;
                        B[f(i, j)] = B[f(i, j-1)];
                        B[f(i, j)].add(x[j]);
                    }
                }
                flag = flag && (A[f(i, j)] == 1);
            }
            if (flag) { return c; }
        }
        return c;
    }

    private int f(int i, int j) { return i + (j * (j + 1)) / 2; }
}

Improvement

Like LCS problem in CLRS, we solve the problem in $O(n)$ space complexity. Different from LCS, we use a queue instead of an array.

Here we give a looser condition: if A[i][j]==1, then for all j<=k, A[i][k]=1. In other words, we just check one column of the table instead of the rectangle area, because of the sequence the improved algorithm visiting table A.

For table A, we make it a queue. It is updated as following for input xxxcabcxy:

44,43,41,38,34,29,23,16,08
42,40,37,33,28,22,15,07
39,36,32,27,21,14,06
35,31,26,20,13,05
30,25,19,12,04
24,18,11,03
17,10,02
09,01
00

A         i+(j*(j+1))/2
-------------------------------------
0         [00]
00        [00,01]
000       [00,01,02]
0000      [00,01,02,03]
00000     [00,01,02,03,04]
000000    [00,01,02,03,04,05]
0000000   [00,01,02,03,04,05,06]
00000000  [00,01,02,03,04,05,06,07]
000000000 [00,01,02,03,04,05,06,07,08]
000000001 [01,02,03,04,05,06,07,08,09]
000000011 [02,03,04,05,06,07,08,09,10]
000000110 [03,04,05,06,07,08,09,10,11]
000001100 [04,05,06,07,08,09,10,11,12]
000011000 [05,06,07,08,09,10,11,12,13]
000110000 [06,07,08,09,10,11,12,13,14]
001100000 [07,08,09,10,11,12,13,14,15]
011000000 [08,09,10,11,12,13,14,15,16]
11000000  [09,10,11,12,13,14,15,16]
10000001  [10,11,12,13,14,15,16,17]
00000011  [11,12,13,14,15,16,17,18]
00000110  [12,13,14,15,16,17,18,19]
00001100  [13,14,15,16,17,18,19,20]
00011000  [14,15,16,17,18,19,20,21]
00110000  [15,16,17,18,19,20,21,22]
01100000  [16,17,18,19,20,21,22,23]
1100000   [17,18,19,20,21,22,23]
1000001   [18,19,20,21,22,23,24]
0000011   [19,20,21,22,23,24,25]
0000110   [20,21,22,23,24,25,26]
0001101   [21,22,23,24,25,26,27]
0011010   [22,23,24,25,26,27,28]
0110100   [23,24,25,26,27,28,29]
110100    [24,25,26,27,28,29]
101001    [25,26,27,28,29,30]
010011    [26,27,28,29,30,31]
100111    [27,28,29,30,31,32]
001111    [28,29,30,31,32,33]
011110    [29,30,31,32,33,34]
11110     [30,31,32,33,34]
11101     [31,32,33,34,35]
11011     [32,33,34,35,36]
10111     [33,34,35,36,37]
01111     [34,35,36,37,38]
1111      [35,36,37,38]
RETURN 5

B updates synchronously with A. However, the running time is still $O(n^2)$ .

import java.util.LinkedList;
import java.util.Queue;
import java.util.HashSet;

class Solution {    
    public int lengthOfLongestSubstring(String s) {
        if (s.equals("")) { return 0; }

        String[] x = s.split("");
        int      n = s.length();

        Queue<Integer>         A = new LinkedList<Integer>();
        Queue<HashSet<String>> B = new LinkedList<HashSet<String>>();

        // O(n)
        for (int i=0; i<n; i++) {
            HashSet<String> b = new HashSet<String>();
            b.add(x[i]);
            A.offer(0);
            B.offer(bi);
        }

        int     c    = 1;
        boolean flag = true;
        for( ; c<n; c++) {
            flag = true;
            for (int i=0; i<n-c; i++) {
                int             j = i + c;
                int             a = A.poll();
                HashSet<String> b = B.poll();
                // [i][j-1] has been removed
                if (a != 1) {
                    if (b.contains(x[j])) {
                        A.offer(1);
                        B.offer(b);
                        flag = flag && true;
                    }
                    else {
                        b.add(x[j]);
                        A.offer(0);
                        B.offer(b);
                        flag = flag && false;
                    }
                }
                else {
                    A.offer(1);
                    B.offer(b);
                    flag = flag && true;
                }
            }
            A.poll();
            B.poll();
            if (flag) {return c;}
        }
        return c;
    }
}

「每日一道算法题」Longest Substring Witho
OJ address OJ website : 3. Longest Substring Without Repe...
leetcode第3题最长无重复子字符串
@(LeetCode)[字符串]leetcode 3 Longest Substring Without Repe...
3、Longest Substring Without Repe
题目 Given a string, find the length of the longest substri...
3、Longest SubString Without Repe
Examples:找出最长无重复子串长度Given "abcabcbb", the answer is "abc"...
3 longest substring without repe
title: Longest Substring Without Repeating Characterstags...
3、Longest Substring Without Repe
题设要点双指针维护最长重复子串的位置动态规划寻找字符串的最长子串，就是要维护一个区间[left , rig...
3 longest substring without repe
1. sliding window: longest = end - start + 1; cases need ...
3 Longest Substring Without Repe
3 Longest Substring Without Repeating Characters Given a ...
3.Longest Substring Without Repe
1.O(n2) 可优化 2.O(n) （其实是2n，可优化） 3.优化思路：就像是一个框一样，如果碰到相同的，直...
3.Longest Substring Without Repe
题目给出一个字符串，求其最长的、不包含重复字符的子串。例子："abcabcbb",答案是 3. "bbbbb"...

3 longest substring without repe

Problem

Corner Cases

Solution

Divide and Conquer

Dynamic Programming

相关文章

「每日一道算法题」Longest Substring Witho

leetcode第3题最长无重复子字符串

3、Longest Substring Without Repe

3、Longest SubString Without Repe

3 longest substring without repe

3、Longest Substring Without Repe

3 longest substring without repe

3 Longest Substring Without Repe

3.Longest Substring Without Repe

3.Longest Substring Without Repe

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读