139. Word Break

Author: Super_Alan | Published 2018-05-01 12:31

Problem: Determine whether a given string can be broken into a sequence of words from the given dictionary.

Naive Solution: DFS

The basic idea is to check whether there is a path of word breaks that leads from the start of the string all the way to its end.

public boolean wordBreak(String s, List<String> wordDict) {
    HashSet<String> dict = new HashSet<>(wordDict);
    return dfs(0, s, dict);
}

private boolean dfs(int index, String str, Set<String> dict) {
    if (index >= str.length()) {
        return true;
    }

    for (int i = index + 1; i <= str.length(); i++) {
        String possibleWord = str.substring(index, i);
        if (dict.contains(possibleWord)) {
            if (dfs(i, str, dict)) {
                return true;
            }
        }
    }

    return false;
}

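This involves some repeated computation. For example, "catsand" can be split as "cat, sand" or as "cats, and", so in "catsandxxx" (where "xxx" is not breakable) the "xxx" part is computed twice.

If "MMM" in "MMMNNN" can be split in m ways and "NNN" is not breakable, then the "NNN" part is computed recursively m times. If "NNN" is breakable, this solution computes it only once, because once the end of the string is reached, the call stack unwinds and returns true layer by layer.

If in "MMMNNNXXX" the part "MMM" can be split in m ways, "NNN" can be split in n ways, and "XXX" is not breakable, then the "XXX" part is computed m * n times.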
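
The repeated work described above can be observed directly. The following is a minimal sketch, not part of the original solution: it is the same naive DFS with a hypothetical `calls` array added that records how often the recursion is entered at each index. The class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NaiveDfsCallCounter {
    // Returns an array where calls[i] is the number of times dfs was entered with index == i.
    static int[] countCalls(String s, List<String> wordDict) {
        int[] calls = new int[s.length() + 1];
        dfs(0, s, new HashSet<>(wordDict), calls);
        return calls;
    }

    // Same logic as the naive DFS, plus the call counter (an addition for illustration).
    private static boolean dfs(int index, String str, Set<String> dict, int[] calls) {
        calls[index]++;
        if (index >= str.length()) {
            return true;
        }
        for (int i = index + 1; i <= str.length(); i++) {
            if (dict.contains(str.substring(index, i)) && dfs(i, str, dict, calls)) {
                return true;
            }
        }
        return false;
    }
}
```

For "catsandxxx" with the dictionary {"cat", "cats", "and", "sand"}, `countCalls` reports that index 7 (the start of "xxx") is entered twice, once per way of splitting "catsand".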

Optimizing the Naive Solution: Memoization

Beat 87% of submissions.

public boolean wordBreak(String s, List<String> wordDict) {
    HashSet<String> dict = new HashSet<>(wordDict);
    
    int len = s.length();
    // breakable[i]: s.substring(0, i) has been found breakable (position i was reached)
    boolean[] breakable = new boolean[len + 1]; 
    
    return dfs(0, s, dict, breakable);
}

private boolean dfs(int index, String str, Set<String> dict, boolean[] breakable) {
    if (index >= str.length()) {
        return true;
    }

    for (int i = index + 1; i <= str.length(); i++) {
        if (breakable[i]) {
            // The split str.substring(0, i) + str.substring(i) has already been tried:
            // the first part is breakable but the second part is not, so we continue
            // and avoid the redundant computation.
            continue;
        }
        String possibleWord = str.substring(index, i);
        if (dict.contains(possibleWord)) {
            breakable[i] = true;
            if (dfs(i, str, dict, breakable)) {
                return true;
            }
        }
    }

    return false;
}
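
The array above doubles as a "reached and already explored" marker. An equivalent way to express the same idea, shown here only as a sketch under my own naming (the `failed` array and the class name are hypothetical, not from the original solution), is to record explicitly the indexes whose suffix has been fully explored without success:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordBreakMemo {
    static boolean wordBreak(String s, List<String> wordDict) {
        return dfs(0, s, new HashSet<>(wordDict), new boolean[s.length() + 1]);
    }

    // failed[i] == true means s.substring(i) was fully explored and cannot be broken.
    private static boolean dfs(int index, String s, Set<String> dict, boolean[] failed) {
        if (index == s.length()) {
            return true;
        }
        if (failed[index]) {
            return false; // this suffix was already tried, skip the repeated recursion
        }
        for (int i = index + 1; i <= s.length(); i++) {
            if (dict.contains(s.substring(index, i)) && dfs(i, s, dict, failed)) {
                return true;
            }
        }
        failed[index] = true;
        return false;
    }
}
```

Each index is fully explored at most once, so an unbreakable suffix such as "xxx" in "catsandxxx" is no longer recomputed for every way of splitting the prefix.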

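Unsuccessful attempt: I tried to optimize as below, but it brought little performance improvement. The 2D array only caches dictionary lookups for each (index, i) pair; it does not record suffixes that have already failed, so the repeated recursion remains.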

public boolean wordBreak(String s, List<String> wordDict) {
    HashSet<String> dict = new HashSet<>(wordDict);
    
    int len = s.length();
    boolean[][] breakable = new boolean[len + 1][len + 1];
    
    return dfs(0, s, dict, breakable);
}

private boolean dfs(int index, String str, Set<String> dict, boolean[][] breakable) {
    if (index >= str.length()) {
        return true;
    }

    for (int i = index + 1; i <= str.length(); i++) {
        if (breakable[index][i] || dict.contains(str.substring(index, i))) {
            breakable[index][i] = true;
            if (dfs(i, str, dict, breakable)) {
                return true;
            }
        }
    }

    return false;
}

DP solution

Beat 38% of submissions.

public boolean wordBreak(String s, List<String> wordDict) {
    Set<String> dict = new HashSet<>(wordDict);
    
    int len = s.length();
    boolean[] breakable = new boolean[len + 1];
    breakable[0] = true;
    
    for (int subEnd = 1; subEnd <= len; subEnd++) {
        for (int index = 0; index < subEnd; index++) {
            if (breakable[index] && dict.contains(s.substring(index, subEnd))) {
                breakable[subEnd] = true;
                break;
            }
        }
    }
    
    return breakable[len];
}

Analysis: DFS with memoization performs better than DP on this problem. The reason is similar to searching for a particular leaf in a binary tree: DFS visits about half of the tree on average before finding the leaf, whereas BFS, going layer by layer, needs to visit about 3/4 of a complete tree on average.

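In a complete binary tree with n layers, layer n holds 2^(n-1) nodes, which is one more than the total node count of layers 1 to n-1 (2^(n-1) - 1). In other words, the last layer is about as large as all the layers above it combined.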

Back to the problem: if we did not skip duplicated calculations, the boolean[] breakable array would end up the same in both solutions. But since DFS stops as soon as it reaches the end of the string, breakable in the DFS solution has at most as many true values as in the DP solution, which means the DFS solution performs no more computation than the DP solution.
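
To make the comparison concrete, here is a small sketch that counts how many positions each approach marks as true. It reuses the logic of the two solutions above; the class name, the method names, and the choice to exclude the base case breakable[0] from the DP count are my own assumptions for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BreakableCountComparison {
    // Number of positions the memoized DFS marks before it stops.
    static int dfsMarked(String s, List<String> wordDict) {
        boolean[] breakable = new boolean[s.length() + 1];
        dfs(0, s, new HashSet<>(wordDict), breakable);
        return countTrue(breakable);
    }

    private static boolean dfs(int index, String s, Set<String> dict, boolean[] breakable) {
        if (index >= s.length()) {
            return true;
        }
        for (int i = index + 1; i <= s.length(); i++) {
            if (breakable[i]) {
                continue; // already reached and explored
            }
            if (dict.contains(s.substring(index, i))) {
                breakable[i] = true;
                if (dfs(i, s, dict, breakable)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Number of positions 1..len the bottom-up DP marks as breakable.
    static int dpMarked(String s, List<String> wordDict) {
        Set<String> dict = new HashSet<>(wordDict);
        boolean[] breakable = new boolean[s.length() + 1];
        breakable[0] = true;
        for (int end = 1; end <= s.length(); end++) {
            for (int start = 0; start < end; start++) {
                if (breakable[start] && dict.contains(s.substring(start, end))) {
                    breakable[end] = true;
                    break;
                }
            }
        }
        breakable[0] = false; // drop the base case so both counts cover positions 1..len
        return countTrue(breakable);
    }

    private static int countTrue(boolean[] flags) {
        int count = 0;
        for (boolean flag : flags) {
            if (flag) {
                count++;
            }
        }
        return count;
    }
}
```

For "abcd" with the dictionary {"ab", "cd", "abc"}, DFS finds "ab" + "cd" and stops after marking positions 2 and 4, while DP also marks position 3 (for "abc") on its way to the end.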

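A similar DP approach can also compute the maximum and minimum number of words the string can be broken into. Both DFS with memoization and DP can solve these variants, but DP is much more efficient: DFS has to traverse all paths to obtain the minimum at dp[len], whereas DP uses the already computed maximum/minimum word counts dp[0] ~ dp[i] for substring(0, 0) ~ substring(0, i) to derive the maximum/minimum word count dp[i+1] for substring(0, i+1).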
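
A minimal sketch of that DP variant follows. It is not from the original post: the class name, method names, and the convention of returning -1 for an unbreakable string are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordBreakCounts {
    // Minimum number of dictionary words s can be broken into, or -1 if s is not breakable.
    static int minWords(String s, List<String> wordDict) {
        Set<String> dict = new HashSet<>(wordDict);
        int len = s.length();
        int[] dp = new int[len + 1]; // dp[i]: minimum word count for s.substring(0, i)
        Arrays.fill(dp, Integer.MAX_VALUE); // MAX_VALUE marks "not breakable"
        dp[0] = 0;
        for (int end = 1; end <= len; end++) {
            for (int start = 0; start < end; start++) {
                if (dp[start] != Integer.MAX_VALUE && dict.contains(s.substring(start, end))) {
                    dp[end] = Math.min(dp[end], dp[start] + 1);
                }
            }
        }
        return dp[len] == Integer.MAX_VALUE ? -1 : dp[len];
    }

    // Maximum number of dictionary words s can be broken into, or -1 if s is not breakable.
    static int maxWords(String s, List<String> wordDict) {
        Set<String> dict = new HashSet<>(wordDict);
        int len = s.length();
        int[] dp = new int[len + 1]; // dp[i]: maximum word count for s.substring(0, i)
        Arrays.fill(dp, -1); // -1 marks "not breakable"
        dp[0] = 0;
        for (int end = 1; end <= len; end++) {
            for (int start = 0; start < end; start++) {
                if (dp[start] >= 0 && dict.contains(s.substring(start, end))) {
                    dp[end] = Math.max(dp[end], dp[start] + 1);
                }
            }
        }
        return dp[len];
    }
}
```

Unlike the boolean DP, the inner loop cannot break at the first match, because a later split point may give a smaller (or larger) word count.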