Disjoint Set Union (DSU) and Its Applications

Author: 专职跑龙套 | Published 2018-03-26 18:15

For my LeetCode solutions, see the code on GitHub: https://github.com/chenxiangcyr/leetcode-answers


Disjoint Set Union (DSU)

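A disjoint set union (DSU, also called union-find) is a compact and very practical data structure, used mainly to handle merging of disjoint sets.
Common uses include:

  • finding the connected components of a graph
  • Kruskal's algorithm for minimum spanning trees
  • finding lowest common ancestors (LCA), among others.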

When using a DSU, we start from a collection of disjoint dynamic sets $S = \{S_1, S_2, \dots, S_k\}$.

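Each set may contain one or more elements, and one element of the set is chosen as its representative. We do not care exactly which elements each set contains, and usually we do not care which element is chosen as the representative either.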

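What we do care about is that, given an element, we can quickly find (the representative of) the set containing it, and quickly merge the sets containing two elements. With the optimizations described below, these operations run in nearly constant amortized time: O(α(n)) per operation, where α is the inverse Ackermann function.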

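A DSU has three basic operations:

  • makeSet(s): create a new DSU containing s singleton sets.
  • unionSet(x, y): merge the set containing x with the set containing y. The two sets must be disjoint; if x and y are already in the same set, nothing is merged.
  • find(x): return the representative of the set containing x. It can also test whether two elements belong to the same set: just compare their representatives.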

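The implementation idea is also simple: each set is represented as a tree, each tree node is an element of the set, and the element at the root is the set's representative, as shown in the figure: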

[Figure: tree representation of a DSU]

The figure shows two trees, one for each of two sets; the first set is $\{a, b, c, d\}$.

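Tree nodes are the elements of the set, and each arrow is a pointer to the node's parent; the root points to itself, meaning it has no parent. Following parent pointers upward from any node eventually reaches the root of its tree, which is the representative of the set.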
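Before any optimization, the tree representation can be sketched in Java (the language of the LeetCode solutions below). The class name `NaiveDsu` and the sample sets are illustrative assumptions; `find` simply walks parent pointers to the root, so its cost is the height of the tree.

```java
// Minimal sketch of the tree representation: parent[i] == i marks a root.
// No path compression or union by rank yet, so find() costs O(tree height).
class NaiveDsu {
    private final int[] parent;

    // makeSet: every element starts as a singleton set (its own root)
    NaiveDsu(int size) {
        parent = new int[size];
        for (int i = 0; i < size; i++) parent[i] = i;
    }

    // find: follow parent pointers until reaching a node that points to itself
    int find(int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // unionSet: point one root at the other; no-op if already in the same set
    void unionSet(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx != ry) parent[rx] = ry;
    }

    public static void main(String[] args) {
        // Build {0,1,2,3} and {4,5,6} as two separate trees
        NaiveDsu dsu = new NaiveDsu(7);
        dsu.unionSet(1, 0);
        dsu.unionSet(2, 0);
        dsu.unionSet(3, 2);
        dsu.unionSet(5, 4);
        dsu.unionSet(6, 5);
        System.out.println(dsu.find(3) == dsu.find(1)); // true
        System.out.println(dsu.find(3) == dsu.find(6)); // false
    }
}
```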

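With this, makeSet and find are easy to write. Assuming a sufficiently long array stores the tree nodes, makeSet only has to build the forest shown in the figure, in which every element is a singleton set whose parent is itself.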


[Figure: the initial DSU forest built by makeSet]
const int MAXSIZE = 500;
int uset[MAXSIZE];
 
void makeSet(int size) {
    for(int i = 0;i < size;i++) uset[i] = i;
}

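Next comes find. If we always walk up the parent pointers, its cost is the height of the tree, which can reach O(n) for a degenerate chain and is nowhere near constant. A very simple but effective strategy fixes this: path compression.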

Path compression: during every find, make each node on the search path point directly to the root, as shown in the figure.

[Figure: path compression]
// recursive version
int find(int x) {
    if (x != uset[x]) uset[x] = find(uset[x]);
    return uset[x];
}

// iterative version (an alternative to the recursive one; keep only one of the two)
int find(int x) {
    int p = x, t;
    while (uset[p] != p) p = uset[p];
    while (x != p) { t = uset[x]; uset[x] = p; x = t; }
    return x;
}
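To see the effect of path compression, here is a small Java sketch of the two-pass iterative version run on a degenerate chain. The class name `PathCompressionDemo` and the chain are hypothetical; after a single `find`, every node on the path points directly at the root.

```java
import java.util.Arrays;

// Sketch: iterative find with path compression, mirroring the two-pass
// non-recursive version above (pass 1 locates the root, pass 2 re-points nodes).
class PathCompressionDemo {
    static int find(int[] parent, int x) {
        int root = x;
        while (parent[root] != root) root = parent[root];  // pass 1: locate the root
        while (x != root) {                                  // pass 2: compress the path
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    }

    public static void main(String[] args) {
        // A degenerate chain 4 -> 3 -> 2 -> 1 -> 0 (0 is the root)
        int[] parent = {0, 0, 1, 2, 3};
        System.out.println(find(parent, 4));          // 0
        System.out.println(Arrays.toString(parent));  // [0, 0, 0, 0, 0]
    }
}
```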

Finally, the merge operation unionSet is also very simple: point the root of one set's tree at the root of the other set's tree, as shown in the figure.


[Figure: merging two sets]

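A simple heuristic applies here as well: union by rank. Each root keeps a rank, an upper bound on the height of its tree, and when merging, the root with the smaller rank is pointed at the root with the larger rank. Put simply, the shorter tree always becomes a subtree of the taller one. Storing the ranks needs an extra array of the same length as uset, with every entry initialized to 0.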

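int rank[MAXSIZE];  // rank (upper bound on tree height) of each root; global arrays start at 0
 
void unionSet(int x, int y) {
    if ((x = find(x)) == (y = find(y))) return;  // already in the same set
    if (rank[x] > rank[y]) uset[y] = x;          // attach the lower-rank root y under x
    else {
        uset[x] = y;                             // attach x under y
        if (rank[x] == rank[y]) rank[y]++;       // equal ranks: the merged tree grows by one
    }
}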
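The same union-by-rank logic, combined with recursive path compression, might look like this in Java. This is a sketch rather than a reference implementation; `RankDsu` and the helper `rankOf` are hypothetical names added for illustration.

```java
// Sketch: DSU with union by rank plus path compression, ported from the
// C-style unionSet above. rank[r] is an upper bound on the height of the tree rooted at r.
class RankDsu {
    private final int[] parent;
    private final int[] rank;   // Java zero-initializes int arrays, so every rank starts at 0

    RankDsu(int size) {
        parent = new int[size];
        rank = new int[size];
        for (int i = 0; i < size; i++) parent[i] = i;
    }

    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);  // recursive path compression
        return parent[x];
    }

    void unionSet(int x, int y) {
        x = find(x);
        y = find(y);
        if (x == y) return;                    // already in the same set
        if (rank[x] > rank[y]) {
            parent[y] = x;                     // lower-rank tree y hangs under x
        } else {
            parent[x] = y;
            if (rank[x] == rank[y]) rank[y]++; // equal ranks: merged tree grows by one
        }
    }

    // Hypothetical helper: rank of the root of x's tree
    int rankOf(int x) { return rank[find(x)]; }

    public static void main(String[] args) {
        RankDsu dsu = new RankDsu(5);
        dsu.unionSet(0, 1);
        dsu.unionSet(2, 3);
        dsu.unionSet(0, 2);   // two rank-1 trees merge, so the new root gets rank 2
        dsu.unionSet(4, 0);   // a singleton hangs under the rank-2 root; rank unchanged
        System.out.println(dsu.find(4) == dsu.find(1));  // true
        System.out.println(dsu.rankOf(4));               // 2
    }
}
```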

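Besides union by rank, another common strategy is union by size: point the root of the tree with fewer nodes at the root of the tree with more nodes. It works much like union by rank, also speeds up the DSU, and does away with the separate rank array.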

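This gives uset a slightly different meaning: if uset[x] is non-negative, it is the index of x's parent; if it is negative, x is the representative (root) of its set, and the negation of the value is the number of elements in the set. The code is shown below.
To get the number of elements in the set containing x, use -uset[find(x)].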

const int MAXSIZE = 500;
int uset[MAXSIZE];
 
void makeSet(int size) {
    for(int i = 0;i < size;i++) uset[i] = -1;
}
int find(int x) {
    if (uset[x] < 0) return x;
    uset[x] = find(uset[x]);
    return uset[x];
}
void unionSet(int x, int y) {
    if ((x = find(x)) == (y = find(y))) return;
    if (uset[x] < uset[y]) {
        uset[x] += uset[y];
        uset[y] = x;
    } else {
        uset[y] += uset[x];
        uset[x] = y;
    }
}
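A Java sketch of the size-based variant, keeping the same negative-size encoding in a single array `uset`. `SizeDsu` and `setSize` are illustrative names; `setSize(x)` is just the `-uset[find(x)]` expression from the text.

```java
import java.util.Arrays;

// Sketch of union by size: uset[x] >= 0 is x's parent index,
// uset[x] < 0 marks x as a root, and -uset[x] is the size of its set.
class SizeDsu {
    private final int[] uset;

    SizeDsu(int size) {
        uset = new int[size];
        Arrays.fill(uset, -1);              // every element is the root of a size-1 set
    }

    int find(int x) {
        if (uset[x] < 0) return x;          // negative value: x is a root
        uset[x] = find(uset[x]);            // path compression
        return uset[x];
    }

    void unionSet(int x, int y) {
        x = find(x);
        y = find(y);
        if (x == y) return;
        if (uset[x] < uset[y]) {            // more negative means larger: x's set is bigger
            uset[x] += uset[y];
            uset[y] = x;
        } else {
            uset[y] += uset[x];
            uset[x] = y;
        }
    }

    int setSize(int x) { return -uset[find(x)]; }

    public static void main(String[] args) {
        SizeDsu dsu = new SizeDsu(6);
        dsu.unionSet(0, 1);
        dsu.unionSet(2, 1);
        dsu.unionSet(3, 4);
        System.out.println(dsu.setSize(0));  // 3
        System.out.println(dsu.setSize(3));  // 2
        System.out.println(dsu.setSize(5));  // 1
    }
}
```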

Time complexity

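Statement: For a DSU that uses both union by rank and path compression, if m operations, either Union or Find, are applied to n elements, the total run time is O(m log* n), where log* n is the iterated logarithm. A tighter analysis gives O(m α(n)), with α the inverse Ackermann function.
For a proof, see: https://en.wikipedia.org/wiki/Proof_of_O(log*n)_time_complexity_of_union%E2%80%93find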
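For reference, the iterated logarithm in this bound counts how many times $\log_2$ must be applied before the value drops to at most 1; it grows so slowly that it is effectively a small constant in practice.

```latex
\log^* n =
\begin{cases}
0 & \text{if } n \le 1,\\
1 + \log^*(\log_2 n) & \text{if } n > 1.
\end{cases}
\qquad
\text{e.g. } 65536 \to 16 \to 4 \to 2 \to 1 \;\Rightarrow\; \log^* 65536 = 4,\quad \log^* 2^{65536} = 5.
```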

LeetCode Problems

LeetCode problem: 684. Redundant Connection
In this problem, a tree is an undirected graph that is connected and has no cycles.
The given input is a graph that started as a tree with N nodes (with distinct values 1, 2, ..., N), with one additional edge added. The added edge has two different vertices chosen from 1 to N, and was not an edge that already existed.

The resulting graph is given as a 2D-array of edges. Each element of edges is a pair [u, v] with u < v, that represents an undirected edge connecting nodes u and v.

Return an edge that can be removed so that the resulting graph is a tree of N nodes. If there are multiple answers, return the answer that occurs last in the given 2D-array. The answer edge [u, v] should be in the same format, with u < v.

Example 1:
Input: [[1,2], [1,3], [2,3]]
Output: [2,3]
Explanation: The given undirected graph will be like this:
[Figure: Example 1 graph]

Example 2:
Input: [[1,2], [2,3], [3,4], [1,4], [1,5]]
Output: [1,4]
Explanation: The given undirected graph will be like this:
[Figure: Example 2 graph]

Note:

  • The size of the input 2D-array will be between 3 and 1000.
  • Every integer represented in the 2D-array will be between 1 and N, where N is the size of the input array.
class Solution {
    public int[] findRedundantConnection(int[][] edges) {
        int[] parent = new int[2001];
        
        // makeSet(s): create a new DSU (every node is its own root)
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        
        for (int[] edge: edges){
            int f = edge[0], t = edge[1];
            
            // compare the two representatives: if they are equal, the endpoints are already connected and this edge closes a cycle
            if (find(parent, f) == find(parent, t)) {
                return edge;
            }
            else {
                unionSet(parent, f, t);
            }
        }
        
        return new int[2];
    }
    
    // find(x): return the representative of the set containing x
    private int find(int[] parent, int f) {
        // path compression
        if (f != parent[f]) {
          parent[f] = find(parent, parent[f]);
        }
        
        return parent[f];
    }
    
    // unionSet(x, y): merge the sets containing x and y; does nothing if they are already in the same set
    private void unionSet(int[] parent, int x, int y) {
        if ((x = find(parent, x)) == (y = find(parent, y))) return;
        
        parent[x] = y;
    }
}

LeetCode problem: 685. Redundant Connection II
In this problem, a rooted tree is a directed graph such that, there is exactly one node (the root) for which all other nodes are descendants of this node, plus every node has exactly one parent, except for the root node which has no parents.

The given input is a directed graph that started as a rooted tree with N nodes (with distinct values 1, 2, ..., N), with one additional directed edge added. The added edge has two different vertices chosen from 1 to N, and was not an edge that already existed.

The resulting graph is given as a 2D-array of edges. Each element of edges is a pair [u, v] that represents a directed edge connecting nodes u and v, where u is a parent of child v.

Return an edge that can be removed so that the resulting graph is a rooted tree of N nodes. If there are multiple answers, return the answer that occurs last in the given 2D-array.

Example 1:
Input: [[1,2], [1,3], [2,3]]
Output: [2,3]
Explanation: The given directed graph will be like this:
[Figure: Example 1 graph]

Example 2:
Input: [[1,2], [2,3], [3,4], [4,1], [1,5]]
Output: [4,1]
Explanation: The given directed graph will be like this:
[Figure: Example 2 graph]

Note:

  • The size of the input 2D-array will be between 3 and 1000.
  • Every integer represented in the 2D-array will be between 1 and N, where N is the size of the input array.
class Solution {
    public int[] findRedundantDirectedConnection(int[][] edges) {
        int[] parent = new int[edges.length];
        
        // makeSet(s): create a new DSU; node k (1-based) is stored at index k - 1
        for (int i = 0; i < edges.length; i++) parent[i] = i;

        int[] candidate1 = null, candidate2 = null;
        
        for (int[] edge: edges){
            int rootx = find(parent, edge[0] - 1);
            int rooty = find(parent, edge[1] - 1);
            
            if (rootx != rooty) {
                // record the last edge which results in "multiple parents" issue
                if (rooty != edge[1]-1) {
                    candidate1 = edge;
                }
                else {
                    unionSet(parent, edge[1] - 1, edge[0] - 1);
                }
            }
            else {
                // record last edge which results in "cycle" issue, if any.
                candidate2 = edge;
            }
                
        }

        // if there is only one issue, return this one.
        if (candidate1 == null) return candidate2; 
        if (candidate2 == null) return candidate1;
        
        // If both issues present, then the answer should be the first edge which results in "multiple parents" issue
        // Could use map to skip this pass, but will use more memory.
        for (int[] e : edges) {
            if (e[1] == candidate1[1]) {
                return e;
            }
        }

        return new int[2];
    }

    // find(x): return the representative of the set containing x
    private int find(int[] parent, int f) {
        // path compression
        if (f != parent[f]) {
          parent[f] = find(parent, parent[f]);
        }
        
        return parent[f];
    }
    
    // unionSet(x, y): merge the sets containing x and y; does nothing if they are already in the same set
    private void unionSet(int[] parent, int x, int y) {
        if ((x = find(parent, x)) == (y = find(parent, y))) return;
        
        parent[x] = y;
    }
}
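Because the candidate logic above is subtle, a brute-force cross-check can help when testing it. This is a different technique (not DSU), and the class name `RedundantDirectedBruteForce` is hypothetical: it removes each edge in turn, from last to first, and returns the first removal that leaves a rooted tree. It runs in O(n^2), so it is only meant for small inputs.

```java
import java.util.*;

// Hypothetical brute-force checker (NOT the DSU method above): try removing each
// edge from last to first and keep the first removal that leaves a rooted tree.
class RedundantDirectedBruteForce {
    static int[] solve(int[][] edges) {
        int n = edges.length;                        // n nodes labeled 1..n, n edges
        for (int skip = n - 1; skip >= 0; skip--) {
            if (isRootedTree(n, edges, skip)) return edges[skip];
        }
        return new int[2];
    }

    // True if the edges (minus index skip) form a rooted tree on nodes 1..n.
    private static boolean isRootedTree(int n, int[][] edges, int skip) {
        int[] indegree = new int[n + 1];
        List<List<Integer>> children = new ArrayList<>();
        for (int i = 0; i <= n; i++) children.add(new ArrayList<>());
        for (int i = 0; i < edges.length; i++) {
            if (i == skip) continue;
            int u = edges[i][0], v = edges[i][1];
            if (++indegree[v] > 1) return false;     // a node may have at most one parent
            children.get(u).add(v);
        }
        int root = -1;
        for (int v = 1; v <= n; v++) {
            if (indegree[v] == 0) {
                if (root != -1) return false;        // more than one candidate root
                root = v;
            }
        }
        if (root == -1) return false;                // every node has a parent: a cycle
        // Every node must be reachable from the root (iterative DFS).
        boolean[] seen = new boolean[n + 1];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        seen[root] = true;
        int visited = 0;
        while (!stack.isEmpty()) {
            int u = stack.pop();
            visited++;
            for (int v : children.get(u)) {
                if (!seen[v]) { seen[v] = true; stack.push(v); }
            }
        }
        return visited == n;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve(new int[][]{{1, 2}, {1, 3}, {2, 3}})));                  // [2, 3]
        System.out.println(Arrays.toString(solve(new int[][]{{1, 2}, {2, 3}, {3, 4}, {4, 1}, {1, 5}})));  // [4, 1]
    }
}
```

Comparing its output with `findRedundantDirectedConnection` on small random inputs is one way to gain confidence in the two-candidate logic.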

LeetCode problem: 261. Graph Valid Tree
Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to check whether these edges make up a valid tree.

For example:
Given n = 5 and edges = [[0, 1], [0, 2], [0, 3], [1, 4]], return true.
Given n = 5 and edges = [[0, 1], [1, 2], [2, 3], [1, 3], [1, 4]], return false.

Note: you can assume that no duplicate edges will appear in edges. Since all edges are undirected, [0, 1] is the same as [1, 0] and thus will not appear together in edges.

class Solution {
    public boolean validTree(int n, int[][] edges) {
        // initialize n isolated islands
        int[] nums = new int[n];
        for(int i = 0; i < nums.length; i++) {
            nums[i] = i;
        }
        
        // perform union find
        for (int i = 0; i < edges.length; i++) {
            int x = find(nums, edges[i][0]);
            int y = find(nums, edges[i][1]);
            
            // if two vertices happen to be in the same set
            // then there's a cycle
            if (x == y) return false;
            
            // union
            nums[x] = y;
        }
        
        return edges.length == n - 1;
    }
    
    public int find(int nums[], int i) {
        if(i != nums[i]) {
            nums[i] = find(nums, nums[i]);
        }
        
        return nums[i];
    }
}

LeetCode problem: 305. Number of Islands II
A 2d grid map of m rows and n columns is initially filled with water. We may perform an addLand operation which turns the water at position (row, col) into a land. Given a list of positions to operate, count the number of islands after each addLand operation. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.

Example:
Given m = 3, n = 3, positions = [[0,0], [0,1], [1,2], [2,1]].
Initially, the 2d grid is filled with water. (Assume 0 represents water and 1 represents land).

0 0 0
0 0 0
0 0 0

Operation #1: addLand(0, 0) turns the water at grid[0][0] into a land.

1 0 0
0 0 0 Number of islands = 1
0 0 0

Operation #2: addLand(0, 1) turns the water at grid[0][1] into a land.

1 1 0
0 0 0 Number of islands = 1
0 0 0

Operation #3: addLand(1, 2) turns the water at grid[1][2] into a land.

1 1 0
0 0 1 Number of islands = 2
0 0 0

Operation #4: addLand(2, 1) turns the water at grid[2][1] into a land.

1 1 0
0 0 1 Number of islands = 3
0 1 0

We return the result as an array: [1, 1, 2, 3]

Challenge:

  • Can you do it in time complexity O(k log mn), where k is the length of the positions?
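import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Solution {
    int[][] dirs = {{0, 1}, {1, 0}, {-1, 0}, {0, -1}};

    public List<Integer> numIslands2(int m, int n, int[][] positions) {
        List<Integer> result = new ArrayList<>();
        if(m <= 0 || n <= 0) return result;

        int count = 0;
        
        /*
        Use a DSU (disjoint set union)
        */
        // one island = one tree; -1 marks water
        int[] roots = new int[m * n];
        Arrays.fill(roots, -1);

        for(int[] p : positions) {
            // index of this cell in the flattened m x n grid
            int curIdx = n * p[0] + p[1];
            
            // repeated position: the cell is already land, so the count is unchanged
            if(roots[curIdx] != -1) {
                result.add(count);
                continue;
            }
            
            // add new island
            roots[curIdx] = curIdx;
            
            count++;
            
            for(int[] dir : dirs) {
                // visit the four neighbors
                int x = p[0] + dir[0]; 
                int y = p[1] + dir[1];
                
                // bounds check (before computing the flattened index)
                if(x < 0 || x >= m || y < 0 || y >= n) continue;

                int neighbourIdx = n * x + y;

                // ignore neighbors that are not land
                if(roots[neighbourIdx] == -1) continue;
                
                // root of the neighbor's island
                int neighbourRoot = find(roots, neighbourIdx);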
                
                // if neighbor is in another island
                if(roots[curIdx] != neighbourRoot) {
                    // union two islands
                    roots[curIdx] = neighbourRoot;
                    
                    // current tree root = joined tree root
                    curIdx = neighbourRoot;
                    
                    count--;
                }
            }

            result.add(count);
        }
        
        return result;
    }

    public int find(int[] roots, int id) {
        if(id != roots[id]) {
            roots[id] = find(roots, roots[id]);
        }
        
        return roots[id];
    }
}

Reference:
并查集(Disjoint Set)
