Community can used in networks, also in machine learning graph data.
Why community detection?
Community detection can be used in mach ine learning to detect groups with similar properties and extract groups for various reasons.
Community Detection vs Clustering
Community: Community is a subset of nodes within the graph such that connections between the nodes are denser than connections within the rest of the network.
Community Struture: A network is said to have community structure if the nodes of the network can be easily groupd into sets of nodes such that each set of nodes is densely connected internally.
Community Detection: Community detection reveal the hidden relations among the nodes by detect the communities in a network.
Clustering is a machine learning technique in which similar data points are grouped into the same cluster based on their attributes.
1. Single attribute vs multiple attributes
Community detectionis specially tailored for network analysis which depends on a single attribute type called edges.
Clustering is a broader field in unsupevised machine learning which deals with multiple attribute types.
2. Clustering algorithms have a tendency to separate single peripheral nodes from the communities it should belong to.
3. Both community detection and clustering techniques can be applied to many network analysis problems.
Community Detection Techniques
Community detection methods can be broadly categorized into two types: Agglomerative Methods (Edges are added one by one from the stronger edge to the weaker edge) and Divisive Methods (Edges are removed one by one from a complete graph).
1. Divisive hierarchical Clustering: The divisive hierarchical clustering algorithm is a top-down clustering approach. Initially, all the points in the dataset belong to one cluster and split is performed recursively as one moves down the hierarchy.
Steps:
1. Initially, all the points in the dataset belong to one cluster.
2. Partition the cluster into two least similar clusters.
3. Proceed recursively to form new clusters until the desired number of clusters is obtained.
2. Agglomerative Clustering: It's a bottom-up approach. Initially, each data point is a cluster of its own, futher pairs of clusters are merged as one moves up the hierarchy.
Steps:
1. Initially, all the data points are a cluster of its own.
2. Take two nearest clusters and join them to form one single cluster.
3. Proceed recursively step 2 until you obtain the desired number of clusters.
Community Detection Algorithm
Newman Girvan Algorithm: The Girvan–Newman algorithm detects communities by progressively removing edges from the original network. The end result of the Girvan–Newman algorithm is a dendrogram(tree diagram). As the Girvan–Newman algorithm runs, the dendrogram is produced from the top down. The leaves of the dendrogram are individual nodes.
Steps:
1. Calculate the edge-betweenness for all edges
(1) Using the breadth first search to find the shortest path.
2. Remove the edge with the highest edge-betweenness
3. Re-calculate the edge-betweenness for all edges
4. Steps 2 and 3 are repeated until no edges remain.
Quality Measure
Modularity: Modularity is a measure of the structure of a graph, measuring the density of connections within a module or community.
1. Proportion of edges that connect nodes inside cluster i.
2. Fraction of edge endpoints that belong to nodes in cluster i.
Modularity Random Graph Model:
Random Graph Model:
Modularity Optimization
NP hard
网友评论