Basic Concepts
Community
A community is the subgraph induced by a subset of nodes that are densely connected internally. If the node sets of the communities are pairwise disjoint, the communities are called non-overlapping; if they intersect, they are called overlapping.
Community Detection
Community detection is the process of finding the community structure of a given network graph. It is a form of clustering.
Overview of the LPA Algorithm
The Label Propagation Algorithm (LPA) is the simplest and most intuitive community detection algorithm. Its core idea is "you are known by the company you keep": a node's character is judged from the circle it sits in, densely connected nodes are assigned to the same community, and birds of a feather end up flocking together.
Rough flow: in a graph of nodes and edges, all neighbors of a node V vote on V, each casting its own community label; the label that receives the most votes becomes V's community label (if several labels tie for the most votes, one of them is chosen at random). Applying this rule once to every node assigns each node to a community (gives it a label). After several iterations, or once an iteration leaves every community unchanged, densely connected nodes share a common community label and the algorithm terminates.
Basic Idea
Give every node a label denoting the community it belongs to; each node then adopts the label that occurs most often among its neighbors, and communities of a common label form as the labels propagate.
Step 1: Assign every node a unique label.
Step 2: Refresh the labels of all nodes round by round until convergence. In each round, a node's label is updated by the following rule:
Look at the labels of all the node's neighbors, count them, and adopt the label that occurs most often; if the most frequent label is not unique, pick one of them at random.
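The update rule for a single node can be sketched in Java as follows (a minimal sketch; the `vote` method and class name are illustrative, not part of the implementation later in this article):

```java
import java.util.*;

public class VoteSketch {
    // Returns the dominant label among a node's neighbors, breaking ties at random.
    static int vote(int[] neighborLabels, Random rnd) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (int label : neighborLabels)
            votes.merge(label, 1, Integer::sum); // each neighbor casts one vote for its own label
        int max = Collections.max(votes.values());
        List<Integer> best = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : votes.entrySet())
            if (e.getValue() == max) best.add(e.getKey()); // collect all labels tied for the maximum
        return best.get(rnd.nextInt(best.size())); // random choice among the tied labels
    }

    public static void main(String[] args) {
        // Label 2 occurs three times, more than any other label, so it wins outright.
        System.out.println(vote(new int[]{1, 2, 2, 3, 2}, new Random())); // prints 2
    }
}
```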
Advantages
The algorithm converges quickly, needs no prior parameters (neither the number nor the size of the communities has to be specified), and computes no community quality metric during execution.
Evaluation Criteria
The main evaluation metrics for community detection are the Jaccard index, the fsame index, NMI (Normalized Mutual Information), and Modularity.
Jaccard index
A measure of how correct a community partition is; when the ground-truth partition is known, it is computed from the number of node pairs that the two partitions classify consistently.
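The pair-counting form of the Jaccard index can be computed as below (a minimal sketch; the `jaccard` method and the toy partitions are made up for illustration):

```java
public class JaccardSketch {
    // Pair-counting Jaccard index between a ground-truth and a predicted partition.
    // truth[i] and pred[i] give the community of node i in each partition.
    static double jaccard(int[] truth, int[] pred) {
        int s11 = 0, s10 = 0, s01 = 0; // pairs together in both / in truth only / in prediction only
        for (int i = 0; i < truth.length; i++) {
            for (int j = i + 1; j < truth.length; j++) {
                boolean sameTruth = truth[i] == truth[j];
                boolean samePred = pred[i] == pred[j];
                if (sameTruth && samePred) s11++;
                else if (sameTruth) s10++;
                else if (samePred) s01++;
            }
        }
        return s11 / (double) (s11 + s10 + s01);
    }

    public static void main(String[] args) {
        int[] truth = {0, 0, 0, 1, 1};
        int[] pred  = {0, 0, 1, 1, 1};
        // 2 pairs agree in both partitions, 2 only in truth, 2 only in pred: 2/6
        System.out.println(jaccard(truth, pred)); // prints 0.3333333333333333
    }
}
```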
NMI
Also measures how much a partition deviates from the known ground truth; its standard deviation across runs can serve as a measure of an algorithm's stability.
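NMI can be computed from the contingency table of the two partitions. The sketch below uses the common 2·I(X;Y)/(H(X)+H(Y)) normalization (one of several normalizations in use; method names are illustrative):

```java
import java.util.*;

public class NmiSketch {
    // Normalized mutual information between two partitions: NMI = 2*I(X;Y) / (H(X) + H(Y)).
    static double nmi(int[] a, int[] b) {
        int n = a.length;
        Map<Integer, Integer> ca = new HashMap<>(), cb = new HashMap<>(); // marginal cluster sizes
        Map<Long, Integer> joint = new HashMap<>();                       // contingency table counts
        for (int i = 0; i < n; i++) {
            ca.merge(a[i], 1, Integer::sum);
            cb.merge(b[i], 1, Integer::sum);
            joint.merge(((long) a[i] << 32) | (b[i] & 0xffffffffL), 1, Integer::sum);
        }
        double mi = 0;
        for (Map.Entry<Long, Integer> e : joint.entrySet()) {
            int ia = (int) (e.getKey() >> 32), ib = e.getKey().intValue();
            double pxy = e.getValue() / (double) n;
            // I(X;Y) = sum of p(x,y) * log( n * n_xy / (n_x * n_y) )
            mi += pxy * Math.log((double) e.getValue() * n / ((double) ca.get(ia) * cb.get(ib)));
        }
        return 2 * mi / (entropy(ca.values(), n) + entropy(cb.values(), n));
    }

    static double entropy(Collection<Integer> counts, int n) {
        double h = 0;
        for (int c : counts) {
            double p = c / (double) n;
            h -= p * Math.log(p);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(nmi(new int[]{0, 0, 0, 1, 1}, new int[]{0, 0, 1, 1, 1})); // about 0.4325
    }
}
```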
Modularity
The difference between the fraction of edges that connect nodes inside the same community and the expected value of that fraction in a random network with the same degree sequence.
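Concretely, modularity can be computed from an edge list as Q = Σ_c ( e_c/m − (d_c/2m)² ), where e_c is the number of intra-community edges of community c, d_c the total degree of its nodes, and m the number of edges. A minimal sketch (the method name and toy graph are made up for illustration):

```java
import java.util.*;

public class ModularitySketch {
    // Q = sum over communities c of ( e_c / m - (d_c / 2m)^2 ).
    static double modularity(int[][] edges, int[] comm) {
        int m = edges.length;
        Map<Integer, Integer> intra = new HashMap<>(); // intra-community edge counts e_c
        Map<Integer, Integer> deg = new HashMap<>();   // total degree per community d_c
        for (int[] e : edges) {
            deg.merge(comm[e[0]], 1, Integer::sum);
            deg.merge(comm[e[1]], 1, Integer::sum);
            if (comm[e[0]] == comm[e[1]]) intra.merge(comm[e[0]], 1, Integer::sum);
        }
        double q = 0;
        for (Map.Entry<Integer, Integer> e : deg.entrySet()) {
            double ec = intra.getOrDefault(e.getKey(), 0);
            double dc = e.getValue();
            q += ec / m - (dc / (2.0 * m)) * (dc / (2.0 * m));
        }
        return q;
    }

    public static void main(String[] args) {
        // Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
        int[][] edges = {{0,1},{1,2},{0,2},{3,4},{4,5},{3,5},{2,3}};
        int[] comm = {0, 0, 0, 1, 1, 1};
        System.out.println(modularity(edges, comm)); // 2*(3/7 - (7/14)^2) = 5/14, about 0.357
    }
}
```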
Pseudocode:
Input: adjacency matrix AdjacentMatrix of an undirected graph; number of nodes VerticeNum
Output: array Community storing each node's community label
// Initialize every node's label to its own id
For i <- 0 to VerticeNum - 1 Do
    Community[i] <- i
    // Collect all neighbors of node i into Neighbor[i]
    FindMatrixNonZero(i, AdjacentMatrix, Neighbor[i])
While not converged and iteration threshold not exceeded Do
    RandomSort(SS) // generate a random visiting order SS
    For i <- 0 to VerticeNum - 1 Do
        // Find the most frequent label(s) among the neighbors of node i
        VectorFrequency(Neighbor[i], label)
        // A unique most frequent label is adopted directly
        If label.size() = 1 Then
            Community[i] <- label[0]
        // Several labels tied for most frequent: pick one of them at random
        Else
            Community[i] <- label[random]
Return Community
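The pseudocode maps onto a compact single-threaded version. For reproducibility this sketch breaks ties deterministically (keep the current label if it is tied, otherwise take the largest tied label) instead of picking at random as the algorithm prescribes; class and method names are illustrative:

```java
import java.util.*;

public class LpaSketch {
    // Single-threaded label propagation over an adjacency list; labels start as node ids.
    static int[] propagate(List<List<Integer>> adj, int maxIters) {
        int n = adj.size();
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) labels[i] = i; // every node starts in its own community
        for (int it = 0; it < maxIters; it++) {
            boolean changed = false;
            for (int v = 0; v < n; v++) { // fixed visiting order for reproducibility
                Map<Integer, Integer> votes = new HashMap<>();
                for (int u : adj.get(v)) votes.merge(labels[u], 1, Integer::sum);
                if (votes.isEmpty()) continue; // isolated node keeps its label
                int max = Collections.max(votes.values());
                int pick = -1;
                boolean keepCurrent = false;
                for (Map.Entry<Integer, Integer> e : votes.entrySet())
                    if (e.getValue() == max) {
                        if (e.getKey() == labels[v]) keepCurrent = true; // current label is tied: keep it
                        pick = Math.max(pick, e.getKey()); // deterministic tie-break: largest label
                    }
                int newLabel = keepCurrent ? labels[v] : pick;
                if (newLabel != labels[v]) { labels[v] = newLabel; changed = true; }
            }
            if (!changed) break; // a full pass with no change means the labeling is stable
        }
        return labels;
    }

    public static void main(String[] args) {
        // Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
        int[][] edges = {{0,1},{1,2},{0,2},{3,4},{4,5},{3,5},{2,3}};
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < 6; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
        System.out.println(Arrays.toString(propagate(adj, 20))); // prints [2, 2, 2, 5, 5, 5]
    }
}
```

Each triangle ends up with a single shared label, matching the intuition that densely connected nodes land in the same community.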
Java Implementation
1. Download the Enron email dataset
Download link: http://snap.stanford.edu/data/email-Enron.html
The data format is as follows (one edge "node_id node_id" per line):
0 1
1 0
1 2
1 3
1 4
1 5
1 6
1 7
2. findDominantLabel.java
import java.util.Collections;
import java.util.Vector;
import java.util.concurrent.Callable;
import java.util.Random;
public class findDominantLabel implements Callable<Boolean> {
private Vector<Integer> dominant_label;
private Vector<Integer> label_count;
private int node_id;
private Vector<NODE> node_list;
private Random randGen;
public findDominantLabel(Vector<NODE> node_list) {
dominant_label = new Vector<Integer>();
label_count = new Vector<Integer>(node_list.size()); //vector of size nodelist
for (int i = 0; i < node_list.size(); i++) { // to prevent Index out of bounds exception
label_count.add(Integer.valueOf(0));
}
randGen = new Random(); // random generator for breaking ties between equally frequent labels
this.node_list = node_list;
}
public void link_node_to_process(int node_id) {
this.node_id = node_id;
}
@Override
public Boolean call() {
if (node_id == -1) // if node Id not present (no node left)
return Boolean.FALSE;
boolean run = false;
Collections.fill(label_count, Integer.valueOf(0)); // reset all label counters before tallying this node's neighborhood
dominant_label.clear();
NODE curr_node = node_list.get(node_id);
int maximum_count = 0;
for (Integer neighborId : curr_node.get_neighbors()) { // tally every neighbor's label
int neighbor_label = node_list.get(neighborId).get_label_name();
// labels are node ids and every node is initialized with its own id, so every value is valid and counted
int neighbor_label_count = label_count.get(neighbor_label) + 1; // one more neighbor carries this label
label_count.set(neighbor_label, neighbor_label_count);
if (maximum_count < neighbor_label_count) { // this label is now strictly the most frequent
maximum_count = neighbor_label_count;
dominant_label.clear(); // previous candidates are no longer dominant
dominant_label.add(neighbor_label);
} else if (maximum_count == neighbor_label_count) { // this label ties the current maximum
dominant_label.add(neighbor_label);
}
}
if (dominant_label.size() > 0) { // at least one dominant label among the neighbors
int rand = randGen.nextInt(dominant_label.size()); // break ties at random, as the algorithm prescribes
Integer maximum_label = dominant_label.get(rand);
if (label_count.get(curr_node.get_label_name()) != maximum_count) { // current label is not dominant
run = true; // this node still changes community: not converged yet
}
curr_node.set_label_name(maximum_label);
}
return Boolean.valueOf(run);
}
}
3. LP.java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
class NODE {
int label_name;
int id;
Set<Integer> neighbors;
public NODE(int id, int label_name) {
this.label_name = label_name;
this.id = id;
this.neighbors = new HashSet<Integer>();
}
/*-************************* Node methods defined ***********************-*/
public int get_id() {
return id;
}
public void set_label_name(int label_name) {
this.label_name = label_name;
}
public int get_label_name() {
return label_name;
}
public void set_neighbors(Set<Integer> neighbors) {
this.neighbors = neighbors;
}
public void append_neighbors(int id) {
this.neighbors.add(Integer.valueOf(id));
}
public Set<Integer> get_neighbors() {
return neighbors;
}
}
/*-****************************** main class *****************************/
/* Read input file, write output file, Community Detection Algorithm */
/*-***********************************************************************/
public class LP {
Vector<NODE> node_list;
Vector<Integer> ordered_nodes;
public LP() { //default constructor
}
/*-************************** Read input file ***************************-*/
/*Format : "node_id node_id" and input file has edge list of the graph. */
/*-**********************************************************************-*/
public void readinput(int total_nodes, String input_file) throws IOException {
FileReader input = new FileReader(input_file);
BufferedReader br = new BufferedReader(input);
node_list = new Vector<NODE>(total_nodes);
ordered_nodes = new Vector<Integer>(total_nodes);
for (int i = 0; i < total_nodes; i++) {
node_list.add(new NODE(i, i)); //adding all the nodes to the node list
ordered_nodes.add(i); //Preserving order of nodes
}
System.out.println(total_nodes + " nodes added.");
String input_lines = br.readLine();
while (input_lines != null) {
String[] split = input_lines.split(" "); // splitting each line on the space
int v1 = Integer.valueOf(split[0]); // first field is node v1
int v2 = Integer.valueOf(split[1]); // second field is node v2
node_list.get(v1).append_neighbors(v2); //add v2 in the neighborlist of v1
node_list.get(v2).append_neighbors(v1); //add v1 in the neighborlist of v2
input_lines = br.readLine(); //next line
}
/*-**************************Neighbor list complete ***********************/
br.close();
}
/*-************************** Write output file ***************************-*/
/* Format : "node_id--community_label"                                      */
/*-************************************************************************-*/
public void final_communities(String file) throws IOException {
Map<Integer, Integer> assign_label = new HashMap<Integer, Integer>();
int label_count = 0;
for (int i = 0; i < node_list.size(); i++) {
int label = node_list.get(i).get_label_name();
Integer r = assign_label.get(Integer.valueOf(label));
if (r == null) {
label_count++;
assign_label.put(Integer.valueOf(label), Integer.valueOf(label_count));
}
}
System.out.println("communities = " + label_count);
/* label_count communities found */
FileOutputStream fso = new FileOutputStream(file);
OutputStreamWriter fileWriter = new OutputStreamWriter(fso, Charset.forName("UTF-8"));
NODE node;
for (int i = 0; i < node_list.size(); i++) {
node = node_list.get(i);
fileWriter.write(node.get_id() + "--" + assign_label.get(Integer.valueOf(node.get_label_name())).intValue() + "\n");
}
System.out.println("DONE");
fileWriter.close();
fso.close();
}
/*-************************** Community Detection ***********************-*/
/*Multi-Threading can also be adapted to multi-processor architecture */
/*-**********************************************************************-*/
public void communityDetection(int total_threads) throws IOException, ExecutionException, InterruptedException {
ExecutorService threadPool = Executors.newFixedThreadPool(total_threads);
Vector<findDominantLabel> dominantLabel_Calc = new Vector<findDominantLabel>(total_threads);
for (int i = 0; i < total_threads; i++) {
dominantLabel_Calc.add(new findDominantLabel(node_list));
}
int label_change = 100; // sentinel > 0 so the while loop runs at least once
while (label_change > 0) {
label_change = 0;
Collections.shuffle(ordered_nodes);
//PARALLELISM
for (int i = 0; i < node_list.size(); i += total_threads) { //for all nodes
for (int j = 0; j < total_threads; j++) { // dispatch one batch of total_threads nodes to run in parallel
if ((i + j) < node_list.size()) // guard the last, possibly partial, batch
/*pull each of the j threads from vector (dominantLabel_Calc) and link
each node from the ordered list to a thread (one to one mapping) */
dominantLabel_Calc.get(j).link_node_to_process(ordered_nodes.get(i + j).intValue());
else
dominantLabel_Calc.get(j).link_node_to_process(-1);
}
List<Future<Boolean>> result = threadPool.invokeAll(dominantLabel_Calc);
for (int k = 0; k < result.size(); k++) {
Boolean b = result.get(k).get();
if (b != null && b.booleanValue()) {
label_change++;
break; // one changed label already shows this pass is not stable
}
}
}
}
System.out.println("Communities found");
threadPool.shutdown();
}
/*-************************** Main***************************************-*/
/*Change number of threads here, (also other parameters) */
/*-**********************************************************************-*/
public static void main(String[] args) throws IOException, InterruptedException, ExecutionException {
LP algo = new LP();
int total_nodes = 37000; // upper bound on node ids (the email-Enron graph has 36,692 nodes)
int total_threads = 12; // number of worker threads
algo.readinput(total_nodes, "E:\\yishuheWorkspace\\graph-ysh\\src\\main\\java\\com\\graph\\service\\lpa\\Enron_email_dataset");
long startTime = System.currentTimeMillis();
algo.communityDetection(total_threads);
long endTime = System.currentTimeMillis();
long totalTime = endTime - startTime;
System.out.println("Time taken: " + totalTime / 1000.0 + " s");
algo.final_communities("Pooya.txt");
}
}