Relational graph convolutional network

Author: 魏鹏飞 | Published 2020-03-15 22:18

    In this tutorial, you learn how to implement a relational graph convolutional network (R-GCN). This type of network is one effort to generalize GCN to handle different relationships between entities in a knowledge base. To learn more about the research behind R-GCN, see Modeling Relational Data with Graph Convolutional Networks.


    The straightforward graph convolutional network (GCN) (see the DGL GCN tutorial) exploits structural information of a dataset (that is, the graph connectivity) in order to improve the extraction of node representations. Graph edges are left as untyped.

    A knowledge graph is made up of a collection of triples in the form subject, relation, object. Edges thus encode important information and have their own embeddings to be learned. Furthermore, there may exist multiple edges between any given pair of entities.
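
    To make the data layout concrete, here is a minimal sketch (the entity and relation names below are made up for illustration) of how a set of (subject, relation, object) triples can be flattened into the source, destination, and edge-type arrays that the dataset loader exposes later in this tutorial:

    # hypothetical toy knowledge graph as (subject, relation, object) triples
    triples = [
        ('alice', 'works_at', 'acme'),
        ('alice', 'lives_in', 'london'),
        ('acme', 'located_in', 'london'),
    ]

    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    relations = sorted({r for _, r, _ in triples})
    ent2id = {e: i for i, e in enumerate(entities)}
    rel2id = {r: i for i, r in enumerate(relations)}

    # each edge carries its own relation type; several edges may connect the same pair
    edge_src = [ent2id[s] for s, _, _ in triples]
    edge_dst = [ent2id[o] for _, _, o in triples]
    edge_type = [rel2id[r] for _, r, _ in triples]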

    A brief introduction to R-GCN

    In statistical relational learning (SRL), there are two fundamental tasks:

    • Entity classification - Where you assign types and categorical properties to entities.
    • Link prediction - Where you recover missing triples.

    In both cases, missing information is expected to be recovered from the neighborhood structure of the graph. For example, the R-GCN paper cited earlier provides the following example. Knowing that Mikhail Baryshnikov was educated at the Vaganova Academy implies both that Mikhail Baryshnikov should have the label person, and that the triple (Mikhail Baryshnikov, lived in, Russia) must belong to the knowledge graph.

    R-GCN solves these two problems using a common graph convolutional network. It’s extended with multi-edge encoding to compute the embeddings of the entities, but with different downstream processing.

    • Entity classification is done by attaching a softmax classifier to the final embedding of an entity (node). Training uses a standard cross-entropy loss.
    • Link prediction is done by reconstructing an edge with an autoencoder architecture, using a parameterized score function. Training uses negative sampling (a brief sketch of such a score function follows).
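
    For reference, the R-GCN paper's link-prediction decoder uses a DistMult-style factorization as the parameterized score function. The snippet below is only an illustrative sketch of that idea and is not part of the entity-classification code in this tutorial:

    import torch

    def distmult_score(emb_s, rel_diag, emb_o):
        # DistMult scores a triple (s, r, o) as e_s^T diag(R_r) e_o
        return torch.sum(emb_s * rel_diag * emb_o, dim=-1)

    # toy example: 5 entities, 2 relations, 4-dimensional embeddings
    ent_emb = torch.randn(5, 4)
    rel_emb = torch.randn(2, 4)
    s, r, o = torch.tensor([0, 3]), torch.tensor([1, 0]), torch.tensor([2, 4])
    scores = distmult_score(ent_emb[s], rel_emb[r], ent_emb[o])  # shape: (2,)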

    This tutorial focuses on the first task, entity classification, to show how to generate entity representation. Complete code for both tasks is found in the DGL Github repository.

    Key ideas of R-GCN

    Recall that in GCN, the hidden representation for each node i at the (l+1)^{th} layer is computed by:

    h_i^{(l+1)}=\sigma(\sum_{j\in N_i}\frac{1}{c_i}W^{(l)}h_j^{(l)})\tag{1}

    where c_i is a normalization constant.
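
    As a quick sanity check of equation (1), the following dense sketch (toy adjacency matrix and random weights, for illustration only) computes one GCN layer with c_i taken to be the in-degree of node i:

    import torch

    # toy graph with 3 nodes; A[i, j] = 1 means there is an edge j -> i
    A = torch.tensor([[0., 1., 1.],
                      [1., 0., 0.],
                      [1., 1., 0.]])
    H = torch.randn(3, 4)           # node features h_j^{(l)}
    W = torch.randn(4, 4)           # shared weight W^{(l)}

    c = A.sum(dim=1, keepdim=True)  # normalization constant c_i (here: in-degree)
    H_next = torch.relu((A / c) @ H @ W)  # equation (1) with sigma = ReLU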

    The key difference between R-GCN and GCN is that in R-GCN, edges can represent different relations. In GCN, weight W^{(l)} in equation (1) is shared by all edges in layer l. In contrast, in R-GCN, different edge types use different weights and only edges of the same relation type r are associated with the same projection weight W^{(l)}_r.

    So the hidden representation of entities in the (l+1)^{th} layer in R-GCN can be formulated as the following equation:

    h_i^{(l+1)}=\sigma(W_0^{(l)}h_i^{(l)}+\sum_{r\in R}\sum_{j\in N_i^r}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)})\tag{2}

    where N^r_i denotes the set of neighbor indices of node i under relation r\in R and c_{i,r} is a normalization constant. In entity classification, the R-GCN paper uses c_{i,r}=|N^r_i|.
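
    The per-relation sum in equation (2) can be written down directly with dense tensors. The following sketch (toy relation-typed adjacency tensors, for illustration only) loops over relations and adds the self-loop term W_0^{(l)} h_i^{(l)}:

    import torch

    num_nodes, num_rels, feat = 3, 2, 4
    # A[r, i, j] = 1 means there is an edge j -> i of relation type r
    A = torch.randint(0, 2, (num_rels, num_nodes, num_nodes)).float()
    H = torch.randn(num_nodes, feat)
    W0 = torch.randn(feat, feat)            # self-loop weight W_0^{(l)}
    W = torch.randn(num_rels, feat, feat)   # one weight W_r^{(l)} per relation

    out = H @ W0                            # self-loop term
    for r in range(num_rels):
        c = A[r].sum(dim=1, keepdim=True).clamp(min=1)  # c_{i,r} = |N_i^r|
        out = out + (A[r] / c) @ H @ W[r]
    H_next = torch.relu(out)                # equation (2) with sigma = ReLU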

    The problem of applying the above equation directly is the rapid growth of the number of parameters, especially with highly multi-relational data. In order to reduce model parameter size and prevent overfitting, the original paper proposes to use basis decomposition.

    W^{(l)}_r=\sum_{b=1}^Ba_{rb}^{(l)}V_b^{(l)}\tag{3}

    Therefore, the weight W^{(l)}_r is a linear combination of basis transformation V^{(l)}_b with coefficients a^{(l)}_{rb}. The number of bases B is much smaller than the number of relations in the knowledge base.
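
    Equation (3) is a contraction over the basis dimension and can be computed for all relations at once; a minimal sketch (arbitrary sizes, for illustration only) is:

    import torch

    num_rels, num_bases, in_feat, out_feat = 91, 10, 16, 16
    V = torch.randn(num_bases, in_feat, out_feat)  # basis matrices V_b^{(l)}
    a = torch.randn(num_rels, num_bases)           # coefficients a_{rb}^{(l)}

    # W_r = sum_b a_{rb} V_b for every relation r at once
    W = torch.einsum('rb,bio->rio', a, V)          # shape: (num_rels, in_feat, out_feat)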

    Note:
    Another weight regularization, block-decomposition, is used in the link-prediction implementation.

    Implement R-GCN in DGL

    An R-GCN model is composed of several R-GCN layers. The first R-GCN layer also serves as the input layer and takes in features (for example, description texts) that are associated with the node entities and projects them to a hidden space. In this tutorial, we use only the entity ID as an entity feature.

    R-GCN layers
    For each node, an R-GCN layer performs the following steps:

    • Compute outgoing messages using the node representations and the weight matrix associated with the edge type (message function)
    • Aggregate incoming messages and generate new node representations (reduce and apply function)

    The following code is the definition of an R-GCN hidden layer.

    Note:
    Each relation type is associated with a different weight. Therefore, the full weight matrix has three dimensions: relation, input_feature, output_feature.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from dgl import DGLGraph
    import dgl.function as fn
    from functools import partial
    
    class RGCNLayer(nn.Module):
        def __init__(self, in_feat, out_feat, num_rels, num_bases=-1, bias=None,
                     activation=None, is_input_layer=False):
            super(RGCNLayer, self).__init__()
            self.in_feat = in_feat
            self.out_feat = out_feat
            self.num_rels = num_rels
            self.num_bases = num_bases
            self.bias = bias
            self.activation = activation
            self.is_input_layer = is_input_layer
    
            # sanity check
            if self.num_bases <= 0 or self.num_bases > self.num_rels:
                self.num_bases = self.num_rels
    
            # weight bases in equation (3)
            self.weight = nn.Parameter(torch.Tensor(self.num_bases, self.in_feat,
                                                    self.out_feat))
            if self.num_bases < self.num_rels:
                # linear combination coefficients in equation (3)
                self.w_comp = nn.Parameter(torch.Tensor(self.num_rels, self.num_bases))
    
            # add bias
            if self.bias:
                self.bias = nn.Parameter(torch.Tensor(out_feat))
            else:
                self.bias = None

            # init trainable parameters
            nn.init.xavier_uniform_(self.weight,
                                    gain=nn.init.calculate_gain('relu'))
            if self.num_bases < self.num_rels:
                nn.init.xavier_uniform_(self.w_comp,
                                        gain=nn.init.calculate_gain('relu'))
            if self.bias is not None:
                # xavier_uniform_ requires a 2-D tensor, so initialize the 1-D bias to zero
                nn.init.zeros_(self.bias)

        def forward(self, g):
            if self.num_bases < self.num_rels:
                # generate all weights from bases (equation (3))
                weight = self.weight.view(self.in_feat, self.num_bases, self.out_feat)
                weight = torch.matmul(self.w_comp, weight).view(self.num_rels,
                                                            self.in_feat, self.out_feat)
            else:
                weight = self.weight
    
            if self.is_input_layer:
                def message_func(edges):
                    # for the input layer, the matrix multiplication can be replaced
                    # by an embedding lookup using the source node id
                    embed = weight.view(-1, self.out_feat)
                    index = edges.data['rel_type'] * self.in_feat + edges.src['id']
                    return {'msg': embed[index] * edges.data['norm']}
            else:
                def message_func(edges):
                    w = weight[edges.data['rel_type']]
                    msg = torch.bmm(edges.src['h'].unsqueeze(1), w).squeeze()
                    msg = msg * edges.data['norm']
                    return {'msg': msg}
    
            def apply_func(nodes):
                h = nodes.data['h']
                if self.bias is not None:
                    h = h + self.bias
                if self.activation:
                    h = self.activation(h)
                return {'h': h}
    
            g.update_all(message_func, fn.sum(msg='msg', out='h'), apply_func)
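
    As a quick smoke test, the layer can be run on a tiny toy graph. All values below are made up for illustration and simply mirror, in miniature, the AIFB setup used later in this tutorial:

    # toy check of RGCNLayer used as an input layer (illustrative values only)
    num_nodes, num_rels, out_feat = 4, 3, 8
    g = DGLGraph()
    g.add_nodes(num_nodes)
    g.add_edges([0, 1, 2, 3], [1, 2, 3, 0])
    g.ndata['id'] = torch.arange(num_nodes)       # entity ids serve as features
    g.edata['rel_type'] = torch.tensor([0, 1, 2, 0])
    g.edata['norm'] = torch.ones(4, 1)            # the 1/c_{i,r} factor, here all ones

    layer = RGCNLayer(num_nodes, out_feat, num_rels,
                      activation=F.relu, is_input_layer=True)
    layer(g)
    print(g.ndata['h'].shape)  # torch.Size([4, 8])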
    

    Full R-GCN model defined

    class Model(nn.Module):
        def __init__(self, num_nodes, h_dim, out_dim, num_rels,
                     num_bases=-1, num_hidden_layers=1):
            super(Model, self).__init__()
            self.num_nodes = num_nodes
            self.h_dim = h_dim
            self.out_dim = out_dim
            self.num_rels = num_rels
            self.num_bases = num_bases
            self.num_hidden_layers = num_hidden_layers
    
            # create rgcn layers
            self.build_model()
    
            # create initial features
            self.features = self.create_features()
    
        def build_model(self):
            self.layers = nn.ModuleList()
            # input to hidden
            i2h = self.build_input_layer()
            self.layers.append(i2h)
            # hidden to hidden
            for _ in range(self.num_hidden_layers):
                h2h = self.build_hidden_layer()
                self.layers.append(h2h)
            # hidden to output
            h2o = self.build_output_layer()
            self.layers.append(h2o)
    
        # initialize feature for each node
        def create_features(self):
            features = torch.arange(self.num_nodes)
            return features
    
        def build_input_layer(self):
            return RGCNLayer(self.num_nodes, self.h_dim, self.num_rels, self.num_bases,
                             activation=F.relu, is_input_layer=True)
    
        def build_hidden_layer(self):
            return RGCNLayer(self.h_dim, self.h_dim, self.num_rels, self.num_bases,
                             activation=F.relu)
    
        def build_output_layer(self):
            return RGCNLayer(self.h_dim, self.out_dim, self.num_rels, self.num_bases,
                             activation=partial(F.softmax, dim=1))
        def forward(self, g):
            if self.features is not None:
                g.ndata['id'] = self.features
            for layer in self.layers:
                layer(g)
            return g.ndata.pop('h')
    

    Handle Dataset

    This tutorial uses the Institute for Applied Informatics and Formal Description Methods (AIFB) dataset from the R-GCN paper.

    # load graph data
    from dgl.contrib.data import load_data
    import numpy as np
    data = load_data(dataset='aifb')
    num_nodes = data.num_nodes
    num_rels = data.num_rels
    num_classes = data.num_classes
    labels = data.labels
    train_idx = data.train_idx
    # split training and validation set
    val_idx = train_idx[:len(train_idx) // 5]
    train_idx = train_idx[len(train_idx) // 5:]
    
    # edge type and normalization factor
    edge_type = torch.from_numpy(data.edge_type)
    edge_norm = torch.from_numpy(data.edge_norm).unsqueeze(1)
    
    labels = torch.from_numpy(labels).view(-1)
    
    # Results:
    Loading dataset aifb
    Graph loaded, frequencies counted.
    Number of nodes:  8285
    Number of relations:  91
    Number of edges:  66371
    4 classes: {'http://www.aifb.uni-karlsruhe.de/Forschungsgruppen/viewForschungsgruppeOWL/id4instance', 'http://www.aifb.uni-karlsruhe.de/Forschungsgruppen/viewForschungsgruppeOWL/id3instance', 'http://www.aifb.uni-karlsruhe.de/Forschungsgruppen/viewForschungsgruppeOWL/id2instance', 'http://www.aifb.uni-karlsruhe.de/Forschungsgruppen/viewForschungsgruppeOWL/id1instance'}
    Loading training set
    Loading test set
    Number of classes:  4
    removing nodes that are more than 3 hops away
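
    The 'norm' value attached to each edge plays the role of the 1/c_{i,r} factor in equation (2). The loader already provides data.edge_norm, but for clarity here is a sketch of how the paper's choice c_{i,r} = |N_i^r| could be recomputed from the raw edge list (illustrative only; the loader's exact normalization convention may differ):

    # illustrative: 1 / c_{i,r}, where c_{i,r} counts incoming edges of relation r
    # at destination node i; the loader's own convention may differ
    from collections import Counter

    pairs = list(zip(data.edge_dst.tolist(), data.edge_type.tolist()))
    count = Counter(pairs)
    norm_by_relation = np.array([1.0 / count[p] for p in pairs], dtype=np.float32)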
    

    Create graph and model

    # configurations
    n_hidden = 16 # number of hidden units
    n_bases = -1 # use number of relations as number of bases
    n_hidden_layers = 0 # use 1 input layer, 1 output layer, no hidden layer
    n_epochs = 25 # epochs to train
    lr = 0.01 # learning rate
    l2norm = 0 # L2 norm coefficient
    
    # create graph
    g = DGLGraph()
    g.add_nodes(num_nodes)
    g.add_edges(data.edge_src, data.edge_dst)
    g.edata.update({'rel_type': edge_type, 'norm': edge_norm})
    
    # create model
    model = Model(len(g),
                  n_hidden,
                  num_classes,
                  num_rels,
                  num_bases=n_bases,
                  num_hidden_layers=n_hidden_layers)
    

    Training loop

    # optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2norm)
    
    print("start training...")
    model.train()
    for epoch in range(n_epochs):
        optimizer.zero_grad()
        logits = model.forward(g)
        loss = F.cross_entropy(logits[train_idx], labels[train_idx])
        loss.backward()
    
        optimizer.step()
    
        train_acc = torch.sum(logits[train_idx].argmax(dim=1) == labels[train_idx])
        train_acc = train_acc.item() / len(train_idx)
        val_loss = F.cross_entropy(logits[val_idx], labels[val_idx])
        val_acc = torch.sum(logits[val_idx].argmax(dim=1) == labels[val_idx])
        val_acc = val_acc.item() / len(val_idx)
        print("Epoch {:05d} | ".format(epoch) +
              "Train Accuracy: {:.4f} | Train Loss: {:.4f} | ".format(
                  train_acc, loss.item()) +
              "Validation Accuracy: {:.4f} | Validation loss: {:.4f}".format(
                  val_acc, val_loss.item()))
    
    
    # Results:
    start training...
    Epoch 00000 | Train Accuracy: 0.2054 | Train Loss: 1.3865 | Validation Accuracy: 0.1429 | Validation loss: 1.3868
    Epoch 00001 | Train Accuracy: 0.9286 | Train Loss: 1.3434 | Validation Accuracy: 1.0000 | Validation loss: 1.3567
    Epoch 00002 | Train Accuracy: 0.9286 | Train Loss: 1.2760 | Validation Accuracy: 1.0000 | Validation loss: 1.3106
    Epoch 00003 | Train Accuracy: 0.9286 | Train Loss: 1.1868 | Validation Accuracy: 1.0000 | Validation loss: 1.2472
    Epoch 00004 | Train Accuracy: 0.9375 | Train Loss: 1.0930 | Validation Accuracy: 1.0000 | Validation loss: 1.1717
    Epoch 00005 | Train Accuracy: 0.9464 | Train Loss: 1.0117 | Validation Accuracy: 1.0000 | Validation loss: 1.0924
    Epoch 00006 | Train Accuracy: 0.9464 | Train Loss: 0.9472 | Validation Accuracy: 1.0000 | Validation loss: 1.0163
    Epoch 00007 | Train Accuracy: 0.9464 | Train Loss: 0.8976 | Validation Accuracy: 1.0000 | Validation loss: 0.9499
    Epoch 00008 | Train Accuracy: 0.9554 | Train Loss: 0.8607 | Validation Accuracy: 1.0000 | Validation loss: 0.8968
    Epoch 00009 | Train Accuracy: 0.9554 | Train Loss: 0.8343 | Validation Accuracy: 1.0000 | Validation loss: 0.8576
    ......
    ......
    ......
    Epoch 00015 | Train Accuracy: 0.9732 | Train Loss: 0.7783 | Validation Accuracy: 0.9643 | Validation loss: 0.7865
    Epoch 00016 | Train Accuracy: 0.9821 | Train Loss: 0.7735 | Validation Accuracy: 0.9643 | Validation loss: 0.7854
    Epoch 00017 | Train Accuracy: 0.9821 | Train Loss: 0.7691 | Validation Accuracy: 0.9643 | Validation loss: 0.7851
    Epoch 00018 | Train Accuracy: 0.9821 | Train Loss: 0.7654 | Validation Accuracy: 0.9643 | Validation loss: 0.7855
    Epoch 00019 | Train Accuracy: 0.9821 | Train Loss: 0.7625 | Validation Accuracy: 0.9643 | Validation loss: 0.7864
    Epoch 00020 | Train Accuracy: 0.9821 | Train Loss: 0.7600 | Validation Accuracy: 0.9643 | Validation loss: 0.7876
    Epoch 00021 | Train Accuracy: 0.9821 | Train Loss: 0.7576 | Validation Accuracy: 0.9643 | Validation loss: 0.7893
    Epoch 00022 | Train Accuracy: 0.9821 | Train Loss: 0.7553 | Validation Accuracy: 0.9643 | Validation loss: 0.7913
    Epoch 00023 | Train Accuracy: 1.0000 | Train Loss: 0.7531 | Validation Accuracy: 0.9643 | Validation loss: 0.7937
    Epoch 00024 | Train Accuracy: 1.0000 | Train Loss: 0.7511 | Validation Accuracy: 0.9286 | Validation loss: 0.7965
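
    After training, the model can be evaluated on the held-out test split in the same way. A short sketch, assuming the loader exposes data.test_idx as in the full DGL R-GCN example:

    # evaluate on the test split (assumes data.test_idx is provided by the loader)
    model.eval()
    with torch.no_grad():
        logits = model(g)
        test_idx = torch.from_numpy(data.test_idx).long()
        test_acc = (logits[test_idx].argmax(dim=1) == labels[test_idx]).float().mean()
        print("Test Accuracy: {:.4f}".format(test_acc.item()))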
    

    Original link:
    https://docs.dgl.ai/tutorials/models/1_gnn/4_rgcn.html
