Graph-based neural network models have been developing for several years, and attention to them has grown quickly of late. This is thanks to their strong representational power, and because graph structure is closer to how the real world is actually organized; this is also why knowledge graphs have become such a popular form of knowledge representation.
This paper's introduction gives a rather nice summary of GCNs and of graph networks applied to NLP.
1 Basics of Graph Networks
Graphs can represent the relations between nodes more flexibly:
only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification.
The paper builds a single words-documents graph over the corpus (the nodes are the words and the documents):
we build a single text graph for a corpus based on word co-occurrence and document word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus.
Our Text GCN is initialized with one-hot representation for word and document, it then jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents
On the representational power of graph networks:
Recently, a new research direction called graph neural networks or graph embeddings has attracted wide attention (Battaglia et al. 2018; Cai, Zheng, and Chang 2018). Graph neural networks have been effective at tasks thought to have rich relational structure and can preserve global structure information of a graph in graph embeddings.
We construct a single large graph from an entire corpus, which contains words and documents as nodes.
We model the graph with a Graph Convolutional Network (GCN) (Kipf and Welling 2017), a simple and effective graph neural network that captures high order neighborhoods information. The edge between two word nodes is built by word co-occurrence information and the edge between a word node and document node is built using word frequency and word's document frequency. We then turn text classification problem into a node classification problem. The method can achieve strong classification performances with a small proportion of labeled documents and learn interpretable word and document node embeddings. Our source code is available at https://github.com/yao8839836/text_gcn.
Before describing the paper's graph network in detail, let's first look at how a GCN represents and learns.
A recommended reference: https://mp.weixin.qq.com/s/sg9O761F0KHAmCPOfMW_kQ
First, a graph is very different from the 2-D/3-D matrices we use for images: it is not a regular grid structure, so conventional CNNs and RNNs cannot be applied directly. However, an adjacency matrix can be used to encode the relations between nodes.
Suppose we have a graph G = (V, E), and let A be its adjacency matrix.
We want to learn an embedding for every node in the graph. The layer-wise propagation rule can be written as

$$L^{(j+1)} = \sigma\left(A\, L^{(j)}\, W^{(j)}\right)$$

where $\sigma$ is the activation function, $A$ is the adjacency matrix, and $L^{(j)} \in \mathbb{R}^{n \times m}$ holds the feature encoding of each node at layer $j$, with $n = |V|$ the number of nodes and $m$ the embedding dimension of each node at that layer. $W^{(j)}$ is the weight matrix of layer $j$; its dimensions connect the features of layer $j$ to those of layer $j+1$, and it is shared by all nodes within a layer.

For the first layer, with input features $X$, this becomes

$$L^{(1)} = \sigma\left(A\, X\, W^{(0)}\right)$$
However, this formulation has two shortcomings:

1. In the product $AX$, a node's own features are not included: $A$ has zeros on its diagonal unless the graph has self-loops, so a node cannot represent itself.
2. Nodes with unbalanced in/out degrees contribute on very different scales, which distorts the representation, so we want to normalize.

Correspondingly, the adjacency matrix is modified in two steps:

1. Add self-loops: $\tilde{A} = A + I$
2. Symmetric normalization: $\hat{A} = \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}$, where $\tilde{D}$ is the degree matrix of $\tilde{A}$

So overall $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$.
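The two fixes above can be sketched in a few lines of numpy (a minimal illustration, not the paper's code):

```python
import numpy as np

def normalize_adjacency(A):
    """Compute A_hat = D~^{-1/2} (A + I) D~^{-1/2}, where D~ is the
    degree matrix of A + I (i.e., self-loops included)."""
    A_tilde = A + np.eye(A.shape[0])         # 1) add self-loops
    d = A_tilde.sum(axis=1)                  # degrees of A + I
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # 2) D~^{-1/2}
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

# Small example: a path graph 0-1-2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_hat = normalize_adjacency(A)
```

The result stays symmetric, and each entry $(i, j)$ is scaled by $1/\sqrt{d_i d_j}$, so high-degree nodes no longer dominate the aggregation.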
The classic GCN model:

[Kipf and Welling 2017] Kipf, T. N., and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
2 Paper Overview
This paper is an application of GCN to text classification. Its most interesting ideas are:
- It builds a single graph whose nodes are both words and documents
- Edge weights are given by TF-IDF and PMI
First, let's look at the paper's overall architecture.
The paper's description of the network structure closely follows GCN, so I won't translate it line by line; its equations lay out the GCN architecture and the representation learned at each layer.
The paper defines two kinds of nodes and, correspondingly, two kinds of edges:
- word-document: if a word occurs in a document, an edge is created between them, and its weight (the corresponding entry of the adjacency matrix A) is the word's TF-IDF in that document.
- word-word: a sliding window is run over the text; words co-occurring within the same window are connected by an edge whose weight is their PMI. An edge is only created when the PMI is positive; otherwise no edge is built.
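Here is a minimal Python sketch of the positive-PMI edge construction (the function name and details are mine, not the authors' released code):

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edges(docs, window=3):
    """Build word-word edges weighted by PMI over sliding windows,
    keeping only pairs with strictly positive PMI."""
    windows = []
    for doc in docs:
        toks = doc.split()
        if len(toks) <= window:
            windows.append(toks)
        else:
            windows.extend(toks[i:i + window]
                           for i in range(len(toks) - window + 1))
    n = len(windows)
    word_cnt, pair_cnt = Counter(), Counter()
    for w in windows:
        uniq = sorted(set(w))
        word_cnt.update(uniq)
        pair_cnt.update(combinations(uniq, 2))  # unordered co-occurring pairs
    edges = {}
    for (i, j), c in pair_cnt.items():
        # PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ), probabilities over windows
        pmi = math.log((c / n) / ((word_cnt[i] / n) * (word_cnt[j] / n)))
        if pmi > 0:                             # keep positive-PMI edges only
            edges[(i, j)] = pmi
    return edges
```

Counting each word at most once per window keeps the probabilities consistent with the pair counts.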
The paper uses a two-layer graph network topped with a softmax classifier:

$$Z = \mathrm{softmax}\big(\hat{A}\,\mathrm{ReLU}(\hat{A} X W_0)\, W_1\big)$$

Cross-entropy is again used as the training objective:

$$\mathcal{L} = -\sum_{d \in \mathcal{Y}_D}\sum_{f=1}^{F} Y_{df} \ln Z_{df}$$

where $\mathcal{Y}_D$ is the set of documents that have labels and $F$ is the number of classes.
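As an illustration, a minimal numpy sketch of this two-layer forward pass and the label-masked cross-entropy (variable names are my own, not the authors' code; `A_hat` is the normalized adjacency matrix):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def text_gcn_forward(A_hat, X, W0, W1):
    """Z = softmax(A_hat . ReLU(A_hat . X . W0) . W1)."""
    H = np.maximum(A_hat @ X @ W0, 0.0)  # first GCN layer + ReLU
    return softmax(A_hat @ H @ W1)       # second GCN layer + softmax

def masked_cross_entropy(Z, Y, labeled_idx):
    """Cross-entropy summed over labeled document nodes only (Y one-hot)."""
    return -np.sum(Y[labeled_idx] * np.log(Z[labeled_idx] + 1e-12))
```

Since the paper initializes node features as one-hot vectors, $X$ is the identity matrix and the first layer effectively selects rows of $W_0$ as the initial node embeddings.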
Following the trend in neural networks over recent years, deeper networks generally have stronger representational power, though this depends heavily on the update mechanism.
This paper uses the basic GCN representation, and more layers are not necessarily better:
In our preliminary experiment. We found that a two-layer GCN performs better than a one-layer GCN, while more layers did not improve the performances. This is similar to results in (Kipf and Welling 2017) and (Li, Han, and Wu 2018).
But there is a fatal limitation: the test documents must already be included in the graph at training time so that their embeddings can be learned, i.e., the network cannot handle new documents dynamically.
However, a major limitation of this study is that the GCN model is inherently transductive, in which test document nodes (without labels) are included in GCN training. Thus Text GCN could not quickly generate embeddings and make prediction for unseen test documents. Possible solutions to the problem are introducing inductive…
For that, see another paper, "Inductive Representation Learning on Large Graphs", which we will cover next time.