GraphSage
GraphSage is an inductive embedding training method proposed in the paper Inductive Representation Learning on Large Graphs by William L. Hamilton et al.
The GCN covered in the previous post is transductive learning: it directly trains an embedding for every node of one fixed graph. But in many graph scenarios, nodes are added in real time, so this paper proposes inductive learning. Instead of training each node's embedding on a static graph, it trains a mapping (the aggregator) from a neighborhood to an embedding, so that the embedding of a newly added node can be obtained from its neighbor relations alone.
1. Embedding generation (i.e., forward propagation) algorithm

2. Learning the parameters of GraphSAGE
For unsupervised learning, the training loss is pair-wise: the embeddings of two connected nodes should be close, while the embeddings of two unconnected nodes should be far apart. (The inner product is used as the similarity measure, which equals cosine similarity for normalized embeddings.)
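The pair-wise unsupervised objective can be sketched as follows (a minimal numpy sketch; the toy embeddings and the negative-sample weight Q are assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unsup_loss(z_u, z_v, z_negs, Q=1.0):
    """GraphSAGE-style unsupervised loss: pull a co-occurring pair (u, v)
    together and push u away from sampled negative nodes."""
    pos = -np.log(sigmoid(z_u @ z_v))
    neg = -Q * np.mean([np.log(sigmoid(-z_u @ z_n)) for z_n in z_negs])
    return pos + neg

rng = np.random.default_rng(0)
z_u = rng.normal(size=8)
close = z_u + 0.01 * rng.normal(size=8)   # a neighbor: nearly identical embedding
far = -z_u                                 # a negative: opposite direction
# a close positive pair yields a smaller loss than a far one
print(unsup_loss(z_u, close, [far]) < unsup_loss(z_u, far, [close]))  # True
```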

3. Aggregator Architectures
- Mean aggregator
- LSTM aggregator: since the aggregator should be invariant to the order of its input list, while an LSTM is inherently order-sensitive, the neighbor list must be shuffled before every forward pass.
- Pooling aggregator
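The mean aggregator can be sketched like this (numpy; the weight matrix W is a hypothetical trainable parameter, and the self-vector is concatenated with the neighborhood mean before the linear transform and nonlinearity):

```python
import numpy as np

def mean_aggregate(h_v, h_neighbors, W):
    """One mean-aggregator step:
    h_v' = ReLU(W · concat(h_v, mean_{u in N(v)} h_u))."""
    h_mean = np.mean(h_neighbors, axis=0)        # aggregate the sampled neighborhood
    h_cat = np.concatenate([h_v, h_mean])        # concat self and neighborhood
    return np.maximum(0.0, W @ h_cat)            # linear transform + ReLU

rng = np.random.default_rng(1)
d = 4
W = rng.normal(size=(d, 2 * d))                  # hypothetical trainable weights
h_v = rng.normal(size=d)
h_neighbors = rng.normal(size=(3, d))            # 3 sampled neighbors
print(mean_aggregate(h_v, h_neighbors, W).shape)  # (4,)
```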
4. Questions
The training set contains feature information for each node, and the node embeddings are trained from that feature information. What if there is no feature information? Do we use the node index (ID) instead?
Papers applying GraphSage to recommender systems
Graph Convolutional Neural Networks for Web-Scale Recommender Systems(KDD2018)(PinSage)
The overall fusion framework of this paper is still GraphSage: sample a number of neighbors and aggregate them. The difference is that neighbors are not sampled uniformly at random but by importance pooling. The main innovations of the paper:
- Importance pooling: when sampling neighbors for aggregation, neighbors are ranked by weight and the most important ones are aggregated. A neighbor's weight is the L1-normalized visit count of random walks, and the same weights are reused as a weighted mean during aggregation. (See the convolve algorithm figure in the paper.)
- Producer-consumer minibatch construction to allocate GPU and CPU resources sensibly: the CPU first builds the subgraphs, which are then handed to the GPU for the embedding computation. The benefit is that no CPU-GPU communication is needed during training, which improves training efficiency.
- The algorithm is still trained with a pair-wise supervised objective, a max-margin ranking loss:
J(z_q, z_i) = E_{n_k ∼ P_n(q)} max{0, z_q · z_{n_k} − z_q · z_i + Δ}
where n_k is a negative item and i is a positive item. The training objective is to pull the query and the positive item closer while pushing the query and the negative item apart. For more effective training, the paper also proposes a form of curriculum learning: part of the negative samples are hard negative samples, items that are not relevant to the query yet still somewhat similar to it (the paper samples them from the items ranked 2000-5000 by the query's ranking score). Using these samples from the start makes training hard, so training begins with random negative samples; once the parameters have roughly converged, the hard negative samples are used to obtain a better model. Going from easy to hard is what makes it curriculum learning.
- Node Embeddings via MapReduce: this technique speeds up computing the embeddings of all nodes at inference time by avoiding repeated computation, but I did not work out exactly how the repeated computation is eliminated...
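Two of the steps above can be sketched in Python: estimating importance-pooling weights as L1-normalized random-walk visit counts, and the max-margin pair-wise loss (the toy graph, walk parameters, and margin value are illustrative assumptions, not values from the paper):

```python
import numpy as np

def importance_weights(adj, start, n_walks=2000, walk_len=3, seed=0):
    """Neighbor importance = L1-normalized visit counts of short
    random walks starting from `start` (importance pooling)."""
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_walks):
        node = start
        for _ in range(walk_len):
            node = int(rng.choice(adj[node]))
            counts[node] = counts.get(node, 0) + 1
    total = sum(counts.values())            # L1 normalization
    return {n: c / total for n, c in counts.items()}

def max_margin_loss(z_q, z_pos, z_neg, margin=0.5):
    """Hinge loss: the positive item must outscore the negative
    item by at least `margin`."""
    return max(0.0, z_q @ z_neg - z_q @ z_pos + margin)

# toy graph: node 0 links to node 1 twice as often as to node 2
adj = {0: [1, 1, 2], 1: [0, 2], 2: [0, 1]}
w = importance_weights(adj, start=0)
print(w[1] > w[2])  # True: node 1 is visited more often

z_q = np.array([1.0, 0.0])
z_pos = np.array([0.9, 0.1])   # aligned with the query
z_neg = np.array([-1.0, 0.0])  # opposite to the query
print(max_margin_loss(z_q, z_pos, z_neg))  # 0.0: margin already satisfied
```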
Neural Graph Collaborative Filtering(SIGIR2019)(NGCF)
The overall framework of this paper is quite classic:
- an embedding layer that offers an initialization of user embeddings and item embeddings. First define an embedding matrix E ∈ R^{(N+M)×d}: E = [e_{u_1}, …, e_{u_N}, e_{i_1}, …, e_{i_M}], where N is the number of users and M is the number of items.
- multiple embedding propagation layers that refine the embeddings by injecting high-order connectivity relations. (See the NGCF model architecture figure in the paper.)
- Message construction: for a connected user-item pair (u, i), define a message from i to u, m_{u←i}. Its concrete form in the paper is:
m_{u←i} = (1 / √(|N_u||N_i|)) · (W_1 e_i + W_2 (e_i ⊙ e_u))
- Message aggregation:
e_u^{(1)} = LeakyReLU( m_{u←u} + Σ_{i∈N_u} m_{u←i} )
where the self-message m_{u←u} = W_1 e_u uses the same W_1 as in the message construction above.
- Multi-order propagation
- Propagation rule in matrix form: writing the propagation in matrix form greatly speeds up the computation. (This is essentially an extension of GCN.)
- the prediction layer that aggregates the refined embeddings from different propagation layers and outputs the affinity score of a user-item pair. The outputs of every propagation layer are concatenated into the final embedding:
e_u* = e_u^{(0)} ‖ … ‖ e_u^{(L)},  e_i* = e_i^{(0)} ‖ … ‖ e_i^{(L)}
and the affinity score is the dot product ŷ(u, i) = e_u*ᵀ e_i*.
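One NGCF propagation step for a single user can be sketched as follows (numpy; W1, W2, the toy graph, and the LeakyReLU slope of 0.2 are illustrative assumptions):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def ngcf_user_update(e_u, neighbor_items, W1, W2, n_u):
    """e_u^(1) = LeakyReLU(m_{u<-u} + sum_i m_{u<-i}) with
    m_{u<-i} = (W1 e_i + W2 (e_i * e_u)) / sqrt(|N_u||N_i|)."""
    msg = W1 @ e_u                                   # self message m_{u<-u}
    for e_i, n_i in neighbor_items:                  # (embedding, degree) pairs
        norm = 1.0 / np.sqrt(n_u * n_i)              # graph Laplacian norm
        msg = msg + norm * (W1 @ e_i + W2 @ (e_i * e_u))
    return leaky_relu(msg)

rng = np.random.default_rng(2)
d = 4
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
e_u = rng.normal(size=d)
items = [(rng.normal(size=d), 2), (rng.normal(size=d), 3)]  # two items, degrees 2 and 3
print(ngcf_user_update(e_u, items, W1, W2, n_u=len(items)).shape)  # (4,)
```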
LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation(2020)(LightGCN)
This paper is an improvement over NGCF above. It finds that the feature transformation (W_1, W_2) and the nonlinear activation in NGCF not only make training harder but also lower accuracy. The main reason: GCN is originally proposed for node classification on attributed graphs, where each node has rich attributes as input features; whereas in the user-item interaction graph for CF, each node (user or item) is only described by a one-hot ID, which has no concrete semantics besides being an identifier. In such a case, given the ID embedding as the input, performing multiple layers of nonlinear feature transformation, which is the key to the success of modern neural networks, will bring no benefits, but negatively increases the difficulty of model training.
The forward propagation layer of LightGCN after this simplification:
e_u^{(k+1)} = Σ_{i∈N_u} (1/√(|N_u||N_i|)) · e_i^{(k)},  e_i^{(k+1)} = Σ_{u∈N_i} (1/√(|N_i||N_u|)) · e_u^{(k)}
Layer embedding aggregator: unlike NGCF, which concatenates the layers, LightGCN uses a weighted sum:
e_u = Σ_{k=0}^{K} α_k e_u^{(k)},  with α_k set uniformly to 1/(K+1)
Matrix form (with Ã = D^{−1/2} A D^{−1/2} the symmetrically normalized adjacency):
E^{(k+1)} = Ã E^{(k)},  so the final embedding is  E = Σ_{k=0}^{K} α_k Ã^k E^{(0)}
Note: no self-connection is added to the forward propagation because the weighted sum over layer embeddings already subsumes it. A proof sketch (ignoring the rescaling of the degree matrix, as in the paper): propagating on Ã + I gives
(Ã + I)^K E^{(0)} = Σ_{k=0}^{K} (K choose k) Ã^k E^{(0)},
i.e., a weighted sum of the embeddings produced at each LGC layer.
So inserting self-connection into A and propagating embeddings on it is essentially equivalent to a weighted sum of the embeddings propagated at each LGC layer.
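The matrix-form propagation above, with uniform α_k = 1/(K+1), can be sketched as (numpy; the toy interaction matrix and initial embeddings are assumptions):

```python
import numpy as np

def lightgcn_embeddings(R, E0, K=2):
    """Propagate E^(k+1) = D^{-1/2} A D^{-1/2} E^(k) on the user-item
    bipartite graph and return the mean of all layer embeddings."""
    n_u, n_i = R.shape
    A = np.block([[np.zeros((n_u, n_u)), R],
                  [R.T, np.zeros((n_i, n_i))]])       # bipartite adjacency
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = d[nz] ** -0.5
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    layers, E = [E0], E0
    for _ in range(K):
        E = A_hat @ E                                  # no W, no nonlinearity
        layers.append(E)
    return np.mean(layers, axis=0)                     # alpha_k = 1/(K+1)

R = np.array([[1.0, 1.0, 0.0],                         # 2 users x 3 items
              [0.0, 1.0, 1.0]])
E0 = np.eye(5)[:, :4]                                  # toy initial embeddings
print(lightgcn_embeddings(R, E0).shape)  # (5, 4)
```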
- Second-Order Embedding Smoothness.
Because the algorithm is this simple, the propagation of the embeddings is easy to read off. The coefficient with which a second-order neighboring user v influences user u is:
c_{v→u} = (1/√(|N_u||N_v|)) · Σ_{i∈N_u∩N_v} 1/|N_i|
This coefficient is rather interpretable: the influence of a second-order neighbor v on u is determined by 1) the number of co-interacted items, the more the larger; 2) the popularity of the co-interacted items, the less popular (i.e., more indicative of user personalized preference) the larger; and 3) the activity of v, the less active the larger. Such interpretability well caters for the assumption of CF in measuring user similarity and evidences the reasonability of LightGCN. Due to the symmetric formulation of LightGCN, we can get similar analysis on the item side.
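The second-order coefficient can be computed directly on a toy example (plain Python; the degrees and item names are assumptions chosen to show point 2 above):

```python
def second_order_coef(N_u, N_v, item_degree):
    """Second-order coefficient of user v on user u:
    c = 1/sqrt(|N_u||N_v|) * sum over co-interacted items i of 1/|N_i|."""
    shared = N_u & N_v                          # co-interacted items
    s = sum(1.0 / item_degree[i] for i in shared)
    return s / (len(N_u) * len(N_v)) ** 0.5

# u and v share one popular item (degree 10) vs. one niche item (degree 2)
deg = {"pop": 10, "niche": 2, "other": 3}
c_pop = second_order_coef({"pop", "other"}, {"pop"}, deg)
c_niche = second_order_coef({"niche", "other"}, {"niche"}, deg)
print(c_niche > c_pop)  # True: a less popular co-item gives larger influence
```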
Beyond Clicks: Modeling Multi-Relational Item Graph for Session-Based Target Behavior Prediction(WWW2020)(MGNN-SPred)
This paper aims to solve two problems:
- Existing methods for session-based behavior prediction focus on only utilizing the same type of user behavior for prediction, but ignore the potential of taking other behavior data as auxiliary information.
- item-to-item relations are modeled separately and locally in one behavior sequence, and they lack a principled way to globally encode these relations more effectively.
So MGNN-SPred jointly considers target behavior and auxiliary behavior sequences and explores global item-to-item relations for accurate prediction.
Overall framework of the algorithm: (see the architecture figure in the paper)
Graph construction algorithm: (see the construction procedure in the paper)
Item Representation Learning:
- for each node, take its one-hot representation;
- convert each of them into a low-dimensional dense vector by a learnable embedding matrix.
- GNN forward propagation layer:
- four neighbor sets: “target-forward”, “target-backward”, “auxiliary-forward”, and “auxiliary-backward”;
- obtain the representation of each neighbor group;
- combine these four representations of different neighbor groups by sum-pooling;
- update the representation of the center node v.
After performing the iterations, the node representation of the last step is taken as the representation of the corresponding item.
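The neighbor-group aggregation above can be sketched as follows (numpy; mean-pooling within each group and the tanh update rule are simplified assumptions for illustration, not the paper's exact formulas):

```python
import numpy as np

def aggregate_groups(h_v, groups, W):
    """Combine the four neighbor groups (target/auxiliary x forward/backward):
    pool within each group, sum-pool across groups, then update node v."""
    summed = np.zeros(h_v.shape[0])
    for g in groups:                        # each g: array of neighbor vectors
        if len(g) > 0:
            summed += np.mean(g, axis=0)    # group representation (mean-pool)
    return np.tanh(W @ np.concatenate([h_v, summed]))  # simplified update

rng = np.random.default_rng(3)
d = 4
W = rng.normal(size=(d, 2 * d))             # hypothetical trainable weights
h_v = rng.normal(size=d)
groups = [rng.normal(size=(2, d)), rng.normal(size=(1, d)),
          np.empty((0, d)), np.empty((0, d))]  # two groups happen to be empty
print(aggregate_groups(h_v, groups, W).shape)  # (4,)
```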
Sequence Representation Learning:

It was found that simple mean-pooling already achieves performance comparable to the attention mechanism while retaining low complexity.
It is self-evident that the contribution of the auxiliary behavior sequence to next-item prediction differs between situations, so a gated mechanism is designed to calculate the relative importance weight:

The user preference representation is then the gated combination of the two sequence representations:

Model Prediction and Training:
A bi-linear matching scheme is employed by:

This yields a score for each item. To learn the parameters of the model, a softmax function is applied to normalize the scores over all items to get the probability distribution:

Backpropagation is adopted to optimize the model by minimizing the cross-entropy loss of the predicted probability distribution:

where the target vector denotes the one-hot representation of the ground truth.
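The bilinear matching, softmax normalization, and cross-entropy steps can be sketched together (numpy; the bilinear matrix B and the toy embeddings are hypothetical parameters):

```python
import numpy as np

def predict_and_loss(p_u, item_embs, B, target):
    """Score each item with a bilinear form s_i = p_u^T B e_i, softmax over
    all items, then cross-entropy against the ground-truth item index."""
    scores = item_embs @ (B @ p_u)          # bilinear matching scores
    scores -= scores.max()                  # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs, -np.log(probs[target])    # cross-entropy loss

rng = np.random.default_rng(4)
d, n_items = 4, 6
B = rng.normal(size=(d, d))                 # hypothetical bilinear parameter
p_u = rng.normal(size=d)                    # user preference representation
item_embs = rng.normal(size=(n_items, d))
probs, loss = predict_and_loss(p_u, item_embs, B, target=2)
print(round(float(probs.sum()), 6), loss > 0)  # 1.0 True
```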
Memory Augmented Graph Neural Networks for Sequential Recommendation(AAAI 2020)(MA-GNN)
This paper addresses sequential recommendation. Its main contributions:
- model the short-term user interests by GNN;
- capture the long-term user interests by memory units;
- interest fusion by a gating layer;
- item co-occurrence modeling.
Overall diagram of the algorithm: (see the model structure figure in the paper)
Model the short-term user interests by GNN:
First, sub-sequences are extracted from each sequence with a sliding-window strategy; then, within every sub-sequence, edges are added between items as illustrated in the paper's figure.
Short-term Interest Aggregation: a 2-layer GCN.

Long-term Interest Modeling
External memory units are used to store the user's long-term interests as they change over time; the items a user interacts with over time form a sequence.
A multi-dimensional attention model is first used to generate the query embedding:

where a sinusoidal positional encoding function maps the item positions into position embeddings.
Operations are then applied to the memory units.
Interest Fusion

The superscript C denotes the fusion of long- and short-term interests.
Item Co-occurrence Modeling
This term measures how closely the items interacted with in the short term relate to the target item:

Prediction and Training

