GCN-related code

Author: Lucas_null | Published 2019-11-30 22:14

Reference: Neural Graph Collaborative Filtering. In SIGIR 2019.
The user-item implicit interaction data is used to build a user-item bipartite graph, and graph convolution is then used to update the embedding of every node in the graph so that each node carries information from its higher-order neighbors. In the recommendation setting, this convolution effectively encodes the collaborative filtering signal into each node's embedding function; stacking several such convolution layers captures higher-order interaction relationships.
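Concretely, using the same notation as the one-layer formula in step 3 below, stacking several aggregation layers (without the transformation weights of the full NGCF model, matching the simplified code in step 5) gives:

e_u^{(k+1)} = \sum_{i \in N_u} e_i^{(k)}, \qquad e_i^{(k+1)} = \sum_{u \in N_i} e_u^{(k)}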

  1. Build the rating matrix R from the interaction data
import numpy as np
import scipy.sparse as sp

# Rating matrix R: R[u, i] = 1 if user u interacted with item i
R = sp.dok_matrix((n_users, n_items), dtype=np.float32)
with open(train_file) as f_train:
    for l in f_train.readlines():
        items = [int(i) for i in l.strip('\n').split(' ')]
        uid, train_items = items[0], items[1:]
        for i in train_items:
            R[uid, i] = 1
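The parsing above assumes each line of train_file starts with a user id followed by the ids of that user's items, separated by spaces; train_file, n_users and n_items are assumed to be defined elsewhere. A minimal sketch with hypothetical toy data (reusing the imports above):

# Hypothetical toy data: 3 users, 5 items; each "line" is "<uid> <item> <item> ..."
toy_lines = ['0 1 3', '1 0 2 4', '2 3']

n_users, n_items = 3, 5
R = sp.dok_matrix((n_users, n_items), dtype=np.float32)
for l in toy_lines:
    ids = [int(i) for i in l.strip().split(' ')]
    uid, train_items = ids[0], ids[1:]
    for i in train_items:
        R[uid, i] = 1

print(R.toarray())
# [[0. 1. 0. 1. 0.]
#  [1. 0. 1. 0. 1.]
#  [0. 0. 0. 1. 0.]]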
  2. Convert the rating matrix into an adjacency matrix
def create_adj_matrix(R):
    # Full (n_users + n_items) x (n_users + n_items) adjacency matrix of the
    # user-item bipartite graph: A = [[0, R], [R^T, 0]]
    adj_mat = sp.dok_matrix((n_users + n_items, n_users + n_items), dtype=np.float32)
    adj_mat = adj_mat.tolil()
    R = R.tolil()
    adj_mat[:n_users, n_users:] = R
    adj_mat[n_users:, :n_users] = R.T
    adj_mat = adj_mat.todok()
    print('already create adjacency matrix', adj_mat.shape)
    return adj_mat
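Continuing the toy example from step 1, the resulting matrix has the block structure A = [[0, R], [R^T, 0]], which can be checked directly:

adj_mat = create_adj_matrix(R)   # toy R with n_users = 3, n_items = 5
A = adj_mat.toarray()            # shape (8, 8)
print(np.allclose(A[:n_users, n_users:], R.toarray()))    # True: upper-right block is R
print(np.allclose(A[n_users:, :n_users], R.toarray().T))  # True: lower-left block is R^T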
  3. Normalize the adjacency matrix
    With self-loops (3rd-generation GCN): adj_mat = adj_mat + sp.eye(adj_mat.shape[0])
    Without self-loops: adj_mat

Aggregating a node's neighbors and using the result as that node's new embedding, e.g. for user u after one propagation layer:

e_u^{(1)} = \sum_{i \in N_u} e_i^{(0)}

The adjacency matrix can be normalized with an L1-style row normalization, i.e. multiplying by the inverse of the degree matrix (single-sided normalization, 1st-generation GCN), or by inverting the square root of every degree (double-sided / symmetric normalization, 2nd-generation GCN, which usually works better).
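In matrix form, with D the diagonal degree matrix of A, the two variants are:

A_{\mathrm{single}} = D^{-1} A, \qquad A_{\mathrm{sym}} = D^{-1/2} A D^{-1/2}

For the bipartite graph this means entry (u, i) of the symmetric version equals 1/\sqrt{|N_u||N_i|}, the weighting used in NGCF.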

def normalize_adj_single(adj):
    # Single-sided (row) normalization: D^{-1} A
    rowsum = np.array(adj.sum(1))

    d_inv = np.power(rowsum, -1).flatten()
    d_inv[np.isinf(d_inv)] = 0.   # isolated nodes keep an all-zero row
    d_mat_inv = sp.diags(d_inv)

    norm_adj = d_mat_inv.dot(adj)

    print('generate single-normalized adjacency matrix.')
    return norm_adj.tocoo()

def normalize_adj_symmetric(adj):
    # Symmetric normalization: D^{-1/2} A D^{-1/2}
    adj = sp.coo_matrix(adj)
    rowsum = np.array(adj.sum(1))
    d_inv_sqrt = np.power(rowsum, -0.5).flatten()
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
    d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
    return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()
    
mean_adj_1 = normalize_adj_single(adj_mat)                                # single-sided normalization
norm_adj_1 = normalize_adj_symmetric(adj_mat)                             # symmetric normalization
mean_adj_2 = normalize_adj_single(adj_mat + sp.eye(adj_mat.shape[0]))     # single-sided + self-loops
norm_adj_2 = normalize_adj_symmetric(adj_mat + sp.eye(adj_mat.shape[0]))  # symmetric + self-loops
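A quick sanity check on the toy adj_mat from step 2, using the functions above: after single-sided normalization with self-loops, every row sums to 1 because D^{-1}A is row-stochastic.

check = normalize_adj_single(adj_mat + sp.eye(adj_mat.shape[0]))
row_sums = np.asarray(check.sum(1)).flatten()
print(np.allclose(row_sums, 1.0))   # True: self-loops guarantee no zero-degree rows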
  4. A large adjacency matrix usually cannot be multiplied with another matrix in one shot (not enough memory), so it is split into slices, each slice is multiplied separately, and the results are concatenated at the end.
def _split_A_hat(X):
    # Split the (n_users + n_items) x (n_users + n_items) matrix X into n_fold
    # row-wise slices and convert each slice to a sparse tensor
    A_fold_hat = []

    fold_len = (n_users + n_items) // n_fold
    for i_fold in range(n_fold):
        start = i_fold * fold_len
        if i_fold == n_fold - 1:
            end = n_users + n_items
        else:
            end = (i_fold + 1) * fold_len

        A_fold_hat.append(_convert_sp_mat_to_sp_tensor(X[start:end]))
    return A_fold_hat

# Convert a scipy sparse matrix to a tf.SparseTensor
import tensorflow as tf   # TensorFlow 1.x API (tf.sparse_tensor_dense_matmul, tf.Session)

def _convert_sp_mat_to_sp_tensor(X):
    coo = X.tocoo().astype(np.float32)
    indices = np.mat([coo.row, coo.col]).transpose()
    return tf.SparseTensor(indices, coo.data, coo.shape)
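A minimal usage sketch of the conversion, assuming TensorFlow 1.x graph mode (matching the tf.sparse_tensor_dense_matmul call used in step 5); the 2x3 toy matrix here is hypothetical:

small = sp.coo_matrix(np.array([[1., 0., 2.],
                                [0., 3., 0.]], dtype=np.float32))
dense = tf.constant(np.ones((3, 4), dtype=np.float32))

sp_tensor = _convert_sp_mat_to_sp_tensor(small)
product = tf.sparse_tensor_dense_matmul(sp_tensor, dense)   # sparse (2,3) x dense (3,4)

with tf.Session() as sess:
    print(sess.run(product).shape)   # (2, 4)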
  5. Update the node embeddings with GCN propagation
def create_gcn_embed():
    A_fold_hat = _split_A_hat(adj_mat)
    # Layer-0 embeddings: user and item embeddings stacked into one node matrix
    node_embeddings = tf.concat([user_embeddings, item_embeddings], axis=0)
    all_embeddings = [node_embeddings]
    for k in range(n_layers):
        temp_embed = []
        for f in range(n_fold):
            # Multiply a sparse tensor with a dense tensor, one fold at a time
            temp_embed.append(tf.sparse_tensor_dense_matmul(A_fold_hat[f], node_embeddings))
        sum_embeddings = tf.concat(temp_embed, 0)
        node_embeddings = sum_embeddings
        all_embeddings += [node_embeddings]
    # Concatenate the outputs of every layer along the feature dimension
    all_embeddings = tf.concat(all_embeddings, 1)
    u_g_embeddings, i_g_embeddings = tf.split(all_embeddings, [n_users, n_items], 0)
    return u_g_embeddings, i_g_embeddings
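A minimal end-to-end sketch of how the pieces above could be wired together (TensorFlow 1.x; the embedding size, layer count, and fold count are hypothetical, and the symmetric-normalized matrix with self-loops, norm_adj_2, is used as the propagation matrix):

emb_dim  = 8   # hypothetical embedding size
n_layers = 2   # number of propagation layers
n_fold   = 2   # number of slices for the sparse matmul

# Layer-0 embeddings for users and items (trainable variables in a real model)
user_embeddings = tf.Variable(tf.random_normal([n_users, emb_dim], stddev=0.01))
item_embeddings = tf.Variable(tf.random_normal([n_items, emb_dim], stddev=0.01))

# Propagate over the normalized graph; CSR so that row slicing in _split_A_hat works
adj_mat = norm_adj_2.tocsr()
u_g_embeddings, i_g_embeddings = create_gcn_embed()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    u, i = sess.run([u_g_embeddings, i_g_embeddings])
    print(u.shape, i.shape)   # (3, 24) and (5, 24): emb_dim * (n_layers + 1) columns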

Original link: https://www.haomeiwen.com/subject/kacawctx.html