Recommender Systems (4): word2vec in Recommender Systems


Author: fromeast | Published 2020-02-13 16:22

    I. The Word2Vec Algorithm

    In a nutshell, Word2Vec learns from text and represents the semantics of words as word vectors: an embedding maps words from their original space into a new space in which semantically similar words lie close together. Neural probabilistic language models built on conventional neural networks suffer mainly from their computational cost, which is concentrated in two places: the matrix multiplication between the hidden layer and the output layer, and the softmax normalization over the output layer. Word2Vec optimizes the neural probabilistic language model at precisely these two points. Its two key models are the CBOW model and the Skip-gram model.

    1. The CBOW Model

    CBOW (continuous bag-of-words) has a three-layer structure: an input layer, a projection layer, and an output layer.


    Input layer: the word vectors of the 2c context words.
    Projection layer: the sum of those 2c input word vectors.
    Output layer: a Huffman tree whose leaf nodes are the words that appear in the corpus and whose weights are the words' frequencies in the corpus. The tree has N = |\mathcal{D}| leaf nodes, one for each word in the dictionary, and N-1 internal nodes (a minimal construction of such a tree is sketched below).
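
    As a concrete illustration of how the output-layer tree is built, here is a minimal sketch using Python's heapq; huffman_codes is an illustrative helper (not part of word2vec's source) and the word counts are made-up toy values:

    import heapq, itertools

    def huffman_codes(word_counts):
        # heap entries are (count, tiebreaker, node); a node is a word (leaf)
        # or a (left, right) pair (internal node)
        ticket = itertools.count()
        heap = [(c, next(ticket), w) for w, c in word_counts.items()]
        heapq.heapify(heap)
        while len(heap) > 1:                 # N leaves -> N-1 merges, i.e. N-1 internal nodes
            c1, _, left = heapq.heappop(heap)
            c2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (c1 + c2, next(ticket), (left, right)))
        codes = {}
        def walk(node, code):
            if isinstance(node, tuple):      # internal node: descend both branches
                walk(node[0], code + '0')
                walk(node[1], code + '1')
            else:                            # leaf: record the word's code
                codes[node] = code
        walk(heap[0][2], '')
        return codes

    # frequent words sit near the root and receive short codes
    print(huffman_codes({'the': 50, 'cat': 10, 'sat': 8, 'mat': 3}))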

    Following the idea of Hierarchical Softmax, for any word w in the dictionary there is exactly one path p^w in the Huffman tree from the root node to the leaf node corresponding to w. If this path contains l^w nodes, it has l^w - 1 branches. Each branch can be viewed as a binary classification, each classification yields a probability, and multiplying these probabilities together gives p(w | \text{Context}(w)):

    p(w | \text {Context}(w))=\prod_{j=2}^{l^{w}} p\left(d_{j}^{w} | X_{w} ; \theta_{j-1}^{w}\right)

    where d_{j}^{w} is the code of the j-th node on the path p^w (the root node carries no code); \theta_{j}^{w} is the vector of the j-th internal node on the path p^w; and X_{w} is the sum of all the word vectors in \text{Context}(w). Each factor is given by:

    \begin{aligned} p\left(d_{j}^{w} | X_{w} ; \theta_{j-1}^{w}\right) &=\begin{cases}\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right), & \text { if } d_{j}^{w}=0 \\ 1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right), & \text { otherwise }\end{cases} \\ &=\left[\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]^{1-d_{j}^{w}} \cdot\left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]^{d_{j}^{w}} \end{aligned}

    where \sigma(x) = \frac{1}{1+e^{-x}}. Taking the log of the likelihood and maximizing it yields the CBOW objective:
    \begin{aligned} \mathcal{L} &=\sum_{w \in \mathcal{D}} \log \prod_{j=2}^{l^{w}}\left(\left[\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]^{1-d_{j}^{w}} \cdot\left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]^{d_{j}^{w}}\right) \\ &=\sum_{w \in \mathcal{D}} \sum_{j=2}^{l^{w}}\left(\left(1-d_{j}^{w}\right) \cdot \log \left[\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]+d_{j}^{w} \cdot \log \left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]\right) \\ &=\sum_{w \in \mathcal{D}} \sum_{j=2}^{l^{w}} \Phi\left(\theta_{j-1}^{w}, X_{w}\right) \end{aligned}

    word2vec maximizes this objective by stochastic gradient ascent. First consider the gradient of \Phi\left(\theta_{j-1}^{w}, X_{w}\right) with respect to \theta_{j-1}^{w}:
    \begin{aligned} \frac{\partial \Phi\left(\theta_{j-1}^{w}, X_{w}\right)}{\partial \theta_{j-1}^{w}} &=\frac{\partial}{\partial \theta_{j-1}^{w}}\left(\left(1-d_{j}^{w}\right) \cdot \log \left[\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]+d_{j}^{w} \cdot \log \left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]\right) \\ &=\left(1-d_{j}^{w}\right)\left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right] X_{w}-d_{j}^{w} \sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right) X_{w} \\ &=\left(\left(1-d_{j}^{w}\right)\left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]-d_{j}^{w} \sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right) X_{w} \\ &=\left(1-d_{j}^{w}-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right) X_{w} \end{aligned}

    The update rule for \theta_{j-1}^{w} is therefore:
    \theta_{j-1}^{w}:=\theta_{j-1}^{w}+\eta\left(1-d_{j}^{w}-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right) X_{w}

    Next consider the gradient of \Phi\left(\theta_{j-1}^{w}, X_{w}\right) with respect to X_{w}:
    \begin{aligned} \frac{\partial \Phi\left(\theta_{j-1}^{w}, X_{w}\right)}{\partial X_{w}} &=\frac{\partial}{\partial X_{w}}\left(\left(1-d_{j}^{w}\right) \cdot \log \left[\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]+d_{j}^{w} \cdot \log \left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]\right) \\ &=\left(1-d_{j}^{w}\right)\left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right] \theta_{j-1}^{w}-d_{j}^{w} \sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right) \theta_{j-1}^{w} \\ &=\left(\left(1-d_{j}^{w}\right)\left[1-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right]-d_{j}^{w} \sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right) \theta_{j-1}^{w} \\ &=\left(1-d_{j}^{w}-\sigma\left(X_{w}^{T} \theta_{j-1}^{w}\right)\right) \theta_{j-1}^{w} \end{aligned}

    Observing that \Phi\left(\theta_{j-1}^{w}, X_{w}\right) is symmetric in \theta_{j-1}^{w} and X_{w}, word2vec directly applies the following update to v(\widetilde{w}), the vector of each context word:
    v(\widetilde{w}):=v(\widetilde{w})+\eta \sum_{j=2}^{l^{w}} \frac{\partial \Phi\left(\theta_{j-1}^{w}, X_{w}\right)}{\partial X_{w}}, \widetilde{w} \in \text { Context }(w)
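
    To make the two update formulas concrete, here is a minimal numpy sketch of one stochastic-gradient step of hierarchical-softmax CBOW; cbow_step and its inputs are hypothetical names for this illustration: codes holds the Huffman code d_2^w, ..., d_{l^w}^w of the target word and point holds the indices of the internal-node vectors \theta on its path.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cbow_step(V, Theta, context_ids, codes, point, eta=0.025):
        # V: (vocab, dim) word vectors; Theta: (vocab-1, dim) internal-node vectors
        X_w = V[context_ids].sum(axis=0)   # projection layer: sum the 2c context vectors
        e = np.zeros_like(X_w)             # accumulates eta * dPhi/dX_w over the path
        for d, j in zip(codes, point):
            q = sigmoid(X_w @ Theta[j])
            g = eta * (1 - d - q)          # eta * (1 - d_j^w - sigma(X_w^T theta_{j-1}^w))
            e += g * Theta[j]              # gradient contribution w.r.t. X_w
            Theta[j] += g * X_w            # update theta_{j-1}^w
        V[context_ids] += e                # apply the same correction to every v(w~)

    By the symmetry noted above, the correction e computed for X_w is added to each context word vector, exactly as in the v(\widetilde{w}) update.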

    2. The Skip-gram Model

    The Skip-gram model likewise consists of an input layer, a projection layer, and an output layer.


    Input layer: only the word vector of the current sample's center word;
    Projection layer: an identity projection, so this layer is optional;
    Output layer: again a Huffman tree.

    In this model the current word w is known and the words in its context are to be predicted, so the key quantity is the conditional probability p(\text{Context}(w) | w), i.e.:

    p(\text {Context}(w) | w)=\prod_{u \in \text {Context}(w)} p(u | w)

    Again following the idea of Hierarchical Softmax, we have:
    \begin{aligned} p(u | w) &=\prod_{j=2}^{l^{u}} p\left(d_{j}^{u} | v(w) ; \theta_{j-1}^{u}\right) \\ &=\prod_{j=2}^{l^{u}}\left[\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]^{1-d_{j}^{u}} \cdot\left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]^{d_{j}^{u}} \end{aligned}

    Maximizing the log-likelihood yields the Skip-gram objective:
    \begin{aligned} \mathcal{L} &=\sum_{w \in \mathcal{D}} \log \prod_{u \in \text {Context}(w)} \prod_{j=2}^{l^{u}}\left(\left[\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]^{1-d_{j}^{u}} \cdot\left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]^{d_{j}^{u}}\right) \\ &=\sum_{w \in \mathcal{D}} \sum_{u \in \text {Context}(w)} \sum_{j=2}^{l^{u}}\left(\left(1-d_{j}^{u}\right) \cdot \log \left[\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]+d_{j}^{u} \cdot \log \left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]\right) \\ &=\sum_{w \in \mathcal{D}} \sum_{u \in \text {Context}(w)} \sum_{j=2}^{l^{u}} \mathcal{O}\left(\theta_{j-1}^{u}, v(w)\right) \end{aligned}

    Consider the gradient of \mathcal{O}\left(\theta_{j-1}^{u}, v(w)\right) with respect to \theta_{j-1}^{u}:
    \begin{aligned} \frac{\partial \mathcal{O}\left(\theta_{j-1}^{u}, v(w)\right)}{\partial \theta_{j-1}^{u}} &=\frac{\partial}{\partial \theta_{j-1}^{u}}\left(\left(1-d_{j}^{u}\right) \cdot \log \left[\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]+d_{j}^{u} \cdot \log \left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]\right) \\ &=\left(1-d_{j}^{u}\right)\left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right] v(w)-d_{j}^{u} \sigma\left(v(w)^{T} \theta_{j-1}^{u}\right) v(w) \\ &=\left(\left(1-d_{j}^{u}\right)\left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]-d_{j}^{u} \sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right) v(w) \\ &=\left(1-d_{j}^{u}-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right) v(w) \end{aligned}

    The update rule for \theta_{j-1}^{u} is:
    \theta_{j-1}^{u}:=\theta_{j-1}^{u}+\eta\left(1-d_{j}^{u}-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right) v(w)

    Then consider the gradient of \mathcal{O}\left(\theta_{j-1}^{u}, v(w)\right) with respect to v(w):
    \begin{aligned} \frac{\partial \mathcal{O}\left(\theta_{j-1}^{u}, v(w)\right)}{\partial v(w)} &=\frac{\partial}{\partial v(w)}\left(\left(1-d_{j}^{u}\right) \cdot \log \left[\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]+d_{j}^{u} \cdot \log \left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]\right) \\ &=\left(1-d_{j}^{u}\right)\left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right] \theta_{j-1}^{u}-d_{j}^{u} \sigma\left(v(w)^{T} \theta_{j-1}^{u}\right) \theta_{j-1}^{u} \\ &=\left(\left(1-d_{j}^{u}\right)\left[1-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right]-d_{j}^{u} \sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right) \theta_{j-1}^{u} \\ &=\left(1-d_{j}^{u}-\sigma\left(v(w)^{T} \theta_{j-1}^{u}\right)\right) \theta_{j-1}^{u} \end{aligned}

    The update rule for v(w) is:
    v(w):=v(w)+\eta \sum_{u \in \text { Context }(w)} \sum_{j=2}^{l^{u}} \frac{\partial \mathcal{O}\left(\theta_{j-1}^{u}, v(w)\right)}{\partial v(w)}
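
    The Skip-gram step is symmetric to the CBOW one. Here is a minimal sketch that applies the two updates above, reusing sigmoid and the arrays from the CBOW sketch; skipgram_step and context_paths are again hypothetical names, with context_paths holding each context word's Huffman code and internal-node indices:

    def skipgram_step(V, Theta, w_id, context_paths, eta=0.025):
        v_w = V[w_id]
        e = np.zeros_like(v_w)                 # accumulates eta * dO/dv(w) over all u, j
        for codes, point in context_paths:     # one Huffman path per u in Context(w)
            for d, j in zip(codes, point):
                q = sigmoid(v_w @ Theta[j])
                g = eta * (1 - d - q)          # eta * (1 - d_j^u - sigma(v(w)^T theta_{j-1}^u))
                e += g * Theta[j]              # gradient contribution w.r.t. v(w)
                Theta[j] += g * v_w            # update theta_{j-1}^u
        V[w_id] += e                           # update v(w) once, after all context words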

    II. Implementation

    The data used here come from a UK-based online retail store and comprise all transactions from 1 December 2010 to 9 December 2011, a table of 541909 \times 8 entries. Download: https://archive.ics.uci.edu/ml/datasets/Online+Retail . The fields are described below:

    • InvoiceNo: invoice number. Nominal; a 6-digit integer uniquely assigned to each transaction. A code starting with the letter 'c' indicates a cancellation.
    • StockCode: product (item) code. Nominal; a 5-digit integer uniquely assigned to each distinct product.
    • Description: product (item) name. Nominal.
    • Quantity: the quantity of each product (item) per transaction. Numeric.
    • InvoiceDate: invoice date and time. Numeric; the date and time at which each transaction was generated.
    • UnitPrice: unit price. Numeric; product price per unit in sterling.
    • CustomerID: customer number. Nominal; a 5-digit integer uniquely assigned to each customer.
    • Country: country name. Nominal; the name of each customer's country/region.
    [Figure: sample rows of the dataset]
    1. Imports, loading the data, and dropping missing values
    import numpy as np
    import pandas as pd
    import random
    from tqdm import tqdm
    from gensim.models import Word2Vec
    import matplotlib.pyplot as plt
    import umap
    
    data = pd.read_excel('Online Retail.xlsx')
    data.isnull().sum()
    data.dropna(inplace=True)
    
    2. Get the list of unique customers and shuffle it; there are 4,372 customers in total
    data['StockCode'] = data['StockCode'].astype(str)
    
    customers = data['CustomerID'].unique().tolist()
    random.shuffle(customers)
    
    
    3. Split into training and validation sets and build each customer's purchase sequence (each sequence of StockCodes plays the role of a "sentence" for word2vec)
    train_customers = [customers[i] for i in range(round(0.9*len(customers)))]
    train_data = data[data['CustomerID'].isin(train_customers)]
    validation_data = data[~data['CustomerID'].isin(train_customers)]
    
    train_purchases = []  # purchase records of training-set customers
    for i in tqdm(train_customers):
        temp = train_data[train_data['CustomerID']==i]['StockCode'].tolist()
        train_purchases.append(temp)
        
    val_purchases = []  # purchase records of validation-set customers
    for i in tqdm(validation_data['CustomerID'].unique()):
        temp = validation_data[validation_data['CustomerID']==i]['StockCode'].tolist()
        val_purchases.append(temp)
    
    100%|██████████| 3935/3935 [00:04<00:00, 974.74it/s]
    100%|██████████| 437/437 [00:00<00:00, 1613.25it/s]
    
    4. word2vec model setup and training
    model = Word2Vec(window = 10, sg = 1, hs = 0, negative = 10, alpha = 0.03, min_alpha = 0.0007, seed = 14)  # sg=1 selects Skip-gram; hs=0 with negative=10 selects negative sampling rather than hierarchical softmax
    model.build_vocab(train_purchases, progress_per = 200)
    model.train(train_purchases, total_examples = model.corpus_count, epochs = 10, report_delay = 1)
    
    model.init_sims(replace=True)  # L2-normalize the vectors in place (gensim 3.x API)
    X = model[model.wv.vocab]      # matrix of all product vectors (gensim 3.x API)
    
    5. Visualizing the product vectors with UMAP
    cluster_embedding = umap.UMAP(n_neighbors=30, min_dist=0.0, n_components=2, random_state=42).fit_transform(X)
    plt.figure(1)
    plt.scatter(cluster_embedding[:,0], cluster_embedding[:,1], s=3, cmap='Spectral')
    
    6. Build a dictionary from product code (StockCode) to description, dropping duplicates
    products = train_data[["StockCode", "Description"]]
    products.drop_duplicates(inplace=True, subset='StockCode', keep='last')
    products_dict = products.groupby('StockCode')['Description'].apply(list).to_dict()
    
    In [8]: products_dict['21931']
    Out[8]: ['JUMBO STORAGE BAG SUKI']
    
    7. Take a product vector as input and return the six most similar products
    def similar_products(v, n=6):
        # topn = n+1 because the most similar vector is the query product itself
        ms = model.wv.similar_by_vector(v, topn = n+1)[1:]
        new_ms = []
        for j in ms:
            pair = (products_dict[j[0]][0], j[1])  # (description, cosine similarity)
            new_ms.append(pair)
        return new_ms
    similar_products(model['84406B'])
    
    In [9]: similar_products(model['21931'])
    Out[9]: 
    [('JUMBO BAG STRAWBERRY', 0.8357644081115723),
     ('JUMBO BAG OWLS', 0.8068020343780518),
     ('JUMBO  BAG BAROQUE BLACK WHITE', 0.7999265193939209),
     ('JUMBO BAG RED RETROSPOT', 0.7874696254730225),
     ('JUMBO BAG PINK POLKADOT', 0.7594423294067383),
     ('JUMBO STORAGE BAG SKULLS', 0.758986234664917)]
    
    8. Recommend similar products based on the average of multiple purchases
    def aggregate_vectors(products):
        # average the vectors of all products in a purchase history
        product_vec = []
        for i in products:
            product_vec.append(model[i])
        return np.mean(product_vec, axis=0)
        
    aggregate_vectors(val_purchases[1]).shape
    similar_products(aggregate_vectors(val_purchases[1]))        # full purchase history
    similar_products(aggregate_vectors(val_purchases[1][-10:]))  # last ten purchases only
    
    In [13]: similar_products(aggregate_vectors(val_purchases[1]))
    Out[13]: 
    [('LUNCH BAG RED RETROSPOT', 0.6403661370277405),
     ('ALARM CLOCK BAKELIKE RED ', 0.638660728931427),
     ('RED RETROSPOT PICNIC BAG', 0.6361196637153625),
     ('JUMBO BAG RED RETROSPOT', 0.6360040903091431),
     ('SET/5 RED RETROSPOT LID GLASS BOWLS', 0.6345535516738892),
     ('ALARM CLOCK BAKELIKE PINK', 0.6296969056129456)]
    
    In [14]: similar_products(aggregate_vectors(val_purchases[1][-10:]))
    Out[14]: 
    [('ROUND SNACK BOXES SET OF 4 FRUITS ', 0.7854548692703247),
     ('LUNCH BOX WITH CUTLERY RETROSPOT ', 0.6739486455917358),
     ('SET OF 3 BUTTERFLY COOKIE CUTTERS', 0.6696499586105347),
     ('SET OF 3 REGENCY CAKE TINS', 0.6598889827728271),
     ('PICNIC BOXES SET OF 3 RETROSPOT ', 0.6580283641815186),
     ('POSTAGE', 0.6528887748718262)]
    


    "Night rain trims the spring chives; fresh-cooked rice is mixed with yellow millet." —— Du Fu, "To Wei Ba, Gentleman in Retirement"
