Knowledge Distill via NST

Author: 信步闲庭v | Published 2017-10-19 11:46

    Approach

    Knowledge Transfer (KT), which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the popular solutions. In this paper, we propose a novel knowledge transfer method by treating it as a distribution matching problem. Particularly, we match the distributions of neuron selectivity patterns between teacher and student networks. To achieve this goal, we devise a new KT loss function by minimizing the Maximum Mean Discrepancy (MMD) metric between these distributions.

    • Notations
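      Sketching the paper's conventions (the exact symbols here are a reconstruction, so treat them as approximate): the teacher T and student S see the same input, and we compare their feature maps at matched layers.

```latex
% Feature maps of teacher (T) and student (S) at matched layers,
% reshaped so that rows index filters and columns index spatial positions:
F^{T} \in \mathbb{R}^{C_{T} \times HW}, \qquad F^{S} \in \mathbb{R}^{C_{S} \times HW}
% f^{k\cdot}: the k-th row of F, i.e. the flattened activation map of filter k;
% f^{\cdot k}: the k-th column of F, i.e. the activations of all filters at spatial position k.
```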


    • Maximum Mean Discrepancy
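      MMD measures the distance between two distributions from finite samples X = {x^i} (N samples) and Y = {y^j} (M samples). In its standard squared form:

```latex
\mathrm{MMD}^{2}(X, Y)
  = \Bigl\| \frac{1}{N}\sum_{i=1}^{N}\phi(x^{i})
          - \frac{1}{M}\sum_{j=1}^{M}\phi(y^{j}) \Bigr\|_{2}^{2}
% Expanding with the kernel trick k(x, y) = phi(x)^T phi(y):
  = \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{i'=1}^{N} k(x^{i}, x^{i'})
  + \frac{1}{M^{2}}\sum_{j=1}^{M}\sum_{j'=1}^{M} k(y^{j}, y^{j'})
  - \frac{2}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} k(x^{i}, y^{j})
```

      Minimizing MMD² pulls the two sample distributions together without requiring the two sets to contain the same number of samples, which is exactly what allows a student with C_S ≠ C_T filters to be matched against the teacher.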

    • Neuron Selectivity Transfer
      The regions with high activations from a neuron may share some task-related similarities. To capture these similarities, the student network should also have neurons that mimic these activation patterns.

    Treating the activation at each spatial position as one feature, the flattened activation map of each filter is a sample in the space of neuron selectivities, of dimension HW. This sample distribution reflects how a CNN interprets an input image: where does the CNN focus?

    Then we can define Neuron Selectivity Transfer loss as:
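    Reconstructing the definition from the paper (with λ trading off the two terms), the loss combines the usual cross-entropy with the MMD term computed on the ℓ2-normalized activation maps:

```latex
\mathcal{L}_{\mathrm{NST}}(W_{S})
  = \mathcal{H}\bigl(y_{\mathrm{true}},\, p_{S}\bigr)
  + \frac{\lambda}{2}\, \mathcal{L}_{\mathrm{MMD}^{2}}\bigl(F^{T}, F^{S}\bigr)
% where the samples fed into MMD^2 are the l2-normalized rows of F^T and F^S:
\mathcal{L}_{\mathrm{MMD}^{2}}(F^{T}, F^{S})
  = \frac{1}{C_{T}^{2}} \sum_{i=1}^{C_{T}} \sum_{i'=1}^{C_{T}}
      k\Bigl(\tfrac{f_{T}^{i\cdot}}{\lVert f_{T}^{i\cdot}\rVert_{2}},
             \tfrac{f_{T}^{i'\cdot}}{\lVert f_{T}^{i'\cdot}\rVert_{2}}\Bigr)
  + \frac{1}{C_{S}^{2}} \sum_{j=1}^{C_{S}} \sum_{j'=1}^{C_{S}}
      k\Bigl(\tfrac{f_{S}^{j\cdot}}{\lVert f_{S}^{j\cdot}\rVert_{2}},
             \tfrac{f_{S}^{j'\cdot}}{\lVert f_{S}^{j'\cdot}\rVert_{2}}\Bigr)
  - \frac{2}{C_{T} C_{S}} \sum_{i=1}^{C_{T}} \sum_{j=1}^{C_{S}}
      k\Bigl(\tfrac{f_{T}^{i\cdot}}{\lVert f_{T}^{i\cdot}\rVert_{2}},
             \tfrac{f_{S}^{j\cdot}}{\lVert f_{S}^{j\cdot}\rVert_{2}}\Bigr)
```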

    The choice of kernel k determines which statistics of the two distributions get matched. The paper considers a linear, a polynomial, and a Gaussian kernel; notably, with the linear kernel k(x, y) = xᵀy the MMD term reduces to matching the mean of the normalized activation maps, which makes linear-kernel NST closely related to attention transfer.
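    For reference, the three kernels in their standard forms (the polynomial parameters d = 2, c = 0 below are a common choice for NST, stated here as an assumption):

```latex
k_{\mathrm{linear}}(x, y) = x^{\top} y, \qquad
k_{\mathrm{poly}}(x, y)   = (x^{\top} y + c)^{d}, \qquad
k_{\mathrm{gauss}}(x, y)  = \exp\Bigl(-\tfrac{\lVert x - y \rVert_{2}^{2}}{2\sigma^{2}}\Bigr)
```

    As a concrete sketch, here is a minimal PyTorch version of the MMD² matching term with that polynomial kernel. This is an illustration rather than the authors' released code; the function name nst_loss, the batched (N, C, H, W) shapes, and averaging MMD² over the batch are assumptions:

```python
# Minimal sketch of the NST matching term in PyTorch (an illustration,
# not the authors' released implementation).
import torch
import torch.nn.functional as F

def nst_loss(ft: torch.Tensor, fs: torch.Tensor) -> torch.Tensor:
    """Squared MMD between teacher and student neuron-selectivity samples,
    using the polynomial kernel k(x, y) = (x^T y)^2 (d = 2, c = 0).

    ft: teacher feature map, shape (N, C_T, H, W)
    fs: student feature map, shape (N, C_S, H, W); H and W must match ft's.
    """
    n, ct = ft.shape[:2]
    cs = fs.shape[1]
    # Each filter's flattened activation map is one sample in R^{HW} ...
    ft = ft.reshape(n, ct, -1)
    fs = fs.reshape(n, cs, -1)
    # ... l2-normalized, as in the loss definition above.
    ft = F.normalize(ft, p=2, dim=2)
    fs = F.normalize(fs, p=2, dim=2)
    # Pairwise kernel matrices; .mean over the last two dims implements
    # the 1/C_T^2, 1/C_S^2, and 1/(C_T * C_S) factors.
    k_tt = torch.bmm(ft, ft.transpose(1, 2)).pow(2).mean(dim=(1, 2))
    k_ss = torch.bmm(fs, fs.transpose(1, 2)).pow(2).mean(dim=(1, 2))
    k_ts = torch.bmm(ft, fs.transpose(1, 2)).pow(2).mean(dim=(1, 2))
    # MMD^2 per image, averaged over the batch.
    return (k_tt + k_ss - 2.0 * k_ts).mean()
```

    A full training step would then combine it with the usual classification loss, e.g. loss = F.cross_entropy(student_logits, labels) + lam / 2 * nst_loss(teacher_feat.detach(), student_feat), where lam is the λ from the loss above and the teacher's features are detached so that only the student receives gradients.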

    Experiment


    References:
    Zehao Huang, Naiyan Wang. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. arXiv preprint, 2017.
