Paper Reading: 《Wasserstein Distance Guide

Author: LiBiscuit | Published 2021-06-21 15:46

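It has been a really, really long time since my last paper-reading post (and the papers themselves have been sitting in the backlog...).
Suddenly it is the end of June; this month has flown by.
Still, I hope this month's hard work pays off!
Today's paper is about domain adaptation.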

Paper title:
Wasserstein Distance Guided Representation Learning for Domain Adaptation
Paper: https://arxiv.org/abs/1707.01217v4
Code: https://github.com/RockySJ/WDGRL
Further reading: https://blog.csdn.net/qq_41076797/article/details/116942752

Background

1. Domain adaptation aims at generalizing a high-performance learner on a target domain via utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations while the learned representations should also be discriminative in prediction.
In other words, domain adaptation uses knowledge extracted from a source domain, whose data distribution is different but related, to generalize to the target domain. One approach is to learn domain-invariant feature representations that remain discriminative for prediction.
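2. To effectively transfer a classifier across different domains, different methods have been proposed, including instance reweighting, subsampling, feature mapping, and weight regularization.
Among these methods, feature mapping has recently been very successful: it projects data from different domains into a common latent space in which the feature representations are domain invariant.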
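3. On the other hand, generative adversarial nets (GANs) are heavily studied during recent years, which play a minimax game between two adversarial networks: the discriminator is trained to distinguish real data from the generated data, while the generator learns to generate high-quality data to fool the discriminator.
When this idea is carried over to domain adaptation, a domain classifier plays the discriminator's role. However, once the domain classifier network can perfectly distinguish target representations from source representations, the gradient vanishes. A more reasonable solution is to replace the domain discrepancy measure with the Wasserstein distance, which provides more stable gradients even when the two distributions are far apart.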
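To make the gradient argument concrete, here is a minimal sketch comparing two discrepancy measures on distributions with disjoint support. It assumes the Jensen-Shannon divergence as the stand-in for a classifier-style discrepancy (my choice for illustration, not something this post's source spells out) and uses two point masses on a small integer grid; all function names are hypothetical.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1(p, q):
    """W1 on the integer grid 0..n-1 (unit spacing): sum of absolute CDF differences."""
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total

def point_mass(position, size=10):
    """All probability mass on a single grid cell."""
    return [1.0 if i == position else 0.0 for i in range(size)]

source = point_mass(0)
for shift in (1, 4, 8):
    target = point_mass(shift)
    # JS stays at log(2) for every shift, while W1 grows with the shift.
    print(shift, round(js_divergence(source, target), 4), wasserstein_1(source, target))
```

Once the supports no longer overlap, the JS value is stuck at log 2 no matter how far apart the two distributions are, so it carries no signal about which direction reduces the gap. W1 keeps growing with the distance, which is the "stable gradient" property the paragraph above refers to.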

Related Works

The paper's taxonomy of domain adaptation methods is fairly complete, so I am recording it here.
i). Instance-based methods, which reweight/subsample the source samples to match the distribution of the target domain, thus training on the reweighted source samples guarantees classifiers with transferability.
Instance-based methods reweight or subsample the source samples so that they match the target-domain distribution.
ii). Parameter-based methods, which transfer knowledge through shared or regularized parameters of source and target domain learners, or by combining multiple reweighted source learners to form an improved target learner.
Parameter-based methods transfer knowledge through parameters that are shared or regularized between the source and target learners.
iii). Feature-based methods, which can be further categorized into two groups:
Asymmetric feature-based methods transform the features of one domain to more closely match another domain.
Symmetric feature-based methods map different domains to a common latent space where the feature distributions are close.
That is, asymmetric methods transform one domain's features toward the other domain, while symmetric methods map both domains into a shared latent space in which their feature distributions are close.
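As a small illustration of category i), here is a sketch of instance reweighting on a made-up discrete feature. It assumes both domain distributions are known exactly, which is never the case in practice (real methods have to estimate the density ratio); the numbers and function names are hypothetical.

```python
# Hypothetical discrete example: feature value -> probability in each domain.
source_dist = {0: 0.7, 1: 0.2, 2: 0.1}
target_dist = {0: 0.2, 1: 0.3, 2: 0.5}

def importance_weights(p_source, p_target):
    """Density ratio p_target(x) / p_source(x) for every feature value x."""
    return {x: p_target[x] / p_source[x] for x in p_source}

def expectation(dist, fn, weights=None):
    """Expected value of fn under dist, optionally reweighted per feature value."""
    weights = weights or {x: 1.0 for x in dist}
    return sum(prob * weights[x] * fn(x) for x, prob in dist.items())

def loss(x):
    """Any per-sample loss works here; squared distance to 1 is arbitrary."""
    return (x - 1) ** 2

weights = importance_weights(source_dist, target_dist)
source_loss = expectation(source_dist, loss)               # plain source risk
target_loss = expectation(target_dist, loss)               # what we actually care about
reweighted_loss = expectation(source_dist, loss, weights)  # source risk after reweighting
print(source_loss, target_loss, reweighted_loss)
```

The reweighted source risk coincides with the target risk, which is why training on reweighted source samples can transfer. The feature-based methods in iii), including WDGRL, change the representation instead of the sample weights.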

Work

In this paper, we propose a domain invariant representation learning approach to reduce domain discrepancy for domain adaptation, namely Wasserstein Distance Guided Representation Learning (WDGRL), inspired by recently proposed Wasserstein GAN.
The paper neatly applies the WGAN metric to domain adaptation and proposes WDGRL.
(Note: the GRL in WDGRL is not the GRL in DANN, where it stands for gradient reversal layer; do not confuse the two.)
Our WDGRL differs from previous adversarial methods:
i). WDGRL adopts an iterative adversarial training strategy
ii). WDGRL adopts Wasserstein distance as the adversarial loss which has gradient superiority
In short, WDGRL differs from earlier adversarial methods in two ways: it trains with an iterative adversarial strategy, and it uses the Wasserstein distance, which has better gradient behavior, as the adversarial loss.

Model

Supplementary background: Wasserstein Metric
The Wasserstein metric is a distance measure between probability distributions on a given metric space (M, ρ), where ρ(x, y) is a distance function for two instances x and y in the set M. The p-th Wasserstein distance between two Borel probability measures P and Q is defined as
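For reference, the standard definition written with the symbols from the sentence above is given below. Γ(P, Q), the set of all joint distributions on M × M whose marginals are P and Q, and the coupling μ are notation I am assuming here, following the usual convention.

```latex
W_p(P, Q) = \left( \inf_{\mu \in \Gamma(P, Q)} \int_{M \times M} \rho(x, y)^p \, d\mu(x, y) \right)^{1/p}
```

Intuitively, μ describes a plan for moving the mass of P onto Q, ρ(x, y) is the cost of moving one unit from x to y, and the infimum picks the cheapest plan.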

See also: https://blog.csdn.net/zkq_1986/article/details/84937388
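To get a feel for the metric, here is a minimal sketch for the one case where it is easy to compute exactly: two equal-size samples on the real line with ρ(x, y) = |x - y|. In that setting the optimal transport plan simply pairs the sorted values. The function name and the equal-size restriction are my own simplifications.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-th Wasserstein distance between two equal-size 1-D samples, rho(x, y) = |x - y|."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling matches the i-th smallest x to the i-th smallest y.
    paired = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in paired) / len(xs)
    return cost ** (1.0 / p)

print(wasserstein_1d([0.0, 1.0, 2.0], [5.0, 6.0, 7.0]))  # every point moves by 5
print(wasserstein_1d([0.0, 1.0, 3.0], [3.0, 0.0, 1.0]))  # same multiset, nothing to move
```

In higher dimensions there is no such shortcut, which is why WDGRL estimates the distance with a trained critic network instead.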
Now for the main topic.
WDGRL trains a domain critic network to estimate the empirical Wasserstein distance between the source and target feature representations. The feature extractor network will then be optimized to minimize the estimated Wasserstein distance in an adversarial manner. By iterative adversarial training, we finally learn feature representations invariant to the covariate shift between domains.
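WDGRL trains a domain critic network to estimate the Wasserstein distance between the source and target feature representations. The feature extractor is then optimized adversarially to minimize that estimated distance.
Through iterative adversarial training, we end up with feature representations that are invariant to the covariate shift between domains.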
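As I understand it, the critic's estimate is the difference between its average score on source features and its average score on target features, maximized over 1-Lipschitz critics. Here is a minimal sketch, assuming a linear critic f_w(h) = w · h (1-Lipschitz whenever the norm of w is at most 1) and a tiny hand-picked candidate set in place of actual training; the names and data are hypothetical.

```python
def critic(weights, feature):
    """Linear critic f_w(h) = w . h; it is 1-Lipschitz when ||w||_2 <= 1."""
    return sum(w * h for w, h in zip(weights, feature))

def wasserstein_estimate(weights, source_feats, target_feats):
    """Mean critic score on source features minus mean critic score on target features."""
    src = sum(critic(weights, h) for h in source_feats) / len(source_feats)
    tgt = sum(critic(weights, h) for h in target_feats) / len(target_feats)
    return src - tgt

# Target features are the source features shifted by 3 along the first axis.
source_feats = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
target_feats = [[3.0, 1.0], [4.0, 1.0], [5.0, 1.0]]

# Stand-in for critic training: pick the best of a few unit-norm candidates.
candidates = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
best = max(candidates, key=lambda w: wasserstein_estimate(w, source_feats, target_feats))
print(best, wasserstein_estimate(best, source_feats, target_feats))
```

The best candidate points along the axis on which the domains differ, and its score matches the size of the shift. A poorly chosen critic underestimates the distance, which is why the critic has to be trained before its estimate is used to update the feature extractor.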
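WDGRL can easily be adopted in existing domain adaptation frameworks. The full model works as follows:
Source and target data pass through the same feature extractor network. The features first go through the Domain Critic Network, whose own parameters are updated to maximize its objective subject to the Lipschitz constraint; only then is the loss produced by the Domain Critic Network trustworthy. Finally, the features are classified by the classifier (the Discriminator), which yields the label classification loss.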
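The alternating schedule described above can be sketched on a toy problem. This is heavily simplified and is not the authors' implementation: the extractor is linear with a scalar output, the classifier is just a fixed sigmoid on that feature, and the critic is linear with |w| <= 1, so its optimum is available in closed form instead of being trained with a gradient penalty. The data, names, and hyperparameters are all hypothetical.

```python
import math

# Toy inputs (c, d): c carries the class signal and is distributed identically
# in both domains; d is a domain-specific offset (0 in source, 3 in target).
source_x = [(-1.0, 0.0), (1.0, 0.0), (-2.0, 0.0), (2.0, 0.0)]
source_y = [0, 1, 0, 1]
target_x = [(-1.0, 3.0), (1.0, 3.0), (-2.0, 3.0), (2.0, 3.0)]  # unlabeled

def extract(a, x):
    """Shared linear feature extractor producing a scalar feature h."""
    return a[0] * x[0] + a[1] * x[1]

def mean(values):
    return sum(values) / len(values)

def train(steps=200, lr=0.05, lam=1.0):
    a = [0.1, 1.0]  # extractor weights; a[1] initially leaks the domain offset
    for _ in range(steps):
        # 1) Critic step: for f_w(h) = w * h with |w| <= 1, the maximizer of
        #    mean f_w(h_source) - mean f_w(h_target) is the sign of the gap.
        gap = mean([extract(a, x) for x in source_x]) - mean([extract(a, x) for x in target_x])
        w = (gap > 0) - (gap < 0)
        # 2) Extractor step: descend classification loss + lam * Wasserstein estimate.
        grad = [0.0, 0.0]
        for x, y in zip(source_x, source_y):
            prob = 1.0 / (1.0 + math.exp(-extract(a, x)))  # sigmoid classifier on h
            for i in range(2):
                grad[i] += (prob - y) * x[i] / len(source_x)  # cross-entropy gradient
        for i in range(2):
            wd_grad = w * (mean([x[i] for x in source_x]) - mean([x[i] for x in target_x]))
            a[i] -= lr * (grad[i] + lam * wd_grad)
    return a

a = train()
print(a)
```

After training, the weight on the domain-specific input d has been pushed close to zero while the weight on the class-relevant input c has grown, so the source and target features nearly coincide and the source-trained classifier carries over. Because the critic here is always exactly optimal, the d-weight ends up hovering within one step size of zero rather than converging smoothly.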
The classification loss is the cross-entropy loss, computed on the labeled source samples.
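As I recall the paper's notation, this loss is L_c = -(1/n^s) Σ_i Σ_k 1(y_i = k) · log f_c(f_g(x_i))_k, where f_g is the feature extractor and f_c the classifier. Below is a minimal sketch of that quantity, assuming the classifier ends in a softmax over raw logits; the helper names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one sample's class logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Mean of -log p(true class) over a batch of labeled source samples."""
    losses = [-math.log(softmax(logits)[label]) for logits, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

confident_logits = [[10.0, 0.0], [0.0, 10.0]]
print(cross_entropy(confident_logits, [0, 1]))  # small: predictions agree with labels
print(cross_entropy(confident_logits, [1, 0]))  # large: predictions contradict labels
```

Only source samples enter this loss because the target domain has no labels; the target data influences training through the critic alone.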
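To avoid vanishing or exploding gradients, a gradient penalty is imposed on the domain critic parameters θw.
The feature representations at which the gradient is penalized are defined not only at the source and target representations but also at random points along the straight line between pairs of source and target representations. The Wasserstein distance can then be estimated by solving the resulting penalized maximization problem for the critic.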
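As I read it, the penalty at a point ĥ is (||∇ f_w(ĥ)||_2 - 1)^2, and the critic maximizes the Wasserstein estimate minus γ times this penalty. Here is a minimal sketch of the penalty term alone, assuming a toy critic f_w(h) = Σ_i w_i · tanh(h_i) whose gradient can be written by hand (a real implementation would use automatic differentiation); the critic form and all names are hypothetical.

```python
import math
import random

def critic_gradient(weights, h):
    """Analytic gradient of the toy critic f_w(h) = sum_i w_i * tanh(h_i) with respect to h."""
    return [w * (1.0 - math.tanh(x) ** 2) for w, x in zip(weights, h)]

def interpolate(h_source, h_target, rng):
    """Random point on the straight line between a source and a target feature."""
    eps = rng.random()
    return [eps * s + (1.0 - eps) * t for s, t in zip(h_source, h_target)]

def gradient_penalty(weights, source_feats, target_feats, rng):
    """Mean of (||grad f_w(h)||_2 - 1)^2 over source, target, and interpolated points."""
    points = list(source_feats) + list(target_feats)
    points += [interpolate(s, t, rng) for s, t in zip(source_feats, target_feats)]
    penalties = []
    for h in points:
        norm = math.sqrt(sum(g * g for g in critic_gradient(weights, h)))
        penalties.append((norm - 1.0) ** 2)
    return sum(penalties) / len(penalties)

rng = random.Random(0)
source_feats = [[0.0, 0.5], [0.2, -0.3]]
target_feats = [[1.0, 1.5], [1.2, 0.7]]
print(gradient_penalty([1.0, 0.0], source_feats, target_feats, rng))
```

The penalty pulls the critic's gradient norm toward 1 at every sampled point, which is a soft way of enforcing the 1-Lipschitz condition that the Wasserstein estimate relies on.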

Overall objective function:
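As I recall it from the paper, the pieces above combine into a single min-max problem. In the block below, θg and θc (feature extractor and classifier parameters), the trade-off coefficients λ and γ, and the loss names are notation assumed from the paper rather than taken from this post; only θw appears in the text above.

```latex
\min_{\theta_g, \theta_c} \Big\{ \mathcal{L}_c + \lambda \max_{\theta_w} \big[ \mathcal{L}_{wd} - \gamma \mathcal{L}_{grad} \big] \Big\}
```

Here L_c is the cross-entropy classification loss, L_wd is the critic's Wasserstein estimate, and L_grad is the gradient penalty. The inner maximization trains the critic; the outer minimization trains the feature extractor and classifier, and λ controls how strongly domain alignment is weighted against classification accuracy.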