Triplet Loss and Its TensorFlow Implementation

Author: fighting41love | Published 2018-04-11 20:16


This post is translated from Olivier Moindrot's blog post "Triplet Loss and Online Triplet Mining in TensorFlow". Readers comfortable with English can go straight to his blog.

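In a previous post, "Siamese network: a simple yet remarkable architecture", we introduced the basic concepts of siamese networks and triplet networks. This post looks at some interesting aspects of the triplet loss used to train triplet networks.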

Introduction

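In face recognition, triplet loss is commonly used to learn vector representations (embeddings) of faces. If you are not familiar with triplet loss, Andrew Ng's Deep Learning Specialization on Coursera is a good introduction.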

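Triplet loss is tricky to implement. This post covers its definition and the strategies used to choose triplets during training. Why is a strategy needed? The number of possible triplets grows cubically with the number of samples, so training on all of them is inefficient. Selecting informative triplets makes training both faster and more effective.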

Triplet loss and triplet mining

Why use triplet loss instead of softmax?

Triplet loss was brought to face recognition by Google's paper "FaceNet: A Unified Embedding for Face Recognition and Clustering". Its authors trained a new face embedding using online triplet mining, which we discuss in detail below.

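In supervised learning we usually have a fixed set of classes, and a softmax-based cross-entropy loss works well. Sometimes, however, the number of classes is variable. In face recognition, for example, the model must compare faces of people it never saw during training. Triplet loss handles this case. In tasks such as face recognition and Quora Question Pairs, its strength is fine-grained discrimination. When two inputs are similar, triplet loss is better at modeling the details that set them apart, because it effectively adds an explicit measure of how different the inputs are. As a result it learns better input representations and performs well on both tasks. Its drawback is slow convergence, and sometimes training does not converge at all.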

The motivation of triplet loss is to make faces of the same person as close as possible (in the embedding space), and faces of different people as far apart as possible.

Definition of triplet loss

Figure: triplet loss on two positive faces (Obama) and one negative face (Macron)

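The goals of triplet loss are:

Two samples with the same label should be close to each other in the embedding space.

Two samples with different labels should be far apart in the embedding space.

More precisely, given two examples of the same class and one example of a different class, we want the negative example to be farther from one of the positives than the other positive is, by at least a threshold called the margin.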

Triplet loss is defined on triplets made of:

an anchor (the reference example)
a positive of the same class as the anchor
a negative of a different class

For a triplet (a, p, n), the triplet loss is:

$$\mathcal{L}(a, p, n) = \max\left(d(a,p) - d(a,n) + margin,\ 0\right)$$

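Minimizing this loss pushes $d(a,p)$ toward 0 and pushes $d(a,n)$ above $d(a,p)+margin$. When the negative is already easy to separate, i.e. $d(a,n)>d(a,p)+margin$, the loss is 0. Otherwise the loss is positive, and it grows linearly as the negative moves closer to the anchor.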
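To make the definition concrete, here is a minimal NumPy sketch of the loss for a single triplet. It assumes d is the squared Euclidean distance, which is the same choice the offline TensorFlow code later in this post makes. The value margin=0.2 is only an example, and the function name `triplet_loss` is our own.

```python
import numpy as np


def triplet_loss(a, p, n, margin=0.2):
    """max(d(a,p) - d(a,n) + margin, 0) for one triplet, d = squared Euclidean."""
    d_ap = np.sum((a - p) ** 2)  # d(a, p)
    d_an = np.sum((a - n) ** 2)  # d(a, n)
    return float(max(d_ap - d_an + margin, 0.0))


a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])        # positive, close to the anchor
n_easy = np.array([2.0, 0.0])   # d(a,n) > d(a,p) + margin, so the loss is 0
n_hard = np.array([0.05, 0.0])  # closer to the anchor than the positive is
print(triplet_loss(a, p, n_easy))  # 0.0
print(triplet_loss(a, p, n_hard))  # approximately 0.01 - 0.0025 + 0.2 = 0.2075
```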

Triplet mining

Based on this definition, triplets fall into three categories:

easy triplets: triplets whose loss is 0, formally $d(a,n)>d(a,p)+margin$.

hard triplets: the negative is closer to the anchor than the positive is, formally $d(a,n)<d(a,p)$.

semi-hard triplets: the negative is farther from the anchor than the positive, but not far enough for the loss to be 0, i.e. $d(a,p)<d(a,n)<d(a,p)+margin$.
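The three categories can be expressed as a small Python function. The definitions above leave the boundary cases open, so this sketch handles the equalities by assumption. When $d(a,n) = d(a,p)+margin$ exactly, the triplet counts as easy, because its loss is exactly 0. When $d(a,n) = d(a,p)$, the triplet counts as semi-hard. The function name `triplet_category` is hypothetical.

```python
def triplet_category(d_ap, d_an, margin):
    """Classify a triplet as 'easy', 'semi-hard' or 'hard' from d(a,p) and d(a,n)."""
    if d_an >= d_ap + margin:
        return "easy"       # loss max(d_ap - d_an + margin, 0) is 0
    if d_an < d_ap:
        return "hard"       # the negative is closer to the anchor than the positive
    return "semi-hard"      # d_ap <= d_an < d_ap + margin: small but positive loss


print(triplet_category(1.0, 2.0, 0.5))   # easy
print(triplet_category(1.0, 0.5, 0.5))   # hard
print(triplet_category(1.0, 1.25, 0.5))  # semi-hard
```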

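All three categories are defined by the negative's distance to the anchor, compared with the anchor-positive distance. In the same way, for a given anchor and positive, negatives can be split into three groups: hard negatives, semi-hard negatives and easy negatives. The figure below shows these three regions in the embedding space, relative to the anchor and the positive.

Figure: the three kinds of negatives, by their distance to the anchor relative to the positive

The way triplets (or negatives) are chosen has a large impact on training. The FaceNet paper mentioned above builds its triplets from randomly chosen semi-hard negatives, with good results.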
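The following sketch illustrates semi-hard selection for one anchor-positive pair. It picks a negative uniformly at random from the negatives in the semi-hard band. This shows the idea only. The function name and the choice to return None when no semi-hard negative exists are our assumptions, not details from the FaceNet paper.

```python
import numpy as np


def pick_semi_hard_negative(d_ap, d_an, margin, rng):
    """Return the index of a random semi-hard negative, or None if there is none.

    d_ap: distance between the anchor and the positive.
    d_an: distances between the anchor and each candidate negative.
    """
    d_an = np.asarray(d_an)
    # semi-hard band: d(a,p) < d(a,n) < d(a,p) + margin
    candidates = np.flatnonzero((d_an > d_ap) & (d_an < d_ap + margin))
    if candidates.size == 0:
        return None  # what to do instead (e.g. skip the pair) is a separate design choice
    return int(rng.choice(candidates))


rng = np.random.default_rng(0)
print(pick_semi_hard_negative(1.0, [0.5, 1.1, 1.3, 3.0], 0.5, rng))  # index 1 or 2
```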

Offline and online triplet mining

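The analysis above shows that easy negatives are trivially separated. There is little point in building many triplets from them, because they would waste most of the training computation. Using only the hardest negatives, on the other hand, can hurt training. The FaceNet paper reports that selecting the hardest negatives can lead to bad local minima early in training, including a collapsed model. Triplets therefore have to be selected deliberately, which is called "mining" the triplets.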

Offline triplet mining

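Offline triplet mining feeds the whole training set through the network and computes the embedding of every training sample. It then uses the distances between embeddings to classify each candidate triplet as easy, semi-hard or hard. Only the hard or semi-hard triplets are kept, because easy triplets have zero loss and teach the model nothing.

Overall this approach is inefficient, because it needs a full pass over the training data before training. The embeddings also change as the model learns, so the triplets may need to be re-mined every epoch or every few epochs.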
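Here is a brute-force NumPy sketch of offline mining. It assumes the current network has already computed the embeddings of the whole training set and stored them in one (N, D) array. It uses squared Euclidean distances and keeps every valid triplet with positive loss, which means every hard or semi-hard triplet. The O(N^3) loop is for illustration only, and `mine_offline` is a hypothetical name.

```python
import numpy as np


def mine_offline(embeddings, labels, margin):
    """Return all (anchor, positive, negative) index triplets with positive loss.

    embeddings: (N, D) array computed beforehand for the whole training set.
    Uses squared Euclidean distances; O(N^3), so only for small N.
    """
    labels = np.asarray(labels)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)  # (N, N) squared distances
    n = len(labels)
    mined = []
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # the positive must be a different sample with the same label
            for k in range(n):
                if labels[k] == labels[a]:
                    continue  # the negative must have a different label
                if dist[a, p] - dist[a, k] + margin > 0:
                    mined.append((a, p, k))  # hard or semi-hard: keep it
    return mined


emb = np.array([[0.0], [0.1], [0.05], [3.0]])
print(mine_offline(emb, [0, 0, 1, 1], margin=0.2))
# [(0, 1, 2), (1, 0, 2), (2, 3, 0), (2, 3, 1), (3, 2, 1)]
```

In a training loop, this would be rerun after recomputing the embeddings every epoch or every few epochs, as described above.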

Online triplet mining

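To address these problems, the Google researchers proposed online triplet mining. The idea is simple. A batch of B images is fed through the network to get B embeddings, and triplets are formed from the batch itself. This gives at most $B^3$ triplets, but many of them are useless, because they do not consist of two examples of one class plus one example of another class. A triplet made of three examples of the same class is one example. Such triplets are called invalid triplets. Which triplets are valid? A triplet $(B_i,B_j,B_k)$ is valid if samples i and j are distinct but share a label, and sample k has a different label.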
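A brute-force Python sketch of the validity rule follows. It enumerates all $B^3$ index triplets of a batch and keeps only the valid ones. The function names are hypothetical. With 6 samples and 3 labels, only 24 of the 216 candidates are valid.

```python
from itertools import product


def is_valid_triplet(labels, i, j, k):
    """True if i and j are distinct samples with the same label and k has another label."""
    return i != j and labels[i] == labels[j] and labels[k] != labels[i]


def valid_triplets(labels):
    """Enumerate all B^3 index triplets of a batch and keep the valid ones."""
    b = len(labels)
    return [t for t in product(range(b), repeat=3) if is_valid_triplet(labels, *t)]


batch_labels = [0, 0, 1, 1, 2, 2]  # B = 6, so 6**3 = 216 candidate triplets
print(len(valid_triplets(batch_labels)))  # 24
```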

Suppose each batch contains P*K faces: P different people, with K images each.
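Building such a batch requires sampling by identity rather than uniformly. The following is a minimal sketch of a P×K sampler using only the standard library. The function name is hypothetical, and the sketch assumes that at least P identities have K or more samples. Otherwise `rng.sample` raises a ValueError.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, p, k, rng=random):
    """Return the indices of one batch with P identities and K samples per identity."""
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    # only identities with at least K samples can contribute K distinct samples
    eligible = [lab for lab, idxs in by_label.items() if len(idxs) >= k]
    chosen = rng.sample(eligible, p)  # P distinct identities
    batch = []
    for lab in chosen:
        batch.extend(rng.sample(by_label[lab], k))  # K distinct samples of this identity
    return batch


labels = [0] * 5 + [1] * 5 + [2] * 5 + [3]  # identity 3 has a single image
batch = sample_pk_batch(labels, p=3, k=2, rng=random.Random(0))
print(len(batch))  # 6 = P * K
```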

The paper "In Defense of the Triplet Loss for Person Re-Identification" (arXiv:1703.07737) proposes two strategies for choosing among the valid triplets:

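batch all: compute the loss of every valid triplet, then average it over the hard and semi-hard triplets only.
Easy triplets are left out of the average because their loss is 0, and including them would shrink the overall loss.
This produces PK(K-1)(PK-K) triplets: PK anchors, K-1 possible positives per anchor, and PK-K possible negatives per anchor.
batch hard: for each anchor, select the hardest positive (the positive farthest from the anchor) and the hardest negative (the negative closest to the anchor).
This produces PK triplets,
and each one is the hardest triplet available for its anchor.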
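The batch hard selection rule can be sketched in NumPy as follows, given any (B, B) distance matrix. The hardest positive uses argmax over same-label samples, excluding the anchor itself. The hardest negative uses argmin over samples with other labels. The function name is hypothetical. In the example, P=2 and K=3, so batch hard yields PK = 6 triplets, while batch all would consider PK(K-1)(PK-K) = 36.

```python
import numpy as np


def batch_hard_pairs(dist, labels):
    """One (anchor, hardest positive, hardest negative) triplet per anchor."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # same label, not the anchor
    triplets = []
    for a in range(len(labels)):
        pos = np.flatnonzero(pos_mask[a])
        neg = np.flatnonzero(~same[a])
        hp = pos[np.argmax(dist[a, pos])]  # hardest positive: largest distance
        hn = neg[np.argmin(dist[a, neg])]  # hardest negative: smallest distance
        triplets.append((a, int(hp), int(hn)))
    return triplets


x = np.array([0.0, 0.5, 2.0, 1.0, 6.0, 7.0])  # 1-D embeddings for readability
dist = np.abs(x[:, None] - x[None, :])
print(batch_hard_pairs(dist, [0, 0, 0, 1, 1, 1]))
# [(0, 2, 3), (1, 2, 3), (2, 0, 3), (3, 5, 1), (4, 3, 2), (5, 3, 2)]
```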

Online triplet loss

In the experiments of "In Defense of the Triplet Loss for Person Re-Identification" (arXiv:1703.07737), batch hard gave the best results.

So how do we implement triplet loss in TensorFlow?

offline triplets

This case is straightforward: we implement the loss formula above directly on triplets that were mined beforehand. In TensorFlow:


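```python
import tensorflow as tf

margin = 0.2  # hyperparameter; example value

anchor_output = ...  # shape [None, 128]
positive_output = ...  # shape [None, 128]
negative_output = ...  # shape [None, 128]

# Squared Euclidean distances d(a, p) and d(a, n), one value per triplet
d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)

# Hinge max(d(a,p) - d(a,n) + margin, 0), averaged over the batch
loss = tf.maximum(0.0, margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)
```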

online triplets

Batch all implementation


```python
import tensorflow as tf


def batch_all_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    We generate all the valid triplets and average the loss over the positive ones.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, use the pairwise squared euclidean distance matrix.
                 If false, use the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
        fraction_positive_triplets: fraction of valid triplets with a positive loss
    """
    # Get the pairwise distance matrix
    # (_pairwise_distances and _get_triplet_mask are not shown in this post)
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)

    # shape (batch_size, batch_size, 1)
    anchor_positive_dist = tf.expand_dims(pairwise_dist, 2)
    # shape (batch_size, 1, batch_size)
    anchor_negative_dist = tf.expand_dims(pairwise_dist, 1)

    # Compute a 3D tensor of size (batch_size, batch_size, batch_size)
    # triplet_loss[i, j, k] will contain the triplet loss of anchor=i, positive=j, negative=k
    # Uses broadcasting where the 1st argument has shape (batch_size, batch_size, 1)
    # and the 2nd (batch_size, 1, batch_size)
    triplet_loss = anchor_positive_dist - anchor_negative_dist + margin

    # Put to zero the invalid triplets
    # (where label(a) != label(p) or label(n) == label(a) or a == p)
    mask = _get_triplet_mask(labels)
    mask = tf.cast(mask, tf.float32)  # tf.to_float no longer exists in TF 2.x
    triplet_loss = tf.multiply(mask, triplet_loss)

    # Remove negative losses (i.e. the easy triplets)
    triplet_loss = tf.maximum(triplet_loss, 0.0)

    # Count number of positive triplets (where triplet_loss > 0)
    valid_triplets = tf.cast(tf.greater(triplet_loss, 1e-16), tf.float32)
    num_positive_triplets = tf.reduce_sum(valid_triplets)
    num_valid_triplets = tf.reduce_sum(mask)
    fraction_positive_triplets = num_positive_triplets / (num_valid_triplets + 1e-16)

    # Get final mean triplet loss over the positive valid triplets
    triplet_loss = tf.reduce_sum(triplet_loss) / (num_positive_triplets + 1e-16)

    return triplet_loss, fraction_positive_triplets
```
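To sanity-check vectorized batch-all code on small inputs, the same quantity can be computed with explicit loops. This NumPy sketch is our own, and the name `batch_all_reference` is hypothetical. Instead of masking a B×B×B tensor, it enumerates the valid triplets directly. Like `batch_all_triplet_loss`, it returns the mean loss over triplets with positive loss, together with the fraction of valid triplets whose loss is positive.

```python
import numpy as np


def batch_all_reference(embeddings, labels, margin, squared=False):
    """Loop-based batch-all loss: (mean positive loss, fraction of positive triplets)."""
    labels = np.asarray(labels)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)  # squared Euclidean distances
    if not squared:
        dist = np.sqrt(dist)
    b = len(labels)
    losses = []
    for a in range(b):
        for p in range(b):
            if a == p or labels[a] != labels[p]:
                continue  # not a valid anchor-positive pair
            for n in range(b):
                if labels[n] == labels[a]:
                    continue  # not a valid negative
                losses.append(max(dist[a, p] - dist[a, n] + margin, 0.0))
    losses = np.array(losses)
    positive = losses[losses > 1e-16]  # hard and semi-hard triplets only
    mean_loss = positive.mean() if len(positive) else 0.0
    return float(mean_loss), len(positive) / max(len(losses), 1)


emb = np.array([[0.0], [1.0], [0.4], [2.0]])
print(batch_all_reference(emb, [0, 0, 1, 1], margin=0.5))  # about (0.9857, 0.875)
```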

Batch hard implementation


```python
import tensorflow as tf


def batch_hard_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    For each anchor, we get the hardest positive and hardest negative to form a triplet.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, use the pairwise squared euclidean distance matrix.
                 If false, use the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
    """
    # Get the pairwise distance matrix
    # (_pairwise_distances and the mask helpers are not shown in this post)
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)

    # For each anchor, get the hardest positive
    # First, we need to get a mask for every valid positive (they should have same label)
    mask_anchor_positive = _get_anchor_positive_triplet_mask(labels)
    mask_anchor_positive = tf.cast(mask_anchor_positive, tf.float32)

    # We put to 0 any element where (a, p) is not valid (valid if a != p and label(a) == label(p))
    anchor_positive_dist = tf.multiply(mask_anchor_positive, pairwise_dist)

    # shape (batch_size, 1)
    hardest_positive_dist = tf.reduce_max(anchor_positive_dist, axis=1, keepdims=True)

    # For each anchor, get the hardest negative
    # First, we need to get a mask for every valid negative (they should have different labels)
    mask_anchor_negative = _get_anchor_negative_triplet_mask(labels)
    mask_anchor_negative = tf.cast(mask_anchor_negative, tf.float32)

    # We add the maximum value in each row to the invalid negatives (label(a) == label(n)),
    # so they can never be smaller than any valid negative distance in reduce_min below
    max_anchor_negative_dist = tf.reduce_max(pairwise_dist, axis=1, keepdims=True)
    anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative)

    # shape (batch_size, 1)
    hardest_negative_dist = tf.reduce_min(anchor_negative_dist, axis=1, keepdims=True)

    # Combine biggest d(a, p) and smallest d(a, n) into final triplet loss
    triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)

    # Get final mean triplet loss
    triplet_loss = tf.reduce_mean(triplet_loss)

    return triplet_loss
```

Both implementations work well on datasets such as MNIST.

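Summary

Implementing triplet loss is not trivial. The tricky parts are computing the distances between embeddings efficiently, and detecting and discarding invalid and easy triplets. If you use TensorFlow, a ready-made triplet loss implementation is available in the author's GitHub repository.

Some readers may wonder: siamese and triplet networks take pairs or triplets as input, so how do you classify a single sample? The strength of neural networks is representation learning, i.e. automatic feature extraction. Training on pairs or triplets teaches the network a good representation of each input. To classify a sample, compute its embedding with the trained network and feed it to a classifier such as an SVM or logistic regression.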
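Here is a minimal sketch of the second stage. It assumes the embeddings have already been computed by the trained, frozen network. It fits a multinomial logistic regression (softmax regression) with plain gradient descent in NumPy, instead of calling a library. The function names, hyperparameters and 2-D toy embeddings are hypothetical.

```python
import numpy as np


def fit_softmax_regression(x, y, num_classes, lr=0.1, steps=500):
    """Multinomial logistic regression on fixed embeddings, by gradient descent."""
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of the mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    return np.argmax(x @ w + b, axis=1)


# Hypothetical 2-D embeddings of two identities
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
ids = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_softmax_regression(emb, ids, num_classes=2)
print(predict(np.array([[0.05, 0.05], [3.05, 3.05]]), w, b))  # [0 1]
```

Any off-the-shelf linear classifier could replace the hand-written one. The point is that the classifier only ever sees the learned embeddings.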
Article link: https://www.haomeiwen.com/subject/oprzhftx.html
