Paper title: On the Geometry of Adversarial Examples
Paper link: On the Geometry of Adversarial Examples
Abstract
Adversarial examples are a pervasive phenomenon in machine learning: seemingly imperceptible perturbations of the input cause accurate models to misclassify. Drawing on techniques from manifold reconstruction, the authors propose a geometric framework for analyzing the high-dimensional geometry of adversarial examples. In particular, they highlight the importance of codimension: for a low-dimensional data manifold embedded in a high-dimensional space, there are many directions normal to the manifold in which to construct adversarial examples. Adversarial examples are a natural consequence of learning a decision boundary that classifies the low-dimensional data manifold well but misclassifies points near the manifold. Using this geometric framework, the authors prove (1) a tradeoff between robustness under different norms, (2) that adversarial training in balls around the data is sample inefficient, and (3) sufficient sampling conditions under which nearest neighbor classifiers and ball-based adversarial training are robust.
1. Introduction
Large-scale deep learning has achieved breakthroughs on important problems in computer vision, natural language processing, and robotics. Shortly after these successes, the intriguing phenomenon of adversarial examples was discovered, prompting researchers to develop many adversarial attack methods and corresponding defenses. As deep learning is deployed in real-world applications, we must understand the weaknesses of our models and design appropriate countermeasures. To this end, the authors propose a geometric framework for analyzing adversarial examples.
The authors' geometric framework is based on the manifold hypothesis: high-dimensional data is generated from low-dimensional manifold structure embedded in a high-dimensional space. The authors view adversarial examples as points that lie near, but not on, the data manifold: a model can correctly classify the data on the low-dimensional manifold while misclassifying points near it. High codimension (the difference between the dimension of the data manifold and the dimension of the embedding space) is a key reason why adversarial examples are ubiquitous.
The contributions of the paper are as follows:
- A geometric framework for analyzing adversarial examples.
- An emphasis on the role codimension plays in vulnerability to adversarial examples.
- Using the framework, the authors prove the following results:
(1) The choice of norm used to constrain the attacker matters, because there is a tradeoff in robustness between different norms;
(2) The common approach of training on adversarial examples generated near the training set is not, by itself, sufficient to learn a robust decision boundary;
(3) Nearest neighbor classifiers do not suffer from this insufficiency, owing to the geometry of their decision boundaries, which stay far from the data; they therefore represent a potentially robust classification algorithm.
2. Related Work
This section surveys prior work relating adversarial examples to high-dimensional geometry and introduces manifold reconstruction algorithms.
3. The Geometry of Data
Data is modeled as a low-dimensional manifold embedded in a high-dimensional space: a $k$-dimensional manifold $\mathcal{M}$ embedded in $\mathbb{R}^d$, with codimension $d - k$.
This paper is concerned with classification problems, so the data for each of $C$ classes is modeled as being sampled from its own manifold $\mathcal{M}_c$; the full data manifold is $\mathcal{M} = \bigcup_{c=1}^{C} \mathcal{M}_c$. A finite sample of $n$ points drawn from $\mathcal{M}$ is denoted $X$. Each sample point $x \in X$ carries the class label $c$ of the manifold $\mathcal{M}_c$ it was sampled from.
As shown in Figure 1, the decision axis $\Delta_p$ is the set of points that have two or more closest points, in the norm $\|\cdot\|_p$, on distinct class manifolds.
For any set $S$, the reach of $S$ is defined as $\operatorname{rch}_p S = \inf_{x \in S,\, z \in \Delta_p} \|x - z\|_p$, the least distance from $S$ to the decision axis.
The $\epsilon$-tubular neighborhood of $\mathcal{M}$ is defined as $\mathcal{M}^\epsilon = \{x \in \mathbb{R}^d : \inf_{y \in \mathcal{M}} \|x - y\|_p \le \epsilon\}$. Note that although $\mathcal{M}$ is $k$-dimensional, $\mathcal{M}^\epsilon$ is $d$-dimensional. Consider a classifier $f: \mathbb{R}^d \to \{1, \dots, C\}$. An $\epsilon$-adversarial example is a point $\hat{x} \in \mathcal{M}_c^\epsilon$ such that $f(\hat{x}) \neq c$.
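To make these definitions concrete, here is a minimal numpy sketch (ours, not the paper's): it samples a circle ($k = 1$) embedded in $\mathbb{R}^d$ and perturbs each sample by $\epsilon$ in a random direction. In high codimension a random direction is almost orthogonal to the manifold, so the perturbed points land in $\mathcal{M}^\epsilon$ but off $\mathcal{M}$; such points are exactly the candidates for $\epsilon$-adversarial examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_circle(n, d):
    """Sample n points on a unit circle (k = 1) embedded in R^d (codimension d - 1)."""
    theta = rng.uniform(0, 2 * np.pi, n)
    X = np.zeros((n, d))
    X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)
    return X

def perturb_off_manifold(X, eps):
    """Move each point to a random point of the eps-sphere around it."""
    V = rng.standard_normal(X.shape)
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    return X + eps * V

d, eps = 100, 0.1
Xp = perturb_off_manifold(sample_circle(1000, d), eps)
# Exact distance from each perturbed point back to the circle:
proj = np.zeros_like(Xp)
proj[:, :2] = Xp[:, :2] / np.linalg.norm(Xp[:, :2], axis=1, keepdims=True)
dist_to_M = np.linalg.norm(Xp - proj, axis=1)
print(f"mean distance to M: {dist_to_M.mean():.4f} (<= eps = {eps})")
```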
4. A Provable Tradeoff in Robustness Between Norms
The decision boundary of $f$ is $\epsilon$-robust in the norm $p$ if $d_p(x, \partial f) \ge \epsilon$ for every $x \in \mathcal{M}$, where $\partial f$ denotes the decision boundary. In other words, starting from any point $x \in \mathcal{M}$, a perturbation must be larger than $\epsilon$ in the $p$-norm to cross the decision boundary. The most robust decision boundary in the norm $p$ is the decision axis $\Delta_p$ itself. Theorem 1 constructs a learning setting in which $\Delta_2$ and $\Delta_\infty$ differ.
Theorem 1. Let $\mathcal{M}_1, \mathcal{M}_2 \subset \mathbb{R}^d$ be two concentric $(d-1)$-spheres with radii $r_1 < r_2$ respectively. Let $\mathcal{M} = \mathcal{M}_1 \cup \mathcal{M}_2$ and let $\Delta_2, \Delta_\infty$ be the $\ell_2$ and $\ell_\infty$ decision axes of $\mathcal{M}$. Then there are points of $\Delta_\infty$ whose $\ell_2$ distance to $\mathcal{M}$ is smaller than $(r_2 - r_1)/2$ by a factor that grows with the ambient dimension $d$, while $\Delta_2$ remains at $\ell_2$ distance $(r_2 - r_1)/2$ from $\mathcal{M}$. Furthermore $\Delta_\infty \neq \Delta_2$.
By Theorem 1, the minimum $\ell_2$ distance from $\Delta_\infty$ to $\mathcal{M}$ is upper bounded by a quantity that shrinks as $d$ grows. If a classifier is trained to learn $\Delta_\infty$ as its decision boundary, then starting from some $x \in \mathcal{M}$, an attacker can construct an $\ell_2$-adversarial example with a perturbation much smaller than $(r_2 - r_1)/2$. Thus $\Delta_\infty$ has low robustness to $\ell_2$ perturbations. Figure 2 verifies this result experimentally.
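The following numerical sketch illustrates the flavor of Theorem 1; it is our illustration, not the paper's experiment, and the cube-vs-sphere intersection test is derived here for a point on a coordinate axis. It locates a point of $\Delta_\infty$ on a coordinate axis between the two spheres and reports its $\ell_2$ distance to $\mathcal{M}$, which shrinks as $d$ grows while $\Delta_2$ stays at distance $(r_2 - r_1)/2$:

```python
import numpy as np

def linf_dist_to_sphere(t, r, d):
    """l_inf distance from x = (t, 0, ..., 0) in R^d to the sphere ||y||_2 = r.

    The l2 norms attained on the cube of half-width s around x span
    [t - s, sqrt((t + s)^2 + (d - 1) s^2)], so the cube meets the sphere as
    soon as r falls inside that interval.
    """
    if r <= t:                         # inner sphere: need t - s <= r
        return t - r
    # outer sphere: smallest s with (t + s)^2 + (d - 1) s^2 >= r^2
    return (-t + np.sqrt(t * t + d * (r * r - t * t))) / d

def linf_axis_point(r1, r2, d, iters=60):
    """Bisect for the point (t, 0, ..., 0) equidistant, in l_inf, from both
    spheres; such a point lies on the l_inf decision axis."""
    lo, hi = r1, r2
    for _ in range(iters):
        t = (lo + hi) / 2
        if linf_dist_to_sphere(t, r1, d) < linf_dist_to_sphere(t, r2, d):
            lo = t
        else:
            hi = t
    return t

r1, r2 = 1.0, 1.3
for d in [2, 10, 100, 1000]:
    t = linf_axis_point(r1, r2, d)
    print(f"d={d:5d}  l2 dist from Delta_inf to M: {t - r1:.4f}  "
          f"(Delta_2 stays at {(r2 - r1) / 2:.4f})")
```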
5. Provably Robust Classifiers
Adversarial training is a very natural approach to building robust models; it amounts to training on samples drawn from $X^\epsilon$, the $\epsilon$-tubular neighborhood of the training set. We show next that this approach is far less sample-efficient than other classification algorithms.
First, define a learning algorithm $\mathcal{A}_{p,\epsilon}$ whose training set $X$ is sampled from $\mathcal{M}$ and which outputs a model $f$ such that, for every $x \in X$ with label $c$ and every $\hat{x} \in B_p(x, \epsilon)$, we have $f(\hat{x}) = c$. Here $B_p(x, r)$ denotes the ball of radius $r$ centered at $x$ in the corresponding norm. That is, $\mathcal{A}_{p,\epsilon}$ learns a model that outputs the same label for any $p$-perturbation of $x$ up to $\epsilon$ as it outputs for $x$. $\mathcal{A}_{p,\epsilon}$ is a theoretical model of adversarial training. Theorem 2 states that it is sample-inefficient in high codimension.
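As a concrete, hypothetical instantiation of this theoretical model, the sketch below implements ball-based adversarial training as data augmentation: each training point is replicated at random points of its $\epsilon$-ball with the same label. The sampling scheme and the parameter `m` are our illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def ball_augment(X, y, eps, m, p=2):
    """Sketch of the constraint f(x_hat) = c for all x_hat in B_p(x, eps):
    for each training point, draw m random points from the l_p ball of
    radius eps around it and give them the same label."""
    n, d = X.shape
    if p == 2:
        V = rng.standard_normal((n, m, d))
        V /= np.linalg.norm(V, axis=2, keepdims=True)
        radii = rng.uniform(0, 1, (n, m, 1)) ** (1 / d)   # uniform in the l2 ball
        pert = eps * radii * V
    else:                                                  # p = inf: uniform in the cube
        pert = rng.uniform(-eps, eps, (n, m, d))
    X_aug = (X[:, None, :] + pert).reshape(n * m, d)
    return np.concatenate([X, X_aug]), np.concatenate([y, np.repeat(y, m)])

X = rng.standard_normal((5, 3)); y = np.arange(5) % 2
X_aug, y_aug = ball_augment(X, y, eps=0.1, m=20)
print(X_aug.shape, y_aug.shape)   # (105, 3) (105,)
```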
Theorem 2 follows from Theorems 3 and 4, which present the nearest neighbor classifier as one such more sample-efficient algorithm. Nearest neighbor classifiers are naturally robust in high codimension because, when $X$ is dense in $\mathcal{M}$, the Voronoi cells of the samples are elongated in the directions normal to $\mathcal{M}$: the samples crowd each other along the manifold, but there are no samples off the manifold to bound the cells in the normal directions.
Before stating Theorem 3, we need a sampling condition on $\mathcal{M}$. A $\delta$-cover of a manifold $\mathcal{M}$ in the norm $p$ is a finite set of points $X$ such that for every $y \in \mathcal{M}$ there exists $x \in X$ with $\|x - y\|_p \le \delta$. (In other words, the sample is sufficiently dense: every point of the manifold has a nearby sample point.) Theorem 3 gives sufficient conditions under which both the nearest neighbor classifier and $\mathcal{A}_{p,\epsilon}$ classify accurately, and the sampling density required by the nearest neighbor classifier is substantially lower than that required by $\mathcal{A}_{p,\epsilon}$.
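A small sketch (ours, with illustrative parameters) of the robustness claim for nearest neighbors: build $\delta$-covers of two concentric circles embedded in $\mathbb{R}^{50}$ and check that a 1-nearest-neighbor query remains correct after an $\epsilon$-perturbation off the manifold, since the nearest sample is still on the correct circle when $\epsilon + \delta$ is much smaller than the distance between the class manifolds.

```python
import numpy as np

rng = np.random.default_rng(2)

def delta_cover_circle(radius, delta, d):
    """A delta-cover (in l2) of the circle of radius `radius` in the first two
    coordinates of R^d: equally spaced points with arc spacing <= delta."""
    n = int(np.ceil(2 * np.pi * radius / delta))
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
    X = np.zeros((n, d))
    X[:, 0], X[:, 1] = radius * np.cos(theta), radius * np.sin(theta)
    return X

def nn_predict(X, y, q):
    """1-nearest-neighbor prediction."""
    return y[np.argmin(np.linalg.norm(X - q, axis=1))]

# Two class manifolds: concentric circles of radii 1 and 2 embedded in R^50.
d, delta, eps = 50, 0.05, 0.2
X1, X2 = delta_cover_circle(1.0, delta, d), delta_cover_circle(2.0, delta, d)
X, y = np.vstack([X1, X2]), np.array([0] * len(X1) + [1] * len(X2))

# Perturb a point of the inner circle by eps in a random direction; the
# nearest sample stays on the inner circle, so the predicted label is unchanged.
phi = rng.uniform(0, 2 * np.pi)
q = np.zeros(d); q[0], q[1] = np.cos(phi), np.sin(phi)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
print(nn_predict(X, y, q + eps * v))   # -> 0
```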
Next, we exhibit a setting in which bounds like those of Theorem 3 are necessary. In this setting, the covering radii required by the nearest neighbor classifier and by $\mathcal{A}_{p,\epsilon}$ differ by a multiplicative factor, which translates into an exponential gap between the training set sizes the two approaches need to achieve the same amount of robustness.
Define $\Pi_1 = \{x \in \mathbb{R}^d : x_{k+1} = \dots = x_d = 0,\ 0 \le x_i \le 1 \text{ for } 1 \le i \le k\}$; that is, $\Pi_1$ is a subset of the
$x_1 \cdots x_k$-plane bounded between the coordinates $0$ and $1$ along each axis. Similarly define $\Pi_2$, a translate of $\Pi_1$ in a direction normal to the flat.
Note that $\mathcal{M} = \Pi_1 \cup \Pi_2$ lies in a low-dimensional subspace, and the decision axis $\Delta$ of $\mathcal{M}$ is the parallel $k$-flat midway between $\Pi_1$ and $\Pi_2$. In the $\ell_\infty$ norm we can show that the gap in Theorem 3 is necessary for $\mathcal{A}_{p,\epsilon}$. Furthermore, the bounds we derive on the $\delta$-covers needed by both the nearest neighbor classifier and $\mathcal{A}_{p,\epsilon}$ are tight. Combined with well-known properties of covers, we get that the ratio of the required sample sizes is exponential in $k$.
We have shown that both $\mathcal{A}_{p,\epsilon}$ and the nearest neighbor classifier learn robust decision boundaries when given sufficiently dense samples. However, in some settings the nearest neighbor classifier is exponentially more sample-efficient than $\mathcal{A}_{p,\epsilon}$ at achieving the same amount of robustness. Section 8.1 verifies these theoretical results experimentally.
6. $X^\epsilon$ is a Poor Model of $\mathcal{M}^\epsilon$
Madry et al. (2018) propose training robust classifiers with the help of an adversary that, at each iteration, produces misclassified $\epsilon$-perturbations of the training set. In our notation, this corresponds to learning a decision boundary that correctly classifies $X^\epsilon$.
We argue that this approach is not sufficiently robust in practice because $X^\epsilon$ is often a poor model of $\mathcal{M}^\epsilon$. In this section we show that the volume $\operatorname{vol} X^\epsilon$ is usually only a tiny fraction of the volume $\operatorname{vol} \mathcal{M}^\epsilon$. These results shed light on why the ball-based learning algorithm defined in Section 5 is so much less sample-efficient than the nearest neighbor classifier.
Theorem 5. Let $\mathcal{M}$ be a $k$-dimensional manifold embedded in $\mathbb{R}^d$ such that $k < d$. Let $X$ be a finite set of points sampled from $\mathcal{M}$. Suppose that $\epsilon$ is less than the distance from $\mathcal{M}$ to its medial axis, defined as in Dey (2007). Then the percentage of $\mathcal{M}^\epsilon$ covered by $X^\epsilon$ is upper bounded by a ratio of ball volumes that goes to $0$ as the codimension $d - k$ increases; the special case of a $k$-flat below makes the bound explicit.
In high codimension, even modest undersampling of $\mathcal{M}$ causes a significant loss of coverage, because the volume of the union of balls centered on the samples shrinks faster than the volume of $\mathcal{M}^\epsilon$. Theorem 5 states that in high codimension the fraction of $\mathcal{M}^\epsilon$ covered by $X^\epsilon$ goes to $0$. For practically achievable training set sizes, $X^\epsilon$ covers almost none of $\mathcal{M}^\epsilon$.
Thus $X^\epsilon$ is a poor model of $\mathcal{M}^\epsilon$, and high classification accuracy on $X^\epsilon$ does not imply high accuracy in $\mathcal{M}^\epsilon$.
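A quick Monte Carlo illustration (ours, not the paper's) of Theorem 5 in the simplest case of a unit $k$-flat: estimate the fraction of $\Pi^\epsilon$ covered by $X^\epsilon$ for a random finite sample $X \subset \Pi$ and watch it collapse as the codimension grows. The sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

def covered_fraction(k, d, eps, n_train=200, n_test=4000):
    """Fraction of the eps-tubular neighborhood of the unit k-flat in R^d
    covered by eps-balls around a finite sample X drawn from the flat."""
    X = np.zeros((n_train, d))
    X[:, :k] = rng.uniform(0, 1, (n_train, k))          # X sampled from Pi
    Q = np.zeros((n_test, d))                            # Q sampled from Pi^eps
    Q[:, :k] = rng.uniform(0, 1, (n_test, k))
    V = rng.standard_normal((n_test, d - k))             # random normal offsets
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    Q[:, k:] = eps * rng.uniform(0, 1, (n_test, 1)) ** (1 / (d - k)) * V
    d2 = (Q * Q).sum(1)[:, None] + (X * X).sum(1)[None, :] - 2 * Q @ X.T
    return np.mean(np.sqrt(np.maximum(d2, 0)).min(axis=1) <= eps)

for d in [3, 5, 10, 30]:
    print(f"d={d:3d}  fraction of Pi^eps covered by X^eps ~ "
          f"{covered_fraction(2, d, eps=0.3):.3f}")
```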
Next, the special case of a $k$-dimensional plane provides an intuitive example of Theorem 5. Define $\Pi = \{x \in \mathbb{R}^d : 0 \le x_i \le 1 \text{ for } 1 \le i \le k,\ x_{k+1} = \dots = x_d = 0\}$; that is, $\Pi$ is a subset of the $x_1 \cdots x_k$-plane bounded between the coordinates $0$ and $1$. Recall that a $\delta$-cover of a manifold $\mathcal{M}$ in the norm $p$ is a finite set of points $X$ such that for every $y \in \mathcal{M}$ there exists $x \in X$ with $\|x - y\|_p \le \delta$. It is easy to construct an explicit $\delta$-cover $X$ of $\Pi$: place the sample points on the vertices of a regular grid, as shown by the black vertices in Figure 4. The centers of the cubes of this grid (shown in blue in Figure 4) are the points farthest from the samples. The distance from a grid vertex to the center of a cube is $(\ell/2)\sqrt{k}$, where $\ell$ is the spacing between points along the axes of the grid. To construct a $\delta$-cover we need $(\ell/2)\sqrt{k} \le \delta$, which gives a spacing of $\ell = 2\delta/\sqrt{k}$. The size of the resulting $\delta$-cover is $|X| = (\lceil \sqrt{k}/(2\delta) \rceil + 1)^k$. Note that $|X|$ scales exponentially in $k$, the dimension of $\Pi$, not in $d$, the dimension of the embedding space.
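The grid construction translates directly into code. This sketch (ours) builds the $\delta$-cover and empirically confirms that no point of $\Pi$ is farther than $\delta$ from it:

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

def grid_delta_cover(k, d, delta):
    """Explicit delta-cover of the unit k-flat [0,1]^k x {0}^(d-k): grid
    vertices with per-axis spacing at most 2*delta/sqrt(k), so a cube center
    is at distance at most (spacing/2)*sqrt(k) <= delta from a vertex."""
    spacing = 2 * delta / np.sqrt(k)
    n_axis = int(np.ceil(1 / spacing)) + 1
    axis = np.linspace(0, 1, n_axis)
    X = np.zeros((n_axis ** k, d))
    X[:, :k] = np.array(list(itertools.product(axis, repeat=k)))
    return X

k, d, delta = 3, 10, 0.25
X = grid_delta_cover(k, d, delta)
# Empirical check that every point of the flat is within delta of the cover.
Q = np.zeros((2000, d)); Q[:, :k] = rng.uniform(0, 1, (2000, k))
d2 = (Q * Q).sum(1)[:, None] + (X * X).sum(1)[None, :] - 2 * Q @ X.T
print(f"|X| = {len(X)}, max distance ~ {np.sqrt(np.maximum(d2, 0).min(axis=1)).max():.3f}")
```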
Recall that $\Pi^\epsilon$ is the $\epsilon$-tubular neighborhood of $\Pi$. The $\delta$-balls around $X$, which comprise $X^\delta$, cover $\Pi$, and so any robust approach that guarantees correct classification within $X^\delta$ will achieve perfect accuracy on $\Pi$. However, we will show that $X^\delta$ covers only a vanishingly small fraction of $\Pi^\epsilon$. Let $B^d_r$ denote the $d$-ball of radius $r$ centered at the origin. An upper bound on the volume of $X^\delta$ is $\operatorname{vol} X^\delta \le |X| \cdot \operatorname{vol} B^d_\delta$. (5)
Next we bound the volume of $\Pi^\epsilon$ from below. Intuitively, a lower bound can be derived by placing a $(d-k)$-dimensional ball of radius $\epsilon$ in the normal space at each point of $\Pi$ and integrating the volumes, which gives $\operatorname{vol} \Pi^\epsilon \ge \operatorname{vol} \Pi \cdot \operatorname{vol} B^{d-k}_\epsilon = \operatorname{vol} B^{d-k}_\epsilon$, since $\operatorname{vol} \Pi = 1$. (6) Figure 4 (Right) illustrates the lower bound argument in the case $k = 1, d = 2$.
Combining Equations 5 and 6 gives an upper bound on the percentage of $\Pi^\epsilon$ that is covered by $X^\delta$: $\frac{\operatorname{vol} X^\delta}{\operatorname{vol} \Pi^\epsilon} \le \frac{|X| \cdot \operatorname{vol} B^d_\delta}{\operatorname{vol} B^{d-k}_\epsilon}$.
Notice that the factors involving $\delta$ and $\epsilon$ cancel when $\delta = \epsilon$. Figure 6 (Left) shows that this expression approaches $0$ as the codimension $d - k$ of $\Pi$ increases.
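This bound is easy to evaluate numerically; here is a sketch (ours) using log-gamma to avoid overflow, with $\delta = \epsilon$ and the grid cover size from above:

```python
import numpy as np
from scipy.special import gammaln

def log_ball_volume(dim, r):
    """log volume of the Euclidean dim-ball of radius r."""
    return (dim / 2) * np.log(np.pi) - gammaln(dim / 2 + 1) + dim * np.log(r)

def log_coverage_bound(k, d, eps):
    """log of |X| * vol(B^d_eps) / vol(B^(d-k)_eps), the upper bound on the
    fraction of Pi^eps covered by X^eps, using the grid eps-cover size
    |X| = (ceil(sqrt(k)/(2*eps)) + 1)^k."""
    log_X = k * np.log(np.ceil(np.sqrt(k) / (2 * eps)) + 1)
    return log_X + log_ball_volume(d, eps) - log_ball_volume(d - k, eps)

k, eps = 2, 0.25
for d in [3, 5, 10, 20, 50]:
    print(f"d={d:3d}  covered fraction <= {np.exp(log_coverage_bound(k, d, eps)):.3e}")
```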
Suppose we set $\delta = \epsilon = 1$ and construct a 1-cover of $\Pi$. The number of points necessary to cover $\Pi$ with balls of radius 1 depends only on $k$, not on the embedding dimension $d$. However, the number of points necessary to cover the tubular neighborhood $\Pi^1$ with balls of radius 1 depends on both $k$ and $d$. In Theorem 6 we derive a lower bound on the number of samples necessary to cover $\Pi^1$.
Theorem 6. Let $\Pi$ be a bounded $k$-flat as described above, bounded along each axis by $[0, 1]$. Let $N$ denote the number of samples necessary to cover the 1-tubular neighborhood $\Pi^1$ of $\Pi$ with $d$-balls of radius 1; that is, let $N$ be the minimum value for which there exists a finite sample $X$ of size $N$ such that $\Pi^1 \subset X^1$. Then $N \ge \operatorname{vol} \Pi^1 / \operatorname{vol} B^d_1 \ge \operatorname{vol} B^{d-k}_1 / \operatorname{vol} B^d_1 = \Gamma(d/2 + 1) / \big(\pi^{k/2}\, \Gamma((d-k)/2 + 1)\big)$.
Theorem 6 states that, in general, accurately modeling $\Pi$ requires far fewer samples than modeling $\Pi^1$. Figure 6 (Right) compares the number of points needed to construct a 1-cover of $\Pi$ with the lower bound from Theorem 6 on the number needed to cover $\Pi^1$. The number of points needed to cover $\Pi^1$ grows like $\Theta(d^{k/2})$, scaling polynomially in $d$ and exponentially in $k$. In contrast, as $d$ increases, the number needed to construct a 1-cover of $\Pi$ stays constant, depending only on $k$.
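The comparison behind Figure 6 (Right) can be reproduced qualitatively with the volumetric lower bound derived above (our sketch, with illustrative parameters):

```python
import numpy as np
from scipy.special import gammaln

def log_cover_size_flat(k):
    """log of the explicit 1-cover size (ceil(sqrt(k)/2) + 1)^k for the unit
    k-flat; independent of the embedding dimension d."""
    return k * np.log(np.ceil(np.sqrt(k) / 2) + 1)

def log_cover_lower_bound_tube(k, d):
    """log of the volumetric lower bound vol(B^(d-k)_1) / vol(B^d_1) on the
    number of radius-1 balls needed to cover Pi^1."""
    return gammaln(d / 2 + 1) - gammaln((d - k) / 2 + 1) - (k / 2) * np.log(np.pi)

k = 5
for d in [10, 50, 100, 500]:
    print(f"d={d:4d}  1-cover of Pi: {np.exp(log_cover_size_flat(k)):6.0f}   "
          f"lower bound for covering Pi^1: {np.exp(log_cover_lower_bound_tube(k, d)):10.1f}")
```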
Approaches that aim to produce robust classifiers by generating adversarial examples in $\epsilon$-balls centered on the training set do not accurately model $\mathcal{M}^\epsilon$, and making them do so would require far more samples. If such a method behaves arbitrarily outside the $\epsilon$-balls that define $X^\epsilon$, adversarial examples will still exist and will be easy to find. Although adversarial examples expose a vulnerability, the reason deep learning performs so well on a wide variety of tasks is that performing well on $X^\epsilon$ is much easier than performing well on $\mathcal{M}^\epsilon$.