Paper: Explaining and Harnessing Adversarial Examples
Paper link: https://arxiv.org/pdf/1412.6572.pdf
Code: fgsm.ipynb
The paper was published at ICLR 2015 and is a classic in the adversarial-example literature. Since the blog post [论文笔记]Explaining & Harnessing Adversarial Examples already explains it very clearly, I will not repeat that discussion here.
Other references:
1. 简书: Explaining and Harnessing Adversarial Examples
2. Paper walkthrough | Explaining and Harnessing Adversarial Examples
3. Notes on EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES
4. Reading notes on "Explaining and Harnessing Adversarial Examples"
Code implementation
Fast gradient sign method
The fast gradient sign method works by using the gradients of the neural network to create an adversarial example. For an input image, the method uses the gradients of the loss with respect to the input image to create a new image that maximises the loss. This new image is called the adversarial image. This can be summarised using the following expression:
$$\text{adv\_x} = x + \epsilon \cdot \text{sign}\left(\nabla_x J(\theta, x, y)\right)$$
where
- adv_x : Adversarial image.
- x : Original input image.
- y : Original input label.
- $\epsilon$ : Multiplier to ensure the perturbations are small.
- $\theta$ : Model parameters.
- $J$ : Loss.
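Before the full TensorFlow 1.x / Slim implementation below, here is a minimal NumPy sketch of just this update (not from the original notebook; `fgsm_step` and `grad` are illustrative names, and `grad` stands for a precomputed gradient of the loss with respect to the input):

```python
import numpy as np

def fgsm_step(x, grad, epsilon):
    """One FGSM step: move each pixel by +/- epsilon along the gradient sign.

    x       : input image in [0, 1], shape (H, W, C)
    grad    : dJ/dx evaluated at x (same shape as x), assumed precomputed
    epsilon : L-infinity budget of the perturbation
    """
    adv_x = x + epsilon * np.sign(grad)
    # Clip so the adversarial example stays a valid image.
    return np.clip(adv_x, 0.0, 1.0)
```

Because only the sign of the gradient is kept, every pixel moves by exactly ±ε, so the perturbation's L∞ norm is ε before clipping, which is what keeps it visually small.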
%tensorflow_version 1.x
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
import matplotlib as mpl
import matplotlib.pyplot as plt
import PIL
import numpy as np
import json
from urllib.request import urlretrieve
The AdversarialExample class
- neural network: Inception v3
- adversarial attack method: FGSM
class AdversarialExample:
    def __init__(self):
        self._initialize_session()
        self._build_graph()
        self._restore_model()

    def _initialize_session(self):
        tf.logging.set_verbosity(tf.logging.ERROR)
        self.sess = tf.InteractiveSession()

    def _build_graph(self):
        # define inputs: a single 299x299 RGB image with values in [0, 1]
        self.image = tf.Variable(tf.zeros((299, 299, 3)))
        # Inception v3 expects inputs scaled to [-1, 1]
        preprocessed = tf.multiply(tf.subtract(tf.expand_dims(self.image, 0), 0.5), 2.0)
        arg_scope = nets.inception.inception_v3_arg_scope(weight_decay=0.0)
        with slim.arg_scope(arg_scope):
            logits, _ = nets.inception.inception_v3(
                preprocessed, 1001, is_training=False, reuse=False)
        # ignore background class
        self.logits = logits[:, 1:]
        # probabilities over the 1000 ImageNet classes
        self.probs = tf.nn.softmax(self.logits)

    def _restore_model(self):
        restore_vars = [
            var for var in tf.global_variables()
            if var.name.startswith('InceptionV3/')
        ]
        self.saver = tf.train.Saver(restore_vars)
        # Here you can change to the path of your saved model
        self.saver.restore(self.sess, 'drive/My Drive/adversarial attacks/checkpoint/inception_v3.ckpt')
    def classify(self, img, correct_class=None, target_class=None):
        imagenet_json, _ = urlretrieve(
            'https://raw.githubusercontent.com/xunguangwang/Adversarial-Attacks/master/imagenet.json')
        with open(imagenet_json) as f:
            imagenet_labels = json.load(f)
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))
        fig.sca(ax1)
        p = self.sess.run(self.probs, feed_dict={self.image: img})[0]
        ax1.imshow(img)
        fig.sca(ax1)
        topk = list(p.argsort()[-10:][::-1])
        topprobs = p[topk]
        barlist = ax2.bar(range(10), topprobs)
        if target_class in topk:
            barlist[topk.index(target_class)].set_color('r')
        if correct_class in topk:
            barlist[topk.index(correct_class)].set_color('g')
        plt.sca(ax2)
        plt.ylim([0, 1.1])
        plt.xticks(range(10),
                   [imagenet_labels[i][:15] for i in topk],
                   rotation='vertical')
        fig.subplots_adjust(bottom=0.2)
        plt.show()
    def fgsm_attack(self, image, label, epsilon=0):
        label = tf.one_hot(label, 1000)
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=self.logits, labels=[label])
        # Get the gradients of the loss w.r.t. the input image.
        gradient = tf.gradients(loss, self.image)
        # Get the sign of the gradients to create the perturbation
        signed_grad = tf.sign(gradient)
        perturbation = epsilon * signed_grad[0]
        # Adversarial image, clipped back to the valid pixel range
        adv_image = tf.clip_by_value(image + perturbation, 0, 1)
        return self.sess.run([adv_image, perturbation], feed_dict={self.image: image})
Original image
- Get the original image and preprocess it.
- Classify the image with Inception v3.
img_path, _ = urlretrieve('https://raw.githubusercontent.com/xunguangwang/Adversarial-Attacks/master/images/cat.jpg')
img_class = 281
img = PIL.Image.open(img_path)
big_dim = max(img.width, img.height)
wide = img.width > img.height
new_w = 299 if not wide else int(img.width * 299 / img.height)
new_h = 299 if wide else int(img.height * 299 / img.width)
img = img.resize((new_w, new_h)).crop((0, 0, 299, 299))
img = (np.asarray(img) / 255.0).astype(np.float32)
model = AdversarialExample()
model.classify(img, correct_class=img_class)
Create the adversarial image
- Show the perturbation ($\epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$), rescaled for display.
epsilon = 0.01
adv_image,perturbation = model.fgsm_attack(img, img_class, epsilon)
plt.imshow((perturbation+epsilon)/(2*epsilon))
(Figure: the sign perturbation, rescaled to [0, 1] for display)
model.classify(adv_image, correct_class=img_class)
- We can see that the model misclassifies the adversarial image.
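As a possible follow-up experiment (not part of the original notebook), one could rerun the attack with several values of ε to see how the perturbation budget trades off visibility against misclassification, reusing the `fgsm_attack` and `classify` methods defined above:

```python
# Hypothetical sweep over the perturbation budget epsilon.
for eps in [0.005, 0.01, 0.05, 0.1]:
    adv, _ = model.fgsm_attack(img, img_class, epsilon=eps)
    model.classify(adv, correct_class=img_class)
```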
The code in this post runs in a Colab environment and implements an FGSM example with Inception v3 as the model under attack. The GitHub repository is linked at the top of the post; to run it yourself, first download the Inception v3 checkpoint and extract it, then update the checkpoint path in the code.
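For reference, fetching the checkpoint from inside Colab might look like the sketch below; the URL is the standard TF-Slim Inception V3 release, and `CKPT_DIR` is a placeholder you should point at the same directory used in `_restore_model`:

```python
import os
import tarfile
from urllib.request import urlretrieve

# Assumed: the standard TF-Slim Inception V3 checkpoint release.
CKPT_URL = 'http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz'
CKPT_DIR = 'checkpoint'  # placeholder: match the path passed to saver.restore()

os.makedirs(CKPT_DIR, exist_ok=True)
archive_path, _ = urlretrieve(CKPT_URL)
with tarfile.open(archive_path, 'r:gz') as tar:
    tar.extractall(CKPT_DIR)  # extracts inception_v3.ckpt
```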