Paper: Explaining and Harnessing Adversarial Examples
Paper link: https://arxiv.org/pdf/1412.6572.pdf
Code: fgsm.ipynb
The paper was published at ICLR 2015 and is a classic in the adversarial-example literature. Since the blog post [论文笔记]Explaining & Harnessing Adversarial Examples already explains it very clearly, I will not repeat that discussion here.
Other references:
1. 简书: Explaining and Harnessing Adversarial Examples
2. Paper walkthrough | Explaining and Harnessing Adversarial Examples
3. Notes on EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES
4. Reading notes on "Explaining and Harnessing Adversarial Examples"
Code implementation
Fast gradient sign method
The fast gradient sign method works by using the gradients of the neural network to create an adversarial example. For an input image, the method uses the gradients of the loss with respect to the input image to create a new image that maximises the loss. This new image is called the adversarial image. This can be summarised using the following expression:
$$\text{adv\_x} = x + \epsilon \cdot \text{sign}\left(\nabla_x J(\theta, x, y)\right)$$
where
- adv_x : Adversarial image.
- x : Original input image.
- y : Original input label.
- $\epsilon$ : Multiplier to ensure the perturbations are small.
- $\theta$ : Model parameters.
- $J$ : Loss.
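Before the full TensorFlow 1.x / Slim implementation below, here is a minimal NumPy sketch of just this update (not from the original notebook; `fgsm_step` and `grad` are illustrative names, and `grad` stands for a precomputed gradient of the loss with respect to the input):

```python
import numpy as np

def fgsm_step(x, grad, epsilon):
    """One FGSM step: move each pixel by +/- epsilon along the gradient sign.

    x       : input image in [0, 1], shape (H, W, C)
    grad    : dJ/dx evaluated at x (same shape as x), assumed precomputed
    epsilon : L-infinity budget of the perturbation
    """
    adv_x = x + epsilon * np.sign(grad)
    # Clip so the adversarial example stays a valid image.
    return np.clip(adv_x, 0.0, 1.0)
```

Because only the sign of the gradient is kept, every pixel moves by exactly ±ε, so the perturbation's L∞ norm is ε before clipping, which is what keeps it visually small.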
%tensorflow_version 1.x
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
import matplotlib as mpl
import matplotlib.pyplot as plt
import PIL
import numpy as np
import json
from urllib.request import urlretrieve
The AdversarialExample class
- neural network: Inception v3
- adversarial attack method: FGSM
class AdversarialExample:
    def __init__(self):
        self._initialize_session()
        self._build_graph()
        self._restore_model()

    def _initialize_session(self):
        tf.logging.set_verbosity(tf.logging.ERROR)
        self.sess = tf.InteractiveSession()

    def _build_graph(self):
        # define inputs: a single 299x299 RGB image with values in [0, 1]
        self.image = tf.Variable(tf.zeros((299, 299, 3)))
        # Inception v3 expects inputs scaled to [-1, 1]
        preprocessed = tf.multiply(tf.subtract(tf.expand_dims(self.image, 0), 0.5), 2.0)
        arg_scope = nets.inception.inception_v3_arg_scope(weight_decay=0.0)
        with slim.arg_scope(arg_scope):
            logits, _ = nets.inception.inception_v3(
                preprocessed, 1001, is_training=False, reuse=False)
        # ignore background class
        self.logits = logits[:, 1:]
        # probabilities over the 1000 ImageNet classes
        self.probs = tf.nn.softmax(self.logits)

    def _restore_model(self):
        restore_vars = [
            var for var in tf.global_variables()
            if var.name.startswith('InceptionV3/')
        ]
        self.saver = tf.train.Saver(restore_vars)
        # Here you can change to the path of your saved model
        self.saver.restore(self.sess, 'drive/My Drive/adversarial attacks/checkpoint/inception_v3.ckpt')
    def classify(self, img, correct_class=None, target_class=None):
        imagenet_json, _ = urlretrieve(
            'https://raw.githubusercontent.com/xunguangwang/Adversarial-Attacks/master/imagenet.json')
        with open(imagenet_json) as f:
            imagenet_labels = json.load(f)
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))
        fig.sca(ax1)
        p = self.sess.run(self.probs, feed_dict={self.image: img})[0]
        ax1.imshow(img)
        fig.sca(ax1)
        topk = list(p.argsort()[-10:][::-1])
        topprobs = p[topk]
        barlist = ax2.bar(range(10), topprobs)
        if target_class in topk:
            barlist[topk.index(target_class)].set_color('r')
        if correct_class in topk:
            barlist[topk.index(correct_class)].set_color('g')
        plt.sca(ax2)
        plt.ylim([0, 1.1])
        plt.xticks(range(10),
                   [imagenet_labels[i][:15] for i in topk],
                   rotation='vertical')
        fig.subplots_adjust(bottom=0.2)
        plt.show()
    def fgsm_attack(self, image, label, epsilon=0):
        label = tf.one_hot(label, 1000)
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=self.logits, labels=[label])
        # Get the gradients of the loss w.r.t. the input image.
        gradient = tf.gradients(loss, self.image)
        # Get the sign of the gradients to create the perturbation
        signed_grad = tf.sign(gradient)
        perturbation = epsilon * signed_grad[0]
        # Adversarial image, clipped back to the valid pixel range
        adv_image = tf.clip_by_value(image + perturbation, 0, 1)
        return self.sess.run([adv_image, perturbation], feed_dict={self.image: image})
Original image
- Get the original image and preprocess it.
- Classify the image with Inception v3.
img_path, _ = urlretrieve('https://raw.githubusercontent.com/xunguangwang/Adversarial-Attacks/master/images/cat.jpg')
img_class = 281
img = PIL.Image.open(img_path)
big_dim = max(img.width, img.height)
wide = img.width > img.height
new_w = 299 if not wide else int(img.width * 299 / img.height)
new_h = 299 if wide else int(img.height * 299 / img.width)
img = img.resize((new_w, new_h)).crop((0, 0, 299, 299))
img = (np.asarray(img) / 255.0).astype(np.float32)
model = AdversarialExample()
model.classify(img, correct_class=img_class)
Create the adversarial image
- Show the perturbation ($\epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$), rescaled for display.
epsilon = 0.01
adv_image,perturbation = model.fgsm_attack(img, img_class, epsilon)
plt.imshow((perturbation+epsilon)/(2*epsilon))
(Figure: the sign perturbation, rescaled to [0, 1] for display)
model.classify(adv_image, correct_class=img_class)
- We can see that the model misclassifies the adversarial image.
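As a possible follow-up experiment (not part of the original notebook), one could rerun the attack with several values of ε to see how the perturbation budget trades off visibility against misclassification, reusing the `fgsm_attack` and `classify` methods defined above:

```python
# Hypothetical sweep over the perturbation budget epsilon.
for eps in [0.005, 0.01, 0.05, 0.1]:
    adv, _ = model.fgsm_attack(img, img_class, epsilon=eps)
    model.classify(adv, correct_class=img_class)
```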
The code in this post runs in a Colab environment and implements an FGSM example with Inception v3 as the model under attack. The GitHub repository is linked at the top of the post; to run it yourself, first download the Inception v3 checkpoint and extract it, then update the checkpoint path in the code.
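For reference, fetching the checkpoint from inside Colab might look like the sketch below; the URL is the standard TF-Slim Inception V3 release, and `CKPT_DIR` is a placeholder you should point at the same directory used in `_restore_model`:

```python
import os
import tarfile
from urllib.request import urlretrieve

# Assumed: the standard TF-Slim Inception V3 checkpoint release.
CKPT_URL = 'http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz'
CKPT_DIR = 'checkpoint'  # placeholder: match the path passed to saver.restore()

os.makedirs(CKPT_DIR, exist_ok=True)
archive_path, _ = urlretrieve(CKPT_URL)
with tarfile.open(archive_path, 'r:gz') as tar:
    tar.extractall(CKPT_DIR)  # extracts inception_v3.ckpt
```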