Pointer Networks

Author: MasterXiong | Published: 2019-07-12 15:06

    Pointer Networks

    Oriol Vinyals, Meire Fortunato, Navdeep Jaitly
    Google, Berkeley
    NIPS 2015

    Introduction

    The key motivation and contribution of this work is the pointer network, a framework that can solve discrete mapping problems whose output dictionary size varies across instances.

    Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention.

    The method is to

    use attention as a pointer to select a member of the input sequence as the output.

    This work provides a new approach to solving discrete optimization problems with sequence models.

    Problem Setup

    This paper focuses on solving a specific type of sequence-to-sequence task in a supervised learning setting. In the training data, the inputs are planar point sets \mathcal{P} = \{ P_1, \cdots , P_n \} with n elements each, where P_i = (x_i, y_i) are the Cartesian coordinates of the points. The training instances are sampled from a uniform distribution over [0, 1] \times [0, 1]. The outputs \mathcal{C}^{\mathcal{P}} = \{ C_1, \cdots , C_{m(\mathcal{P})} \} are sequences of point indices representing the solution associated with the point set \mathcal{P}.
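
    A minimal sketch of how one such training instance could be sampled (illustrative, not the authors' data pipeline):

    ```python
    # Hypothetical sampling of a single training instance (not the paper's code):
    # n planar points drawn uniformly from the unit square [0, 1] x [0, 1].
    import numpy as np

    def sample_instance(n: int, rng: np.random.Generator) -> np.ndarray:
        return rng.uniform(0.0, 1.0, size=(n, 2))  # row i holds P_i = (x_i, y_i)

    rng = np.random.default_rng(0)
    P = sample_instance(5, rng)
    print(P.shape)  # (5, 2)
    ```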

    Models

    Sequence-to-Sequence Model

    This model learns the parameters \theta of an encoder-decoder to maximize the conditional probability of the output sequence over the training samples:
    \theta^* = \mathop{\arg\max}_{\theta} \sum_{\mathcal{P}, \mathcal{C}^{\mathcal{P}}} \log{p(\mathcal{C}^{\mathcal{P}} | \mathcal{P}; \theta)} ,
    where
    p(\mathcal{C}^{\mathcal{P}} | \mathcal{P}; \theta) = \prod_{i = 1}^{m(\mathcal{P})} p_{\theta} (C_i | C_1, \cdots , C_{i - 1}, \mathcal{P}) .
    Note that this model makes no statistical independence assumptions: each symbol is conditioned on the entire output history rather than only the previous symbol, so the output process is not a Markov chain.
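
    In practice this objective is just a sum of per-step cross-entropy terms over the target sequence. A minimal sketch, assuming the decoder exposes its per-step logits (names here are illustrative):

    ```python
    # Minimal sketch of the training loss for one instance (illustrative names;
    # assumes the decoder produces a logit vector over the dictionary per step).
    import torch
    import torch.nn.functional as F

    def sequence_nll(step_logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # step_logits: (m, dict_size) decoder logits; targets: (m,) indices C_1..C_m.
        # Returns -log p(C^P | P; theta), the term minimized for this instance.
        return F.cross_entropy(step_logits, targets, reduction="sum")
    ```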

    During inference, as the number of possible sequences grows exponentially with the sequence length, beam search is used to find the best candidate sequence.
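
    A compact sketch of how such a beam search could look (illustrative; `step_fn` is an assumed callback returning per-step log-probabilities, not part of the paper):

    ```python
    # Illustrative beam search over per-step log-probabilities.
    # `step_fn(prefix)` is an assumed callback returning log p(. | prefix, P)
    # as a 1-D tensor over the dictionary; `eos` marks end of sequence.
    import torch

    def beam_search(step_fn, eos: int, beam_width: int, max_len: int):
        beams = [([], 0.0)]                      # (prefix, cumulative log-prob)
        for _ in range(max_len):
            candidates = []
            for prefix, score in beams:
                if prefix and prefix[-1] == eos:
                    candidates.append((prefix, score))   # finished beam
                    continue
                log_probs = step_fn(prefix)
                top = torch.topk(log_probs, beam_width)
                for lp, idx in zip(top.values.tolist(), top.indices.tolist()):
                    candidates.append((prefix + [idx], score + lp))
            beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
            if all(p and p[-1] == eos for p, _ in beams):
                break
        return beams[0][0]
    ```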

    Notice that in the standard sequence-to-sequence model, the output dictionary size for all symbols C_i is fixed and equal to n, so a model trained for a particular sequence length cannot generalize to sequences of a different length.

    Content Based Input Attention

    The vanilla sequence-to-sequence model only uses the final state of the encoder to represent the whole input sequence, which constrains the amount of information and computation that can flow through to the decoder. The attention model augments the decoder RNN with an attention module over the encoder states to provide further information:
    u_j^i = v^T \tanh(W_1 e_j + W_2 d_i) , \quad j \in (1, \cdots, n) ,
    a^i = \mathrm{softmax}(u^i) ,
    d'_i = \sum_{j = 1}^n a_j^i e_j ,
    where e_j and d_i are the encoder and decoder hidden states respectively. d'_i and d_i are concatenated and used as the hidden state for prediction and as input to the next time step.
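
    A minimal sketch of this attention step for a single decoder state (single instance, no batching; the module name and dimensions are illustrative):

    ```python
    # Minimal sketch of the content-based attention step for one decoder state.
    import torch
    import torch.nn as nn

    class Attention(nn.Module):
        def __init__(self, hidden: int):
            super().__init__()
            self.W1 = nn.Linear(hidden, hidden, bias=False)  # applied to encoder states e_j
            self.W2 = nn.Linear(hidden, hidden, bias=False)  # applied to decoder state d_i
            self.v = nn.Linear(hidden, 1, bias=False)

        def forward(self, e: torch.Tensor, d_i: torch.Tensor) -> torch.Tensor:
            # e: (n, hidden) encoder states; d_i: (hidden,) current decoder state
            u = self.v(torch.tanh(self.W1(e) + self.W2(d_i))).squeeze(-1)  # (n,) scores
            a = torch.softmax(u, dim=0)                                    # attention weights a^i
            d_prime = a @ e                                                # (hidden,) context d'_i
            return torch.cat([d_i, d_prime])  # concatenated state used for prediction
    ```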

    Pointer Network

    The pointer network simplifies the attention mechanism by directly normalizing the vector u^i into an output distribution over the input positions, which guarantees that the output dictionary size always matches the input length:
    p(C_i | C_1, \cdots , C_{i - 1}, \mathcal{P}) = \mathrm{softmax}(u^i) .
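
    The only change relative to the attention sketch above is that u^i is no longer used to blend the encoder states into d'_i; it is itself normalized into the output distribution over the n input positions. A minimal sketch, reusing the `Attention` module from the previous block:

    ```python
    # Minimal sketch of the pointer step: the scores u^i directly become the
    # output distribution over input positions (reuses W1, W2, v from `Attention`).
    import torch

    def pointer_distribution(attn, e: torch.Tensor, d_i: torch.Tensor) -> torch.Tensor:
        u = attn.v(torch.tanh(attn.W1(e) + attn.W2(d_i))).squeeze(-1)  # (n,)
        return torch.softmax(u, dim=0)  # p(C_i = j | C_1, ..., C_{i-1}, P) over j = 1..n
    ```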

    Experiments

    The paper experiments on three different problems, i.e. convex hull, Delaunay triangulation, and the Travelling Salesman Problem (TSP), all of which involve finding a combinatorial solution over a discrete input set.

    (The output is actually a cycle or a set in these problems, which means that any point in the solution could serve as the start of the decoder sequence. An RNN decoder cannot reflect this invariance, so the authors had to artificially fix a start point for the output sequence in the experimental setup.)

    Convex Hull

    • Instance representation: The elements C_i are indices between 1 and n corresponding to positions in the sequence P. To represent the output as a sequence, start from the point with the lowest index and go counter-clockwise (see the sketch below).
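
    A minimal sketch of how such training targets could be generated with scipy (hypothetical; the paper does not describe its tooling):

    ```python
    # Hypothetical target generation for the convex hull task (not the paper's code).
    # scipy's 2-D ConvexHull returns vertices in counter-clockwise order already;
    # we rotate the cycle so it starts at the point with the lowest index.
    import numpy as np
    from scipy.spatial import ConvexHull

    def convex_hull_target(points: np.ndarray) -> list:
        verts = ConvexHull(points).vertices        # counter-clockwise vertex indices
        start = int(np.argmin(verts))              # position of the lowest index
        verts = np.roll(verts, -start)
        return [int(v) + 1 for v in verts]         # 1-based indices, as in the paper

    points = np.random.uniform(0.0, 1.0, size=(10, 2))
    print(convex_hull_target(points))
    ```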

    Delaunay Triangulation

    • Instance representation: The outputs \mathcal{C}^{\mathcal{P}} = \{ C_1, \cdots , C_{m(\mathcal{P})} \} are the corresponding sequences representing the triangulation of the point set \mathcal{P}. Each C_i is a triple of integers from 1 to n corresponding to the positions of the triangle vertices in \mathcal{P} (a label-generation sketch follows below).
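
    Similarly, a hypothetical sketch of generating the triangulation targets with scipy:

    ```python
    # Hypothetical target generation for the triangulation task (not the paper's
    # code); each row of `simplices` is one triple of triangle vertex indices.
    import numpy as np
    from scipy.spatial import Delaunay

    points = np.random.uniform(0.0, 1.0, size=(10, 2))
    triangles = Delaunay(points).simplices + 1   # 1-based triples, as in the paper
    print(triangles.shape)                       # (num_triangles, 3)
    ```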

    TSP

    • Instance representation: For consistency, the output tour in the training dataset always starts from the first city, without loss of generality.

    Generally speaking, the pointer network can work across different sequence lengths and performs relatively well for small instance sizes. However, as it uses 1M training instances for each task, all uniformly sampled in the unit square, I doubt whether it is actually learning the underlying algorithm rather than merely fitting the training distribution.
