5. Handling variable-length sequences with pack_padded_sequence and pad_packed_sequence

Author: yoyo9999 | Published 2021-04-14 15:18

    1. Handling variable-length sequences

    When we compute over a batch of training examples together, the examples usually have different lengths, so we naturally pad the shorter sentences up to the length of the longest one.
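The padding step itself can be done with torch.nn.utils.rnn.pad_sequence; a minimal sketch (the token ids here are made up for illustration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three "sentences" of different lengths, as token-id tensors.
sentences = [torch.tensor([1, 2]), torch.tensor([3]), torch.tensor([4, 5, 6])]

# pad_sequence pads every sequence up to the longest length (here 3) with 0.
padded = pad_sequence(sentences, batch_first=True, padding_value=0)
print(padded)
# tensor([[1, 2, 0],
#         [3, 0, 0],
#         [4, 5, 6]])
```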

    For example, for sentences like the following:

    [figure: a padded batch ("pad") and the corresponding packed "data"]

    However, when the pad tokens after a final word such as "Yes" are fed into an LSTM, processing them produces meaningless hidden states and cell states, which makes the results worse.

    We therefore want the LSTM to stop at the last real token (e.g. "Yes") and not process the padding.

    [figure: an LSTM stepping over pad tokens]

    Using pack_padded_sequence we obtain exactly the packed ("data") form of the padded batch shown above.

    2. torch.nn.utils.rnn.pack_padded_sequence()

    pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True)

    Purpose:

    Packs a Tensor containing padded sequences of variable length.

    input can be of size T x B x *, where T is the length of the longest sequence, B is the batch size, and * is any number of trailing dimensions. If batch_first is True, input is expected in B x T x * format.

    Arguments:

    • input (Tensor) – the padded batch of variable-length sequences.

    • lengths (Tensor or list[int]) – the length of each sequence in the batch.

    • batch_first (bool, optional) – if True, input is expected in B x T x * format.

    • enforce_sorted (bool, optional) – if True (the default), input is expected to be sorted by length in decreasing order; if False, the sequences are sorted internally.

    Returns:

    A PackedSequence object.

    ## Example
    import torch
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
    seq = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
    lens = [2, 1, 3]  # length of each sentence
    # with batch_first=True the input has shape [B, L, *]
    packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
    packed
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    PackedSequence(data=tensor([4, 1, 3, 5, 2, 6]), batch_sizes=tensor([3, 2, 1]),
    sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))
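The order of packed.data follows from sorting the batch by length in descending order and then reading it column by column, one time step at a time; a small sketch of what the fields mean:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

seq = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
lens = [2, 1, 3]
packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)

# batch_sizes[t] = number of sequences still active at time step t.
# After sorting by length the order is [4,5,6], [1,2,0], [3,0,0]:
# at t=0 all three contribute (4, 1, 3); at t=1 only the two sequences
# with length >= 2 remain (5, 2); at t=2 just one remains (6).
print(packed.data.tolist())         # [4, 1, 3, 5, 2, 6]
print(packed.batch_sizes.tolist())  # [3, 2, 1]
```

An RNN fed this PackedSequence simply shrinks its effective batch at each step, so pad positions are never computed.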
    
    

    3. torch.nn.utils.rnn.pad_packed_sequence()

    This is the inverse operation of pack_padded_sequence above.

    pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)

    It takes a PackedSequence and returns an unpacked (padded) tensor together with a tensor of lengths. The returned Tensor's data will be of size T x B x *, where T is the length of the longest sequence and B is the batch size. If batch_first is True, the data will be transposed into B x T x * format.

    Unpacking the packed sequence [4, 1, 3, 5, 2, 6] from above with pad_packed_sequence:

    seq_unpacked, lens_unpacked = pad_packed_sequence(packed, batch_first=True)
    seq_unpacked,lens_unpacked
    +++++++++++++++++++++++++++++++++++++++++
    (tensor([[1, 2, 0],
             [3, 0, 0],
             [4, 5, 6]]),
     tensor([2, 1, 3]))
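One optional parameter worth knowing is total_length, which forces the unpacked tensor out to a fixed length rather than the longest length in this particular batch (useful when downstream code expects a constant sequence dimension). A small sketch:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seq = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
packed = pack_padded_sequence(seq, [2, 1, 3], batch_first=True, enforce_sorted=False)

# Without total_length the result is padded to the longest sequence (3);
# total_length=5 appends extra columns filled with padding_value (0).
seq_unpacked, lens = pad_packed_sequence(packed, batch_first=True, total_length=5)
print(seq_unpacked.shape)  # torch.Size([3, 5])
```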
    
    

    Practical use:

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    word_embeddings = torch.randn(3, 5, 10)  # (batch_size, max_sequence_len, embed_size)
    packed_word_embeddings = pack_padded_sequence(word_embeddings, [5, 3, 2], batch_first=True, enforce_sorted=False)
    bi_lstm = nn.LSTM(10, hidden_size=20, batch_first=True, bidirectional=True)
    # last_state, last_cell: (num_layers * 2, batch_size, hidden_size)
    source_encodings, (last_state, last_cell) = bi_lstm(packed_word_embeddings)
    # source_encodings: (batch_size, max_sequence_len, hidden_size * 2)
    source_encodings, _ = pad_packed_sequence(source_encodings, batch_first=True)

    The returned last_state and last_cell hold the hidden state and cell state at each sequence's last real step, not at the padded positions. Unpacking source_encodings with pad_packed_sequence then yields an ordinary padded tensor, e.g.:

    tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])

    The second return value (the _ above) is the tensor of sequence lengths.
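The claim that last_state sits at each sequence's last real step (rather than at a padded position) can be checked directly; a small sketch with a single-layer unidirectional LSTM, where h_n must match the output at index length-1 of each sequence:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(4, hidden_size=6, batch_first=True)
x = torch.randn(2, 5, 4)   # batch of 2, padded to length 5
lengths = [5, 2]           # the second sequence really ends at step 2

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
out_packed, (h_n, _) = lstm(packed)
out, _ = pad_packed_sequence(out_packed, batch_first=True)

# h_n[0, b] equals the output at each sequence's last *real* step:
print(torch.allclose(h_n[0, 0], out[0, 4]))  # True
print(torch.allclose(h_n[0, 1], out[1, 1]))  # True
# Padded positions of the unpacked output are filled with padding_value (0):
print(out[1, 2:].abs().sum().item())         # 0.0
```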


Link: https://www.haomeiwen.com/subject/qoaklltx.html