
[Deep Learning with TensorFlow (12)] LSTM, Convolution, GRU

Author: Geekero | Published 2021-02-28 11:47

    Notes from the TensorFlow course on China University MOOC.

    The Structure of Recurrent Neural Networks (RNN)

    A neural network is a special kind of model:


    it infers rules from data and labels, much like a function:


    but it does not account for the relationships between successive inputs.

    So when a word is split into subwords, the meaning of a single subword is hard to grasp without context; the order in which the subwords appear is therefore essential to understanding the word's meaning.

    The Fibonacci sequence:

    The corresponding RNN structure:
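
    As a toy illustration (my own sketch, not from the course), the Fibonacci recurrence and an RNN cell share the same shape: the next value is computed from the current input together with state carried over from the previous step:

    # Hypothetical toy code illustrating the recurrence analogy
    def fibonacci(n):
        a, b = 0, 1          # carried state, like an RNN's hidden state
        for _ in range(n):
            a, b = b, a + b  # the new state is a function of the previous state
        return a

    def tiny_rnn(inputs, w_x=0.5, w_h=0.5):
        h = 0.0                        # hidden state
        for x in inputs:
            h = w_x * x + w_h * h      # the output feeds back into the next step
        return h

    print(fibonacci(10))               # 55
    print(tiny_rnn([1.0, 2.0, 3.0]))   # 2.125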


    A single recurrent neural network neuron:


    A combination of multiple neurons:


    More resources on RNNs:


    But this model has a problem: it cannot grasp contextual meaning over longer spans. For example:


    Hence a more advanced recurrent architecture, the LSTM, was proposed to analyze the contextual meaning of text:


    In addition to the standard sequence signal of an RNN, it adds a cell state to carry long-term memory, which addresses this problem:

    The cell-state memory can run in both directions, because later content may influence the interpretation of earlier content.


    More resources:


    I. LSTM Networks

    1.1 Single-Layer LSTM Network

    Single Layer LSTM

    try:
      # %tensorflow_version only exists in Colab.
      %tensorflow_version 2.x
    except Exception:
      pass
    
    import tensorflow as tf
    import tensorflow_datasets as tfds
    
    # Get the data.
    # You can use a smaller slice of the dataset to speed things up;
    # here we take the first 4000 training and 1000 test examples.
    # (With 10% of the data, training on a CPU ran at about 65 seconds per epoch.)
    dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
    train_dataset, test_dataset = dataset['train'].take(4000), dataset['test'].take(1000)
    

    Build the tokenizer:

    tokenizer = info.features['text'].encoder
    
    # Can explore different buffer and batch sizes to make training
    # faster also
    BUFFER_SIZE = 1000
    BATCH_SIZE = 64
    
    train_dataset = train_dataset.shuffle(BUFFER_SIZE)
    train_dataset = train_dataset.padded_batch(BATCH_SIZE)
    test_dataset = test_dataset.padded_batch(BATCH_SIZE)
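
    As a quick sanity check (a hedged sketch with a made-up sample string), the subword tokenizer loaded above can encode text to ids and decode it back:

    sample = 'TensorFlow is fun.'   # hypothetical example string
    ids = tokenizer.encode(sample)  # list of subword ids
    print(ids)
    print(tokenizer.decode(ids))    # round-trips back to the sample
    for i in ids:
        print(i, '->', tokenizer.decode([i]))  # inspect each subword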
    

    Build the single-layer LSTM network:

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(tokenizer.vocab_size, 64), # embedding dimension of 64
        # Bidirectional captures context from both directions
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # LSTM layer; 64 is the per-direction output size, so the actual output is 128-dimensional
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    
    model.summary()
    
    
        Model: "sequential"
        _________________________________________________________________
        Layer (type)                 Output Shape              Param #   
        =================================================================
        embedding (Embedding)        (None, None, 64)          523840    
        _________________________________________________________________
        bidirectional (Bidirectional (None, 128)               66048     
        _________________________________________________________________
        dense (Dense)                (None, 64)                8256      
        _________________________________________________________________
        dense_1 (Dense)              (None, 1)                 65        
        =================================================================
        Total params: 598,209
        Trainable params: 598,209
        Non-trainable params: 0
        _________________________________________________________________
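
    The 66,048 parameters in the bidirectional row can be verified by hand: an LSTM has four gates, each with (input_dim + units) weights plus one bias per unit, and the bidirectional wrapper doubles the count. A quick check, assuming TF's standard LSTM parameterization:

    units, input_dim = 64, 64
    per_direction = 4 * ((input_dim + units) * units + units)
    print(per_direction)      # 33024
    print(2 * per_direction)  # 66048, matching the summary above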
    

    Train the network:

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    # Can change number of epochs to make training faster
    NUM_EPOCHS = 50
    history = model.fit(train_dataset, epochs=NUM_EPOCHS, validation_data=test_dataset)
    
        Epoch 1/50
        63/63 [==============================] - 22s 249ms/step - loss: 0.6926 - accuracy: 0.5080 - val_loss: 0.6739 - val_accuracy: 0.5970
        Epoch 2/50
         ...
        Epoch 50/50
        63/63 [==============================] - 14s 230ms/step - loss: 1.7393e-06 - accuracy: 1.0000 - val_loss: 2.8910 - val_accuracy: 0.7360
    

    Plot the network's performance:

    import matplotlib.pyplot as plt
    
    def plot_graphs(history, string):
      plt.plot(history.history[string])
      plt.plot(history.history['val_'+string])
      plt.xlabel("Epochs")
      plt.ylabel(string)
      plt.legend([string, 'val_'+string])
      plt.show()
    
    plot_graphs(history, 'accuracy')
    plot_graphs(history, 'loss')

    [accuracy and loss curves]
    

    The single-layer LSTM converges quickly, but test accuracy barely improves while test loss keeps rising.

    Release resources (restart the runtime):

    import os, signal

    # Send SIGINT to the current process to stop the kernel and free its memory
    os.kill(os.getpid(), signal.SIGINT)
    

    Now let's see how a multi-layer LSTM performs:

    1.2 Multi-Layer LSTM Network

    Multiple Layer LSTM

    from __future__ import absolute_import, division, print_function, unicode_literals
    
    
    import tensorflow_datasets as tfds
    import tensorflow as tf
    print(tf.__version__)
    
        2.4.0
    

    Load the data:

    # Get the data
    # dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
    # train_dataset, test_dataset = dataset['train'], dataset['test']
    dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
    train_dataset, test_dataset = dataset['train'].take(4000), dataset['test'].take(1000)
    

    Retrieve the tokenizer from the dataset info:

    tokenizer = info.features['text'].encoder
    
    print(info)
    
        tfds.core.DatasetInfo(
            name='imdb_reviews',
            full_name='imdb_reviews/subwords8k/1.0.0',
            description="""
            Large Movie Review Dataset.
            This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
            """,
            config_description="""
            Uses `tfds.deprecated.text.SubwordTextEncoder` with 8k vocab size
            """,
            homepage='http://ai.stanford.edu/~amaas/data/sentiment/',
            data_path='C:\\Users\\Robin\\tensorflow_datasets\\imdb_reviews\\subwords8k\\1.0.0',
            download_size=80.23 MiB,
            dataset_size=54.72 MiB,
            features=FeaturesDict({
                'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
                'text': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8185>),
            }),
            supervised_keys=('text', 'label'),
            splits={
                'test': <SplitInfo num_examples=25000, num_shards=1>,
                'train': <SplitInfo num_examples=25000, num_shards=1>,
                'unsupervised': <SplitInfo num_examples=50000, num_shards=1>,
            },
            citation="""@InProceedings{maas-EtAl:2011:ACL-HLT2011,
              author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
              title     = {Learning Word Vectors for Sentiment Analysis},
              booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
              month     = {June},
              year      = {2011},
              address   = {Portland, Oregon, USA},
              publisher = {Association for Computational Linguistics},
              pages     = {142--150},
              url       = {http://www.aclweb.org/anthology/P11-1015}
            }""",
        )
    

    Select the training and test datasets

    BUFFER_SIZE = 100
    BATCH_SIZE = 100
    
    train_dataset = train_dataset.shuffle(BUFFER_SIZE).take(1000)
    train_dataset = train_dataset.padded_batch(BATCH_SIZE)
    test_dataset = test_dataset.padded_batch(BATCH_SIZE).take(1000)
    

    Tokenizer vocabulary size:

    tokenizer.vocab_size
        8185
    

    Build the network:

    vocab_size = 1000 # this assignment appears to be unused
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(tokenizer.vocab_size, 8),
        # When one LSTM layer feeds into another, set return_sequences=True so that
        # the first layer's output shape matches the next layer's expected input
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8, return_sequences=True)),  # stacked (multi-layer) LSTM
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    
    
    model.summary()
    
        Model: "sequential"
        _________________________________________________________________
        Layer (type)                 Output Shape              Param #   
        =================================================================
        embedding (Embedding)        (None, None, 8)           65480     
        _________________________________________________________________
        bidirectional (Bidirectional (None, None, 16)          1088      
        _________________________________________________________________
        bidirectional_1 (Bidirection (None, 16)                1600      
        _________________________________________________________________
        dense (Dense)                (None, 16)                272       
        _________________________________________________________________
        dense_1 (Dense)              (None, 1)                 17        
        =================================================================
        Total params: 68,457
        Trainable params: 68,457
        Non-trainable params: 0
        _________________________________________________________________  
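
    To see what return_sequences=True does to the shapes (a small sketch with random data, not from the course):

    x = tf.random.normal((1, 10, 8))  # (batch, time steps, features)
    print(tf.keras.layers.LSTM(8, return_sequences=True)(x).shape)  # (1, 10, 8): keeps the time axis
    print(tf.keras.layers.LSTM(8)(x).shape)                         # (1, 8): only the final output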
    

    Compile and train the model

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    NUM_EPOCHS = 50
    history = model.fit(train_dataset, epochs=NUM_EPOCHS, validation_data=test_dataset)
    
        Epoch 1/50
        10/10 [==============================] - 16s 667ms/step - loss: 0.6932 - accuracy: 0.5180 - val_loss: 0.6932 - val_accuracy: 0.4970
        ...
        Epoch 50/50
        10/10 [==============================] - 4s 407ms/step - loss: 0.0011 - accuracy: 1.0000 - val_loss: 1.6064 - val_accuracy: 0.7030
    

    Plot the network's performance:

    import matplotlib.pyplot as plt
    
    
    def plot_graphs(history, string):
      plt.plot(history.history[string])
      plt.plot(history.history['val_'+string])
      plt.xlabel("Epochs")
      plt.ylabel(string)
      plt.legend([string, 'val_'+string])
      plt.show()
    
    plot_graphs(history, 'accuracy')
    plot_graphs(history, 'loss')

    [accuracy and loss curves]

    The course notes say:

    • The two-layer LSTM's training-accuracy curve is smoother, and its validation-accuracy curve is better

    • The single-layer LSTM's training accuracy rises overall but drops sharply in places, suggesting the algorithm is not very robust

    • The two-layer LSTM's training-accuracy curve is very smooth, indicating a more stable training process

    In my run, however, the results were the opposite... possibly an effect of the data subset used...

    There are other recurrent network variants as well:

    • RNNs with convolutional layers
    • Gated recurrent units (GRU)

    Why text classification is harder:

    1. The network structure here is quite simple and overfits easily, so performance can be improved by adjusting the architecture and its parameters (see the sketch below)

    2. Overfitting happens more readily in text processing than in image processing, because the validation set always contains out-of-vocabulary words that are hard to classify, which widens the gap between training and validation performance
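
    One way to act on point 1 (my own hedged sketch, not part of the course code) is to add dropout, both inside the recurrent layers and between the dense layers:

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(tokenizer.vocab_size, 8),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8, return_sequences=True, dropout=0.2)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8, dropout=0.2)),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dropout(0.2),  # randomly zero activations to curb overfitting
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])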

    import os, signal
    
    os.kill(os.getpid(), signal.SIGINT)
    

    II. Convolutional Networks

    import json
    import tensorflow as tf
    import numpy as np
    
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    

    Download the data

    # !wget --no-check-certificate \
    #     https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sarcasm.json \
    #     -O /tmp/sarcasm.json
    

    Set the hyperparameters

    vocab_size = 1000
    embedding_dim = 16 # embedding dimension of 16
    max_length = 120
    trunc_type='post'
    padding_type='post'
    oov_tok = "<OOV>"
    training_size = 20000
    

    Preprocess the data and build the training and test sets:

    with open("sarcasm.json", 'r') as f:
        datastore = json.load(f)
    
    
    sentences = []
    labels = []
    urls = []
    for item in datastore:
        sentences.append(item['headline'])
        labels.append(item['is_sarcastic'])
    
    training_sentences = sentences[0:training_size]
    testing_sentences = sentences[training_size:]
    training_labels = labels[0:training_size]
    testing_labels = labels[training_size:]
    

    Build the tokenizer and turn the text into sequences:

    tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
    tokenizer.fit_on_texts(training_sentences)
    
    word_index = tokenizer.word_index
    
    training_sequences = tokenizer.texts_to_sequences(training_sentences)
    training_padded = pad_sequences(training_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
    
    testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
    testing_padded = pad_sequences(testing_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
    

    Build the text-embedding convolutional network:

    In the convolutional layer, the input text vectors pass through 128 convolution kernels of size 5 to extract features, and the kernel parameters are adjusted through learning to achieve the desired result

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        # The convolutional layer extracts features with size-5 kernels whose
        # parameters are learned during training
        tf.keras.layers.Conv1D(128, 5, activation='relu'),  # 128 kernels of size 5, relu activation
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(24, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
    

    Because the input is a sequence of 120 words and the convolution kernel spans 5 words, 2 words are trimmed from each end of the sequence, leaving 116 positions:

    # 120 input positions - (5 - 1) trimmed = 116 output positions
    model.summary()
    
        Model: "sequential"
        _________________________________________________________________
        Layer (type)                 Output Shape              Param #   
        =================================================================
        embedding (Embedding)        (None, 120, 16)           16000     
        _________________________________________________________________
        conv1d (Conv1D)              (None, 116, 128)          10368     
        _________________________________________________________________
        global_max_pooling1d (Global (None, 128)               0         
        _________________________________________________________________
        dense (Dense)                (None, 24)                3096      
        _________________________________________________________________
        dense_1 (Dense)              (None, 1)                 25        
        =================================================================
        Total params: 29,489
        Trainable params: 29,489
        Non-trainable params: 0
        _________________________________________________________________
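
    Both numbers in the Conv1D row can be checked by hand (a quick check, assuming Keras's default 'valid' padding):

    kernel_size, in_channels, filters = 5, 16, 128
    print((kernel_size * in_channels + 1) * filters)  # 10368 parameters (weights + biases)
    print(120 - kernel_size + 1)                      # 116 output positions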
    

    Train the model for 50 epochs:

    num_epochs = 50
    training_padded = np.array(training_padded)
    training_labels = np.array(training_labels)
    testing_padded = np.array(testing_padded)
    testing_labels = np.array(testing_labels)
    
    history = model.fit(training_padded, training_labels, epochs=num_epochs, validation_data=(testing_padded, testing_labels), verbose=1)
    
        Epoch 1/50
        625/625 [==============================] - 10s 10ms/step - loss: 0.5596 - accuracy: 0.6902 - val_loss: 0.4028 - val_accuracy: 0.8137
        ...
        Epoch 50/50
        625/625 [==============================] - 6s 9ms/step - loss: 0.0217 - accuracy: 0.9907 - val_loss: 2.6126 - val_accuracy: 0.7757
    

    Plot the model's performance

    import matplotlib.pyplot as plt
    
    
    def plot_graphs(history, string):
      plt.plot(history.history[string])
      plt.plot(history.history['val_'+string])
      plt.xlabel("Epochs")
      plt.ylabel(string)
      plt.legend([string, 'val_'+string])
      plt.show()
    
    plot_graphs(history, 'accuracy')
    plot_graphs(history, 'loss')
    
    [accuracy and loss curves]

    The performance doesn't look great either

    Save the model for later use:

    model.save("test.h5")
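
    The saved model can be reloaded later without retraining (a minimal sketch; the headline string is a made-up example):

    reloaded = tf.keras.models.load_model("test.h5")
    sample = pad_sequences(
        tokenizer.texts_to_sequences(["scientists discover water is wet"]),  # hypothetical headline
        maxlen=max_length, padding=padding_type, truncating=trunc_type)
    print(reloaded.predict(sample))  # predicted probability that the headline is sarcastic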
    

    Release resources:

    import os, signal
    
    os.kill(os.getpid(), signal.SIGINT)
    

    III. GRU Networks

    # Multiple Layer GRU
    from __future__ import absolute_import, division, print_function, unicode_literals
    
    
    import tensorflow_datasets as tfds
    import tensorflow as tf
    print(tf.__version__)
    
        2.4.0
    

    On TF 2.x there is no need to run the following:

    # If the tf.__version__ is 1.x, please run this cell
    # !pip install tensorflow==2.0.0-beta0
    

    Get the data:

    # Get the data
    dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
    train_dataset, test_dataset = dataset['train'], dataset['test']
    

    Retrieve the pre-built tokenizer from the dataset:

    tokenizer = info.features['text'].encoder
    

    Set up the training and test datasets

    BUFFER_SIZE = 10000
    BATCH_SIZE = 64
    
    train_dataset = train_dataset.shuffle(BUFFER_SIZE)
    train_dataset = train_dataset.padded_batch(BATCH_SIZE, tf.compat.v1.data.get_output_shapes(train_dataset))
    test_dataset = test_dataset.padded_batch(BATCH_SIZE, tf.compat.v1.data.get_output_shapes(test_dataset))
    

    Build the GRU model

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(tokenizer.vocab_size, 64), # embedding dimension of 64
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128)),
        tf.keras.layers.Dense(6, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    

    Inspect the network structure

    model.summary()
    
        Model: "sequential"
        _________________________________________________________________
        Layer (type)                 Output Shape              Param #   
        =================================================================
        embedding (Embedding)        (None, None, 64)          523840    
        _________________________________________________________________
        bidirectional (Bidirectional (None, 256)               148992    
        _________________________________________________________________
        dense (Dense)                (None, 6)                 1542      
        _________________________________________________________________
        dense_1 (Dense)              (None, 1)                 7         
        =================================================================
        Total params: 674,381
        Trainable params: 674,381
        Non-trainable params: 0
        _________________________________________________________________
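
    The 148,992 parameters in the bidirectional GRU row can likewise be verified (a quick check, assuming TF 2's default reset_after=True, which gives each of the three gates two bias vectors):

    units, input_dim = 128, 64
    per_direction = 3 * ((input_dim + units) * units + 2 * units)
    print(per_direction)      # 74496
    print(2 * per_direction)  # 148992, matching the summary above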
    

    Compile and train the model:

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    NUM_EPOCHS = 50
    history = model.fit(train_dataset, epochs=NUM_EPOCHS, validation_data=test_dataset)
    
        Epoch 1/50
        391/391 [==============================] - 166s 392ms/step - loss: 0.6686 - accuracy: 0.5515 - val_loss: 0.7990 - val_accuracy: 0.5909
         ...
        Epoch 50/50
        391/391 [==============================] - 145s 371ms/step - loss: 1.2944e-06 - accuracy: 1.0000 - val_loss: 1.5420 - val_accuracy: 0.8560
    

    Plot the model's performance:

    import matplotlib.pyplot as plt
    
    
    def plot_graphs(history, string):
      plt.plot(history.history[string])
      plt.plot(history.history['val_'+string])
      plt.xlabel("Epochs")
      plt.ylabel(string)
      plt.legend([string, 'val_'+string])
      plt.show()
    
    plot_graphs(history, 'accuracy')
    plot_graphs(history, 'loss')
    
    [accuracy and loss curves]

    Release resources

    import os, signal
    
    os.kill(os.getpid(), signal.SIGINT)
    

    IV. Combining Multiple Model Types

    import json
    import tensorflow as tf
    import csv
    import random
    import numpy as np
    
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.utils import to_categorical
    from tensorflow.keras import regularizers
    
    
    embedding_dim = 100
    max_length = 16
    trunc_type='post'
    padding_type='post'
    oov_tok = "<OOV>"
    training_size=160000
    test_portion=.1
    
    corpus = []
    
    
    (omitted)
    
    
    
    model = tf.keras.Sequential([
        # Pretrained embedding matrix (built in the omitted preprocessing), frozen during training
        tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=max_length, weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv1D(64, 5, activation='relu'),  # convolution extracts local features
        tf.keras.layers.MaxPooling1D(pool_size=4),
        tf.keras.layers.LSTM(64),                          # recurrent layer on top of the pooled conv features
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
    model.summary()
    
    num_epochs = 50
    training_sequences = np.array(training_sequences)
    training_labels = np.array(training_labels)
    test_sequences = np.array(test_sequences)
    test_labels = np.array(test_labels)
    history = model.fit(training_sequences, training_labels, epochs=num_epochs, validation_data=(test_sequences, test_labels), verbose=2)
    
    print("Training Complete")
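
    To see how the convolutional and recurrent pieces fit together, here is a quick shape walk-through with random data (a sketch assuming max_length=16 and embedding_dim=100 as set above):

    x = tf.random.normal((1, 16, 100))                      # one embedded 16-token sentence
    x = tf.keras.layers.Conv1D(64, 5, activation='relu')(x)
    print(x.shape)                                          # (1, 12, 64): 16 - 5 + 1 positions
    x = tf.keras.layers.MaxPooling1D(pool_size=4)(x)
    print(x.shape)                                          # (1, 3, 64): pooled by a factor of 4
    x = tf.keras.layers.LSTM(64)(x)
    print(x.shape)                                          # (1, 64): final LSTM state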
    

    V. Performance Comparison of Multiple Models

    # NOTE: PLEASE MAKE SURE YOU ARE RUNNING THIS IN A PYTHON3 ENVIRONMENT
    
    import tensorflow as tf
    print(tf.__version__)
    
    # This is needed for the iterator over the data
    # But not necessary if you have TF 2.0 installed
    #!pip install tensorflow==2.0.0-beta0
    
    
    # tf.enable_eager_execution()
    
    # !pip install -q tensorflow-datasets
    
        2.4.0
    

    Load the dataset

    import tensorflow_datasets as tfds
    imdb, info = tfds.load("imdb_reviews", with_info=True, as_supervised=True)
    

    Prepare the training and test data

    import numpy as np
    
    # train_data, test_data = imdb['train'], imdb['test']
    train_data, test_data = imdb['train'].take(4000), imdb['test'].take(1000)
    
    training_sentences = []
    training_labels = []
    
    testing_sentences = []
    testing_labels = []
    
    # str(s.numpy()) is needed in Python 3 instead of just s.numpy()
    for s,l in train_data:
      training_sentences.append(str(s.numpy()))
      training_labels.append(l.numpy())
      
    for s,l in test_data:
      testing_sentences.append(str(s.numpy()))
      testing_labels.append(l.numpy())
      
    training_labels_final = np.array(training_labels)
    testing_labels_final = np.array(testing_labels)
    

    Set the hyperparameters and serialize the text

    vocab_size = 10000
    embedding_dim = 16
    max_length = 120
    trunc_type='post'
    oov_tok = "<OOV>"
    
    
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    
    tokenizer = Tokenizer(num_words = vocab_size, oov_token=oov_tok)
    tokenizer.fit_on_texts(training_sentences)
    word_index = tokenizer.word_index
    sequences = tokenizer.texts_to_sequences(training_sentences)
    padded = pad_sequences(sequences,maxlen=max_length, truncating=trunc_type)
    
    testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
    testing_padded = pad_sequences(testing_sequences,maxlen=max_length)
    

    Build a decoder to turn encoded sequences back into text

    reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
    
    def decode_review(text):
        return ' '.join([reverse_word_index.get(i, '?') for i in text])
    
    print(decode_review(padded[1]))
    print(training_sentences[1])
    
        ? ? ? ? ? ? ? b'i have been known to fall asleep during films but this is usually due to a combination of things including really tired being warm and comfortable on the <OOV> and having just eaten a lot however on this occasion i fell asleep because the film was rubbish the plot development was constant constantly slow and boring things seemed to happen but with no explanation of what was causing them or why i admit i may have missed part of the film but i watched the majority of it and everything just seemed to happen of its own <OOV> without any real concern for anything else i cant recommend this film at all '
        b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. I cant recommend this film at all.'
        
    

    Build a single-layer GRU model and inspect its structure

    # Model Definition with GRU
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
        tf.keras.layers.Dense(6, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
    model.summary()
    
        Model: "sequential"
        _________________________________________________________________
        Layer (type)                 Output Shape              Param #   
        =================================================================
        embedding (Embedding)        (None, 120, 16)           160000    
        _________________________________________________________________
        bidirectional (Bidirectional (None, 64)                9600      
        _________________________________________________________________
        dense (Dense)                (None, 6)                 390       
        _________________________________________________________________
        dense_1 (Dense)              (None, 1)                 7         
        =================================================================
        Total params: 169,997
        Trainable params: 169,997
        Non-trainable params: 0
        _________________________________________________________________
    

    Train the model:

    num_epochs = 50
    history = model.fit(padded, training_labels_final, epochs=num_epochs, validation_data=(testing_padded, testing_labels_final))
    
    
        Epoch 1/50
        125/125 [==============================] - 16s 53ms/step - loss: 0.6933 - accuracy: 0.4849 - val_loss: 0.6931 - val_accuracy: 0.4970
       ...
        Epoch 50/50
        125/125 [==============================] - 4s 28ms/step - loss: 3.5880e-06 - accuracy: 1.0000 - val_loss: 1.9096 - val_accuracy: 0.7550
    

    Training performance

    import matplotlib.pyplot as plt
    
    
    def plot_graphs(history, string):
      plt.plot(history.history[string])
      plt.plot(history.history['val_'+string])
      plt.xlabel("Epochs")
      plt.ylabel(string)
      plt.legend([string, 'val_'+string])
      plt.show()
    
    plot_graphs(history, 'accuracy')
    plot_graphs(history, 'loss')
    
    [accuracy and loss curves]

    The model converges quickly, but test accuracy plateaus and test loss climbs with each epoch: the model is overfitting.
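
    A common remedy for this pattern (my own hedged sketch, not from the course) is early stopping, which halts training once validation loss stops improving:

    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=3, restore_best_weights=True)
    history = model.fit(padded, training_labels_final, epochs=num_epochs,
                        validation_data=(testing_padded, testing_labels_final),
                        callbacks=[early_stop])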

    Build a single-layer LSTM model

    # Model Definition with single LSTM
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dense(6, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
    model.summary()
    
        Model: "sequential_1"
        _________________________________________________________________
        Layer (type)                 Output Shape              Param #   
        =================================================================
        embedding_1 (Embedding)      (None, 120, 16)           160000    
        _________________________________________________________________
        bidirectional_1 (Bidirection (None, 64)                12544     
        _________________________________________________________________
        dense_2 (Dense)              (None, 6)                 390       
        _________________________________________________________________
        dense_3 (Dense)              (None, 1)                 7         
        =================================================================
        Total params: 172,941
        Trainable params: 172,941
        Non-trainable params: 0
        _________________________________________________________________ 
    

    Train the model

    num_epochs = 50
    history = model.fit(padded, training_labels_final, epochs=num_epochs, validation_data=(testing_padded, testing_labels_final))
    
        Epoch 1/50
        125/125 [==============================] - 14s 49ms/step - loss: 0.6927 - accuracy: 0.5054 - val_loss: 0.6128 - val_accuracy: 0.7090
       ...
        Epoch 50/50
        125/125 [==============================] - 3s 27ms/step - loss: 4.3843e-05 - accuracy: 1.0000 - val_loss: 1.8278 - val_accuracy: 0.7590
    

    Model performance

    plot_graphs(history, 'accuracy')
    plot_graphs(history, 'loss')
    
    [accuracy and loss curves]

    The model seems even less robust here...

    Build a multi-layer LSTM model

    # Model Definition with multiple LSTM
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dense(6, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
    model.summary()
    
        Model: "sequential_4"
        _________________________________________________________________
        Layer (type)                 Output Shape              Param #   
        =================================================================
        embedding_7 (Embedding)      (None, 120, 16)           160000    
        _________________________________________________________________
        bidirectional_2 (Bidirection (None, 120, 64)           12544     
        _________________________________________________________________
        bidirectional_3 (Bidirection (None, 64)                24832     
        _________________________________________________________________
        dense_8 (Dense)              (None, 6)                 390       
        _________________________________________________________________
        dense_9 (Dense)              (None, 1)                 7         
        =================================================================
        Total params: 197,773
        Trainable params: 197,773
        Non-trainable params: 0
        _________________________________________________________________
    

    Train the model

    num_epochs = 50
    history = model.fit(padded, training_labels_final, epochs=num_epochs, validation_data=(testing_padded, testing_labels_final))
    
        Epoch 1/50
        125/125 [==============================] - 24s 81ms/step - loss: 0.6917 - accuracy: 0.5152 - val_loss: 0.6868 - val_accuracy: 0.5030
        ...
        Epoch 50/50
        125/125 [==============================] - 5s 43ms/step - loss: 3.4054e-06 - accuracy: 1.0000 - val_loss: 2.3198 - val_accuracy: 0.7740


    Model performance:

    plot_graphs(history, 'accuracy')
    plot_graphs(history, 'loss')

    [accuracy and loss curves]

    Build a single-layer convolutional model

    # Model Definition with Conv1D
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
        tf.keras.layers.Conv1D(128, 5, activation='relu'),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(6, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
    model.summary()
    
    
        Model: "sequential_3"
        _________________________________________________________________
        Layer (type)                 Output Shape              Param #   
        =================================================================
        embedding_3 (Embedding)      (None, 120, 16)           160000    
        _________________________________________________________________
        conv1d_1 (Conv1D)            (None, 116, 128)          10368     
        _________________________________________________________________
        global_average_pooling1d_1 ( (None, 128)               0         
        _________________________________________________________________
        dense_6 (Dense)              (None, 6)                 774       
        _________________________________________________________________
        dense_7 (Dense)              (None, 1)                 7         
        =================================================================
        Total params: 171,149
        Trainable params: 171,149
        Non-trainable params: 0
        _________________________________________________________________
    

    Train the model

    num_epochs = 50
    history = model.fit(padded, training_labels_final, epochs=num_epochs, validation_data=(testing_padded, testing_labels_final))
    
        Epoch 1/50
        125/125 [==============================] - 6s 19ms/step - loss: 0.6895 - accuracy: 0.5145 - val_loss: 0.6109 - val_accuracy: 0.7300
       ...
        Epoch 50/50
        125/125 [==============================] - 1s 11ms/step - loss: 1.5977e-05 - accuracy: 1.0000 - val_loss: 1.4861 - val_accuracy: 0.7840
    
    plot_graphs(history, 'accuracy')
    plot_graphs(history, 'loss')

    [accuracy and loss curves]

    Release resources:

    import os, signal
    
    os.kill(os.getpid(), signal.SIGINT)
    
