- Import the required packages
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
import tensorflow.keras.utils as ku
import numpy as np
- Download and preprocess the data
tokenizer = Tokenizer()
!wget --no-check-certificate \
https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sonnets.txt \
-O /tmp/sonnets.txt
data = open('/tmp/sonnets.txt').read()
corpus = data.lower().split("\n")
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1
# create input sequences using list of tokens
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]   # every n-gram prefix of the line
        input_sequences.append(n_gram_sequence)
# pad sequences
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
# create predictors and label
predictors, label = input_sequences[:,:-1],input_sequences[:,-1]
label = ku.to_categorical(label, num_classes=total_words)
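To see what this preprocessing actually produces, here is a minimal sketch on a made-up two-line corpus (the sentences and variable names are illustrative only):
# Hypothetical toy corpus, for illustration only
toy_corpus = ["the cat sat", "the cat ran away"]
toy_tokenizer = Tokenizer()
toy_tokenizer.fit_on_texts(toy_corpus)
toy_sequences = []
for line in toy_corpus:
    tokens = toy_tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        toy_sequences.append(tokens[:i+1])   # "the cat sat" contributes [the, cat] and [the, cat, sat]
toy_maxlen = max(len(s) for s in toy_sequences)
padded = np.array(pad_sequences(toy_sequences, maxlen=toy_maxlen, padding='pre'))
print(padded)          # zero-padded on the left, one row per n-gram
print(padded[:, :-1])  # predictors: everything but the last token
print(padded[:, -1])   # label: the last token of each row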
- Define the model
Embedding_dim = 100
LSTM_units_1 = 150
LSTM_units_2 = 100
Dense_units = total_words // 2
Dropout_rate = 0.2
model = Sequential()
model.add(Embedding(total_words, Embedding_dim, input_length=max_sequence_len-1))
model.add(LSTM(LSTM_units_1, return_sequences=True))
model.add(Dropout(Dropout_rate))
model.add(LSTM(LSTM_units_2))
model.add(Dense(Dense_units, kernel_regularizer=regularizers.l2(0.01), activation='relu'))
model.add(Dense(total_words, activation='softmax'))
# Pick an optimizer; the 'adam' string uses Adam with its default settings
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
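Note that Bidirectional and Adam are imported above but never used. If you want to experiment with them, a hedged sketch of a variant that wraps the first recurrent layer and passes an explicit optimizer instance (hyperparameters are illustrative, not tuned):
# Illustrative variant only, reusing the hyperparameters defined above
alt_model = Sequential()
alt_model.add(Embedding(total_words, Embedding_dim, input_length=max_sequence_len-1))
alt_model.add(Bidirectional(LSTM(LSTM_units_1, return_sequences=True)))  # reads context in both directions
alt_model.add(Dropout(Dropout_rate))
alt_model.add(LSTM(LSTM_units_2))
alt_model.add(Dense(Dense_units, kernel_regularizer=regularizers.l2(0.01), activation='relu'))
alt_model.add(Dense(total_words, activation='softmax'))
alt_model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(learning_rate=0.001),  # 0.001 is Adam's default, shown explicitly
                  metrics=['accuracy'])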
- Train the model
history = model.fit(predictors, label, epochs=100, verbose=1)
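Training for 100 epochs over all n-gram sequences can take a while, so it is worth persisting the result. A minimal sketch, assuming the HDF5 format is acceptable (the file path is arbitrary):
model.save('/tmp/sonnets_lstm.h5')  # arbitrary path; saves architecture + weights
# To generate text in a fresh session, reload with tensorflow.keras.models.load_model;
# you would also need the fitted tokenizer (e.g. pickled separately), since the
# word-to-id mapping is not stored inside the model.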
- Plot the training curves
import matplotlib.pyplot as plt
acc = history.history['accuracy']  # with metrics=['accuracy'], TF2 uses the key 'accuracy', not 'acc'
loss = history.history['loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.title('Training accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'b', label='Training Loss')
plt.title('Training loss')
plt.legend()
plt.show()
- Use the trained model to write a poem
seed_text = "Help me Obi Wan Kenobi, you're my only hope"
next_words = 100
for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
    # model.predict_classes was removed in recent TF versions; take the argmax of predict instead
    predicted = np.argmax(model.predict(token_list, verbose=0), axis=-1)[0]
    output_word = ""
    for word, index in tokenizer.word_index.items():
        if index == predicted:
            output_word = word
            break
    seed_text += " " + output_word
print(seed_text)
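The linear scan over tokenizer.word_index can be avoided entirely: Tokenizer also keeps the reverse mapping in index_word. An equivalent sketch of the same greedy decoding loop as a reusable function (the function name is made up):
def generate(seed, n_words):
    for _ in range(n_words):
        tokens = tokenizer.texts_to_sequences([seed])[0]
        tokens = pad_sequences([tokens], maxlen=max_sequence_len-1, padding='pre')
        # greedy decoding: always take the most probable next word
        predicted_id = int(np.argmax(model.predict(tokens, verbose=0), axis=-1)[0])
        seed += " " + tokenizer.index_word.get(predicted_id, "")
    return seed

print(generate("Help me Obi Wan Kenobi, you're my only hope", 100))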