
[Deep Learning with TensorFlow (13)] Text Generation with a Recurrent Neural Network

Author: Geekero | Published 2021-03-01 21:27

    Notes from the TensorFlow course on China University MOOC

    Principle

    Text generation is really text prediction: with a large enough corpus, a neural network can be trained on its phrases to predict the next word, and by repeatedly predicting one word at a time it can produce long, complex text.

    Case 1: Text prediction on song lyrics

    Import the libraries

    import tensorflow as tf
    
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import Adam
    import numpy as np 
    import time
    

    Build the tokenizer and the word index

    #print(time.time())
    tokenizer = Tokenizer()
    
    #put the whole song into the data string
    data="In the town of Athy one Jeremy Lanigan \n Battered away til he hadnt a pound. \nHis father died and made him a man again \n Left him a farm and ten acres of ground. \nHe gave a grand party for friends and relations \nWho didnt forget him when come to the wall, \nAnd if youll but listen Ill make your eyes glisten \nOf the rows and the ructions of Lanigans Ball. \nMyself to be sure got free invitation, \nFor all the nice girls and boys I might ask, \nAnd just in a minute both friends and relations \nWere dancing round merry as bees round a cask. \nJudy ODaly, that nice little milliner, \nShe tipped me a wink for to give her a call, \nAnd I soon arrived with Peggy McGilligan \nJust in time for Lanigans Ball. \nThere were lashings of punch and wine for the ladies, \nPotatoes and cakes; there was bacon and tea, \nThere were the Nolans, Dolans, OGradys \nCourting the girls and dancing away. \nSongs they went round as plenty as water, \nThe harp that once sounded in Taras old hall,\nSweet Nelly Gray and The Rat Catchers Daughter,\nAll singing together at Lanigans Ball. \nThey were doing all kinds of nonsensical polkas \nAll round the room in a whirligig. \nJulia and I, we banished their nonsense \nAnd tipped them the twist of a reel and a jig. \nAch mavrone, how the girls got all mad at me \nDanced til youd think the ceiling would fall. \nFor I spent three weeks at Brooks Academy \nLearning new steps for Lanigans Ball. \nThree long weeks I spent up in Dublin, \nThree long weeks to learn nothing at all,\n Three long weeks I spent up in Dublin, \nLearning new steps for Lanigans Ball. \nShe stepped out and I stepped in again, \nI stepped out and she stepped in again, \nShe stepped out and I stepped in again, \nLearning new steps for Lanigans Ball. \nBoys were all merry and the girls they were hearty \nAnd danced all around in couples and groups, \nTil an accident happened, young Terrance McCarthy \nPut his right leg through miss Finnertys hoops. \nPoor creature fainted and cried Meelia murther, \nCalled for her brothers and gathered them all. \nCarmody swore that hed go no further \nTil he had satisfaction at Lanigans Ball. \nIn the midst of the row miss Kerrigan fainted, \nHer cheeks at the same time as red as a rose. \nSome of the lads declared she was painted, \nShe took a small drop too much, I suppose. \nHer sweetheart, Ned Morgan, so powerful and able, \nWhen he saw his fair colleen stretched out by the wall, \nTore the left leg from under the table \nAnd smashed all the Chaneys at Lanigans Ball. \nBoys, oh boys, twas then there were runctions. \nMyself got a lick from big Phelim McHugh. \nI soon replied to his introduction \nAnd kicked up a terrible hullabaloo. \nOld Casey, the piper, was near being strangled. \nThey squeezed up his pipes, bellows, chanters and all. \nThe girls, in their ribbons, they got all entangled \nAnd that put an end to Lanigans Ball."
    #lowercase the text, then split it into a list of lines on the newline character
    corpus = data.lower().split("\n") 
    #build the word index
    tokenizer.fit_on_texts(corpus) #keys are words, values are their integer codes
    

    Count the words in the word index; to allow for out-of-vocabulary words, the total vocabulary size is set to the number of words plus one.

    total_words = len(tokenizer.word_index) + 1
    
    print(tokenizer.word_index)
    print(total_words)
    
    {'and': 1, 'the': 2, 'a': 3, 'in': 4, 'all': 5, 'i': 6, 'for': 7, 'of': 8, 'lanigans': 9, 'ball': 10, 'were': 11, 'at': 12, 'to': 13, 'she': 14, 'stepped': 15, 'his': 16, 'girls': 17, 'as': 18, 'they ...}
    

    Convert the corpus into training data

    input_sequences = []
    for line in corpus:
        #turn each line of the corpus into a sequence of integer codes
        token_list = tokenizer.texts_to_sequences([line])[0]
        #iterate to produce n-gram sequences of increasing length from each line
        for i in range(1, len(token_list)):
            n_gram_sequence = token_list[:i+1]
            input_sequences.append(n_gram_sequence)  
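
    A quick illustration of what the loop produces: using the word_index printed above ('in' is 4, 'the' is 2), the first lyric line expands into growing prefixes, which match the first rows of the padded array shown further below.

    first_line_tokens = tokenizer.texts_to_sequences([corpus[0]])[0]
    for i in range(1, 4):
        print(first_line_tokens[:i+1])
    #expected to look like [4, 2], [4, 2, 66], [4, 2, 66, 8]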
    

    Find the length of the longest sequence in the corpus

    # pad sequences 
    max_sequence_len = max([len(x) for x in input_sequences])
    #pad every sequence to the longest length; zeros go at the front ('pre') so all sequences are the same length and the label is easy to slice off the end
    input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
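
    Pre-padding keeps the label in the last column so it can later be sliced off with [:, -1]; a toy illustration using the pad_sequences imported above:

    print(pad_sequences([[4, 2], [4, 2, 66, 8]], maxlen=5, padding='pre'))
    #[[ 0  0  0  4  2]
    # [ 0  4  2 66  8]]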
    

    Inspect the structure of the input sequences:

    input_sequences
    
    array([[  0,   0,   0, ...,   0,   4,   2],
           [  0,   0,   0, ...,   4,   2,  66],
           [  0,   0,   0, ...,   2,  66,   8],
           ...,
           [  0,   0,   0, ...,  60, 262,  13],
           [  0,   0,   0, ..., 262,  13,   9],
           [  0,   0,   0, ...,  13,   9,  10]])
    

    Split the sequences into the network inputs X and the labels Y.

    Everything except the last token of each sequence becomes the input X, and the last token becomes the label Y.

    # create predictors and label
    xs, labels = input_sequences[:,:-1],input_sequences[:,-1]
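
    A quick shape check (illustrative only): each predictor row has max_sequence_len-1 tokens, and the labels form a one-dimensional vector of word indices.

    print(xs.shape, labels.shape)  #e.g. (number of sequences, max_sequence_len-1) and (number of sequences,)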
    

    Then the labels are one-hot encoded, because we have turned next-word prediction into a classification problem:

    given a sequence of words, predict the next word.

    Each word in the corpus corresponds to one class, so keras.utils.to_categorical is used for the one-hot encoding.

    ys = tf.keras.utils.to_categorical(labels, num_classes=total_words)
    

    Y is a one-hot encoded array whose size equals the size of the vocabulary;

    in Y, the position of the label word is set to 1; in this example the 70th element is set to 1.
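
    A possible alternative (a sketch, not what this article does) is to skip the one-hot step entirely and train on the integer labels with sparse categorical cross-entropy:

    #model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    #model.fit(xs, labels, epochs=500, verbose=1)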


    Print the word index to check:

    print(tokenizer.word_index)
    
    {'and': 1, 'the': 2, 'a': 3, 'in': 4, 'all': 5, 'i': 6, 'for': 7, 'of': 8, 'lanigans': 9, 'ball': 10, 'were': 11, 'at': 12, 'to': 13, 'she': 14, 'stepped': 15, 'his': 16, 'girls': 17, 'as': 18, 'they',...}
    

    Model 1: a unidirectional LSTM network

    The input length is the maximum sequence length minus 1, because the last word of each sequence is the label.

    model1 = Sequential()
    #the embedding layer covers every word in the corpus; the embedding dimension is 64
    #the input length is the maximum sequence length minus 1, because the last word is the label
    model1.add(Embedding(total_words, 64, input_length=max_sequence_len-1))
    #unidirectional LSTM layer; its argument is the number of units, i.e. the size of its hidden/cell state, set to 20 here
    model1.add(LSTM(20))
    #dense output layer with one node per word in the corpus:
    #because the labels are one-hot encoded, each word has its own output neuron,
    #and the neuron of the predicted word receives the highest activation
    model1.add(Dense(total_words, activation='softmax'))
    

    Train the model

    The loss is categorical cross-entropy and the optimizer is adam.

    model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    #train for 500 epochs; with such a small dataset the model needs many epochs to converge
    history1 = model1.fit(xs, ys, epochs=500, verbose=1)
    
    
    Epoch 498/500
    15/15 [==============================] - 0s 8ms/step - loss: 0.2002 - accuracy: 0.9555
    Epoch 499/500
    15/15 [==============================] - 0s 8ms/step - loss: 0.2258 - accuracy: 0.9433
    Epoch 500/500
    15/15 [==============================] - 0s 8ms/step - loss: 0.2203 - accuracy: 0.9439
    

    Visualize the performance:

    import matplotlib.pyplot as plt
    
    def plot_graphs(history, string):
      plt.plot(history.history[string])
      plt.xlabel("Epochs")
      plt.ylabel(string)
      plt.show()
    
    plot_graphs(history1, 'accuracy')
    

    Text prediction:

    seed_text = "Laurence went to dublin"
    next_words = 100
      
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        predicted = model1.predict_classes(token_list, verbose=0)
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
        seed_text += " " + output_word
    print(seed_text) 
    #print(time.time())
    
    Laurence went to dublin the town of athy jeremy lanigan lanigan think lanigan ogradys brothers them her them and the ructions of was painted wine friends relations and acres relations lanigans ball cask the wall lanigan call call dublin lanigans ball ball ball brothers ball the girls they a grand eyes glisten away accident happened happened your your eyes eyes eyes mccarthy away further glisten glisten glisten glisten glisten glisten brothers out and groups a call a reel as plenty away bellows as water water water water all cask the girls bellows brothers a call ask call call them the of satisfaction at brothers
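
    Note that predict_classes was deprecated and later removed in newer TensorFlow releases; if it is unavailable, an equivalent sketch is to take the argmax of predict:

    #equivalent to model1.predict_classes(token_list, verbose=0) on newer TensorFlow versions
    predicted = np.argmax(model1.predict(token_list, verbose=0), axis=-1)[0]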
    

    Model 2: a bidirectional LSTM network

    model2 = Sequential()
    #the embedding layer covers every word in the corpus; the embedding dimension is 64
    #the input length is the maximum sequence length minus 1, because the last word is the label
    model2.add(Embedding(total_words, 64, input_length=max_sequence_len-1))
    #bidirectional LSTM layer
    model2.add(Bidirectional(LSTM(20)))
    model2.add(Dense(total_words, activation='softmax'))
    model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    history2 = model2.fit(xs, ys, epochs=500, verbose=1)
    
    15/15 [==============================] - 0s 14ms/step - loss: 0.1525 - accuracy: 0.9490
    Epoch 499/500
    15/15 [==============================] - 0s 14ms/step - loss: 0.1274 - accuracy: 0.9537
    Epoch 500/500
    15/15 [==============================] - 0s 14ms/step - loss: 0.1376 - accuracy: 0.9486
    

    View the performance

    plot_graphs(history2, 'accuracy')
    

    Text prediction:

    seed_text = "Laurence went to dublin"
    next_words = 100
      
    for _ in range(next_words):
        #first convert the seed text into a sequence of integer codes
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        #pad it to the same length as the training sequences
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        #predict the index of the most likely next word for the current text
        predicted = model2.predict_classes(token_list, verbose=0)
        
        #then look the predicted index back up in the word index
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
        seed_text += " " + output_word
    print(seed_text) 
    #print(time.time())
    
    Laurence went to dublin round merry as bees round a cask fall right call call ground creature happened lanigan youll lanigan glisten drop drop drop drop drop creature drop stepped out and i stepped in again again again to might ground drop drop drop creature hullabaloo drop drop drop pound too too too much stepped in again i suppose mad out out by the wall and man of of ceiling milliner was hall youll lanigan drop drop youll lanigan creature drop youll lanigan creature drop but lanigan creature hullabaloo glisten glisten creature drop drop drop drop ask ask drop too too too much i
    

    Because prediction errors accumulate, the more words you generate, the worse the output becomes.
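
    One common way to reduce such repetition (a sketch, not part of the original course code) is to sample the next word from the softmax distribution with a temperature, instead of always taking the most likely word:

    def sample_next_word(model, seed_text, temperature=0.8):
        #illustrative helper: sample the next word instead of taking the argmax
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        probs = model.predict(token_list, verbose=0)[0]
        probs = np.log(probs + 1e-9) / temperature
        probs = np.exp(probs) / np.sum(np.exp(probs))
        predicted = np.random.choice(len(probs), p=probs)
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                return word
        return ""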

    Release resources

    import os, signal
    
    os.kill(os.getpid(), signal.SIGINT)
    

    Case 2: Generating poetry

    This is simply an extension of the previous case; because the dataset is larger, a few parameters need to be adjusted.

    import tensorflow as tf
    
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import Adam
    import numpy as np 
    import time
    
    
    print(time.time())
    # !wget --no-check-certificate \
    #     https://storage.googleapis.com/laurencemoroney-blog.appspot.com/irish-lyrics-eof.txt \
    #     -O irish-lyrics-eof.txt
    

    Read the data, then build the tokenizer and the word index:

    tokenizer = Tokenizer()
    
    data = open('irish-lyrics-eof.txt').read()
    
    corpus = data.lower().split("\n")
    
    tokenizer.fit_on_texts(corpus)
    total_words = len(tokenizer.word_index) + 1
    
    print(tokenizer.word_index)
    print(total_words)
    
    {'the': 1, 'and': 2, 'i': 3, 'to': 4, 'a': 5, 'of': 6, 'my': 7, 'in': 8, 'me': 9, 'for': 10, 'you': 11, 'all': 12, 'was': 13, 'she': 14, 'that': 15, 'on': 16, 'with': 17, 'her': 18...}
    

    Split the sequences into the network inputs X and the labels Y.

    Everything except the last token of each sequence becomes the input X, and the last token becomes the label Y.

    input_sequences = []
    for line in corpus:
        token_list = tokenizer.texts_to_sequences([line])[0]
        for i in range(1, len(token_list)):
            n_gram_sequence = token_list[:i+1]
            input_sequences.append(n_gram_sequence)
    
    # pad sequences 
    max_sequence_len = max([len(x) for x in input_sequences])
    input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
    
    # create predictors and label
    xs, labels = input_sequences[:,:-1],input_sequences[:,-1]
    
    ys = tf.keras.utils.to_categorical(labels, num_classes=total_words)
    

    For example, look up the word indices for one of the sequences:

    print(tokenizer.word_index['in'])
    print(tokenizer.word_index['the'])
    print(tokenizer.word_index['town'])
    print(tokenizer.word_index['of'])
    print(tokenizer.word_index['athy'])
    print(tokenizer.word_index['one'])
    print(tokenizer.word_index['jeremy'])
    print(tokenizer.word_index['lanigan'])
    
    
    4
    2
    66
    8
    67
    68
    69
    70
    

    This corresponds to the following sequence in xs:

    print(xs[6])
    
    [ 0  0  0  4  2 66  8 67 68 69]
    

    Its corresponding label in ys is:

    print(ys[6])
    
    [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
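
    A quick consistency check (illustrative): the argmax of ys[6] recovers the position of the 1, which should match the index printed for 'lanigan' above.

    print(np.argmax(ys[6]))                 #position of the 1 in the one-hot vector
    print(tokenizer.word_index['lanigan'])  #should print the same index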
    

    Likewise, for the previous sequence the label is the preceding word, so the 1 in its one-hot vector sits one position earlier:

    print(xs[5])
    print(ys[5])
    
    [ 0  0  0  0  4  2 66  8 67 68]
    [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
     0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
    

    Look at the structure of the word index:

    print(tokenizer.word_index)
    
    {'the': 1, 'and': 2, 'i': 3, 'to': 4, 'a': 5, 'of': 6, 'my': 7, 'in': 8, 'me': 9, 'for': 10, 'you': 11, 'all': 12, 'was': 13, 'she': 14, 'that': 15, 'on': 16, 'with': 17, 'her': 18, 'but': 19, 'as': 20, 'when'...}
    

    Build the model

    model = Sequential()
    model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
    model.add(Bidirectional(LSTM(150)))
    model.add(Dense(total_words, activation='softmax'))
    

    Set the optimizer's learning rate to 0.01 and compile the model

    adam = Adam(lr=0.01)
    
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    #earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
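
    The commented-out EarlyStopping line would also need an import and some validation data to monitor; a minimal sketch (not used in the original run), assuming 10% of the sequences are held out for validation:

    from tensorflow.keras.callbacks import EarlyStopping
    earlystop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
    #history = model.fit(xs, ys, epochs=100, validation_split=0.1, callbacks=[earlystop], verbose=1)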
    

    Train the model

    history = model.fit(xs, ys, epochs=100, verbose=1)
    #print(model.summary())
    print(model)
    
    Epoch 99/100
    377/377 [==============================] - 5s 14ms/step - loss: 0.7221 - accuracy: 0.8028
    Epoch 100/100
    377/377 [==============================] - 5s 15ms/step - loss: 0.7731 - accuracy: 0.7944
    <tensorflow.python.keras.engine.sequential.Sequential object at 0x0000016CC9F3C1F0>
    

    Performance:

    import matplotlib.pyplot as plt
    
    
    def plot_graphs(history, string):
      plt.plot(history.history[string])
      plt.xlabel("Epochs")
      plt.ylabel(string)
      plt.show()
    
    plot_graphs(history, 'accuracy')
    

    Text generation

    seed_text = "I've got a bad feeling about this"
    next_words = 100
      
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        predicted = model.predict_classes(token_list, verbose=0)
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
        seed_text += " " + output_word
        
    print(seed_text)
    print(time.time())
    

    Predicted text

    I've got a bad feeling about this young maid in my heart i heart until you father tree love judge you expressed me deny white fellows when your smile for all on all away cross that love again love the love to your pure wild water wrath my soul i want in the morning ireland so sweet is five might break still me embarrass your who cailín deas crúite na mbó many shut i would say them out of joy was dead and wild to another i go down was gone and heard your smile is a part above yonder gleam was sailing to me and the
    1614570914.6926315
    

    Punctuation and character prediction



    Release resources

    import os, signal
    
    os.kill(os.getpid(), signal.SIGINT)
    

    Case 3: Shakespeare's sonnets

    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, Bidirectional
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras import regularizers
    import tensorflow.keras.utils as ku 
    import numpy as np 
    
    tokenizer = Tokenizer()
    # !wget --no-check-certificate \
    #     https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sonnets.txt \
    #     -O sonnets.txt
    
    data = open('sonnets.txt').read()
    corpus = data.lower().split("\n")
    
    tokenizer.fit_on_texts(corpus)
    total_words = len(tokenizer.word_index) + 1
    
    # create input sequences using list of tokens
    input_sequences = []
    for line in corpus:
        token_list = tokenizer.texts_to_sequences([line])[0]
        for i in range(1, len(token_list)):
            n_gram_sequence = token_list[:i+1]
            input_sequences.append(n_gram_sequence)
    
    # pad sequences 
    max_sequence_len = max([len(x) for x in input_sequences])
    input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
    
    # create predictors and label
    predictors, label = input_sequences[:,:-1],input_sequences[:,-1]
    label = ku.to_categorical(label, num_classes=total_words)
    
    model = Sequential()
    # embedding layer over the full vocabulary, 100-dimensional vectors
    model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
    # the first recurrent layer is bidirectional; return_sequences=True so the next LSTM
    # receives the full sequence of outputs rather than only the last step
    model.add(Bidirectional(LSTM(150, return_sequences = True)))
    model.add(Dropout(0.2))
    # second (stacked) LSTM layer
    model.add(LSTM(100))
    # intermediate dense layer with L2 regularization
    model.add(Dense(total_words/2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
    # output layer: one unit per word in the corpus
    model.add(Dense(total_words, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    print(model.summary())
    
    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    embedding (Embedding)        (None, 10, 100)           321100    
    _________________________________________________________________
    bidirectional (Bidirectional (None, 10, 300)           301200    
    _________________________________________________________________
    dropout (Dropout)            (None, 10, 300)           0         
    _________________________________________________________________
    lstm_1 (LSTM)                (None, 100)               160400    
    _________________________________________________________________
    dense (Dense)                (None, 1605)              162105    
    _________________________________________________________________
    dense_1 (Dense)              (None, 3211)              5156866   
    =================================================================
    Total params: 6,101,671
    Trainable params: 6,101,671
    Non-trainable params: 0
    _________________________________________________________________
    None
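
    As a sanity check, the parameter counts in the summary can be reproduced by hand (a sketch, using the sizes shown above: a vocabulary of 3211 words, 100-dimensional embeddings, a 150-unit bidirectional LSTM and a 100-unit LSTM):

    vocab, embed_dim, lstm1, lstm2 = 3211, 100, 150, 100
    print(vocab * embed_dim)                              #embedding: 321,100
    print(2 * 4 * ((embed_dim + lstm1) * lstm1 + lstm1))  #bidirectional LSTM: 301,200
    print(4 * ((2 * lstm1 + lstm2) * lstm2 + lstm2))      #stacked LSTM: 160,400
    print(lstm2 * (vocab // 2) + vocab // 2)              #dense: 162,105
    print((vocab // 2) * vocab + vocab)                   #dense_1: 5,156,866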
    
    history = model.fit(predictors, label, epochs=100, verbose=1)
    
    Epoch 99/100
    484/484 [==============================] - 11s 22ms/step - loss: 0.9592 - accuracy: 0.82290s - l
    Epoch 100/100
    484/484 [==============================] - 11s 22ms/step - loss: 0.9533 - accuracy: 0.8250
    
    import matplotlib.pyplot as plt
    acc = history.history['accuracy']
    loss = history.history['loss']
    
    epochs = range(len(acc))
    
    plt.plot(epochs, acc, 'b', label='Training accuracy')
    plt.title('Training accuracy')
    
    plt.figure()
    
    plt.plot(epochs, loss, 'b', label='Training Loss')
    plt.title('Training loss')
    plt.legend()
    
    plt.show()
    

    seed_text = "Help me Obi Wan Kenobi, you're my only hope"
    next_words = 100
      
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        predicted = model.predict_classes(token_list, verbose=0)
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
                
        seed_text += " " + output_word
    print(seed_text)
    

    Help me Obi Wan Kenobi, you're my only hope away me was her sight behold ill hate of 'no ' go doth tend cross thee blind ' must did grew her date and therein weak in care remain remain twain light bred bright ride bow did part was old old part was despair ' so well light light taken light gone bright ' live on now bold worth doth thee ill might see care true mind defeated true pride aside aside speed alive thee in that too made so ' me prove so cheeks ' must call thee art but behold ill bright still did store live in worth
