【Tool】Keras Basics VI: Binary Classification

Author: ItchyHiker | Published 2018-09-25 17:32

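We start with a basic binary classification problem and solve it in Keras with a small multilayer perceptron (a fully connected network). Keras ships with several built-in datasets for different problem types: the IMDB movie-review dataset is a binary classification problem, the Reuters newswire dataset is a multi-class classification problem, and the Boston housing prices dataset is a regression problem.
Each sample in `train_data` is a list of word indices, one integer per word of the review, and `train_labels` (likewise `test_labels`) holds the reviewer's opinion of the movie: 0 for negative, 1 for positive.
Let's decode one positive and one negative review to see what they look like: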

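    from keras.datasets import imdb

    # Keep only the 10,000 most frequent words; rarer words are replaced by the "unknown" index
    (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
    word_index = imdb.get_word_index()  # maps word -> integer index
    reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
    # Indices are offset by 3: 0, 1 and 2 are reserved for padding, start of sequence and unknown words
    good_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])
    # Output: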
    "? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the part's of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"
    bad_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[1]])
    # Output:
    "? big hair big boobs bad music and a giant safety pin these are the words to best describe this terrible movie i love cheesy horror movies and i've seen hundreds but this had got to be on of the worst ever made the plot is paper thin and ridiculous the acting is an abomination the script is completely laughable the best is the end showdown with the cop and how he worked out who the killer is it's just so damn terribly written the clothes are sickening and funny in equal measures the hair is big lots of boobs bounce men wear those cut tee shirts that show off their stomachs sickening that men actually wore them and the music is just synthesiser trash that plays over and over again in almost every scene there is trashy music boobs and paramedics taking away bodies and the gym still doesn't close for bereavement all joking aside this is a truly bad film whose only charm is to look back on the disaster that was the 80's and have a good old laugh at how bad everything was back then"
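
The `i - 3` in the decoding expressions undoes an offset that `imdb.load_data` applies by default (`index_from=3`): index 0 is reserved for padding, 1 marks the start of a sequence, and 2 stands for a word outside the `num_words` vocabulary, so every real word is stored as its `word_index` rank plus 3. This is also why each decoded review begins with `?`. The minimal sketch below mimics that scheme on a hypothetical four-word vocabulary; the `encode`/`decode` helpers are illustrative only, not Keras API, and the sketch runs without downloading the dataset.

```python
# Hypothetical toy vocabulary standing in for imdb.get_word_index():
# word -> frequency rank, starting at 1 (1 = most frequent word).
word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

# Keras' IMDB loader (default index_from=3) reserves 0 for padding,
# 1 for the start-of-sequence marker and 2 for out-of-vocabulary words,
# and stores each known word as rank + 3.
INDEX_FROM = 3
START, OOV = 1, 2

reverse_word_index = {rank: word for word, rank in word_index.items()}


def encode(words):
    """Hypothetical helper: encode words roughly the way load_data stores them."""
    return [START] + [word_index[w] + INDEX_FROM if w in word_index else OOV
                      for w in words]


def decode(sequence):
    """Undo the shift; reserved indices have no word and map to '?'."""
    return " ".join(reverse_word_index.get(i - INDEX_FROM, "?") for i in sequence)


seq = encode(["the", "film", "was", "brilliant"])
print(seq)                                             # [1, 4, 5, 6, 7]
print(decode(seq))                                     # ? the film was brilliant
print(decode(encode(["the", "film", "was", "awful"])))  # ? the film was ?
```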
    

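We build a small multilayer perceptron with two hidden layers to do the classification. Before training, we preprocess the data with multi-hot encoding (loosely called one-hot encoding): only the 10,000 most frequent words are kept, and each review becomes a 10,000-dimensional vector with a 1 at the index of every word that appears in it and a 0 everywhere else. These vectors are then fed into the network.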
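
Strictly speaking this is multi-hot (bag-of-words) encoding: a review's vector records which words occur, but not how often or in what order. Running the article's `vectorize_sequences` function on two made-up sequences with a small, hypothetical `dimension=8` makes the effect visible:

```python
import numpy as np


def vectorize_sequences(sequences, dimension=10000):
    """Multi-hot encode: row i gets 1.0 at every index that occurs in sequences[i]."""
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0  # repeated indices still just give 1.0
    return results


# Two toy "reviews" over a hypothetical vocabulary of 8 indices.
toy = [[1, 3, 3, 5], [2, 7]]
x = vectorize_sequences(toy, dimension=8)
print(x)
# [[0. 1. 0. 1. 0. 1. 0. 0.]
#  [0. 0. 1. 0. 0. 0. 0. 1.]]
```

Note that the repeated index 3 in the first review still yields a single 1. At full size, each 25,000-review split becomes a 25,000 × 10,000 float64 matrix, roughly 2 GB in memory; passing `dtype='float32'` to `np.zeros` would halve that, an optional tweak not in the original code.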

    import numpy as np
    from keras.models import Sequential
    from keras import layers
    from keras.datasets import imdb
    
    # Load IMDB, keeping only the 10,000 most frequent words
    (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
    word_index = imdb.get_word_index()
    reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
    decoded_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])

    def vectorize_sequences(sequences, dimension=10000):
        # Multi-hot encoding: one row per review, 1.0 at every word index it contains
        results = np.zeros((len(sequences), dimension))
        for i, sequence in enumerate(sequences):
            results[i, sequence] = 1.  # set specific indices of results[i] to 1s
        return results

    x_train = vectorize_sequences(train_data)
    x_test = vectorize_sequences(test_data)
    # Labels as float32 vectors of 0s and 1s
    y_train = np.asarray(train_labels).astype('float32')
    y_test = np.asarray(test_labels).astype('float32')
    
    
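    # Define the model: two hidden Dense layers of 10 ReLU units, then a sigmoid
    # output giving the probability that a review is positive
    model = Sequential()
    model.add(layers.Dense(10, activation='relu', input_shape=(10000,)))
    model.add(layers.Dense(10, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))

    # binary_crossentropy is the standard loss for a single sigmoid output with 0/1 labels
    model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])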
    
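    # Hold out the first 10,000 training samples as a validation set
    x_val = x_train[:10000]
    partial_x_train = x_train[10000:]
    y_val = y_train[:10000]
    partial_y_train = y_train[10000:]

    history = model.fit(partial_x_train, partial_y_train, epochs=10, batch_size=512, validation_data=(x_val, y_val))

    # Evaluate on the test set and print the metric names alongside their values (loss, accuracy)
    metrics = model.evaluate(x_test, y_test)
    print(model.metrics_names)
    print(metrics)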
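
To make the network's computation concrete, here is a NumPy sketch of the forward pass of the same 10000 → 10 → 10 → 1 architecture, using random weights instead of trained ones and two hypothetical multi-hot inputs. It also counts the parameters (each `Dense` layer has inputs × units weights plus one bias per unit) and computes binary cross-entropy, the loss that pairs naturally with a single sigmoid output. This is an illustration of the math, not the Keras implementation; the probability clipping is an assumption added to avoid `log(0)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes matching the Sequential model: 10000 -> 10 -> 10 -> 1.
sizes = [10000, 10, 10, 1]
# Random weights stand in for trained ones (illustration only).
params = [(rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 100131 = (10000*10 + 10) + (10*10 + 10) + (10*1 + 1)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x):
    """Dense(relu) -> Dense(relu) -> Dense(sigmoid), one row per review."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return sigmoid(h @ W + b).ravel()  # estimated P(review is positive)


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


# Two hypothetical multi-hot inputs and their labels.
x = np.zeros((2, 10000))
x[0, [1, 14, 22]] = 1.0
x[1, [1, 194, 1153]] = 1.0
y = np.array([1.0, 0.0])

p = forward(x)
print(p.shape)  # (2,)
print(binary_crossentropy(y, p))
```

Calling `model.summary()` on the Keras model should report the same total of 100,131 parameters. Almost all of them sit in the first layer, because each of the 10,000 input words gets its own weight for every hidden unit.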
    

Article URL: https://www.haomeiwen.com/subject/gzmloftx.html