kashgari Study Notes - 1

Author: Andy9918 | Published 2020-05-20 15:26

    1. Using Callback Functions

    from kashgari.corpus import SMP2018ECDTCorpus
    import keras
    import kashgari
    from kashgari.tasks.classification import BiLSTM_Model
    from kashgari.callbacks import EvalCallBack
    
    import logging
    logging.basicConfig(level='DEBUG')
    
    # Load the built-in dataset
    train_x, train_y = SMP2018ECDTCorpus.load_data('train')
    valid_x, valid_y = SMP2018ECDTCorpus.load_data('valid')
    test_x, test_y = SMP2018ECDTCorpus.load_data('test')

    # # You can also use your own dataset, formatted like this:
    # train_x = [['Hello', 'world'], ['Hello', 'Kashgari']]
    # train_y = ['a', 'b']

    # For this quick demo, reuse the training set as validation and test data
    valid_x, valid_y = train_x, train_y
    test_x, test_y = train_x, train_y

    tf_board_callback = keras.callbacks.TensorBoard(log_dir='./logs', update_freq=1000)
    model = BiLSTM_Model()

    # Kashgari's built-in callback; computes precision, recall and F1 during training
    eval_callback = EvalCallBack(kash_model=model,
                                 valid_x=valid_x,
                                 valid_y=valid_y,
                                 step=1)

    model.fit(train_x,
              train_y,
              valid_x,
              valid_y,
              batch_size=100,
              epochs=150,
              callbacks=[eval_callback, tf_board_callback])
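The commented-out custom-dataset lines above show the expected input format: each sample is a list of tokens, with one label string per sample. For Chinese text, a per-character split is the simplest tokenization; as a minimal sketch (the helper name and sample data are my own, not a Kashgari API):

```python
def to_char_tokens(sentences):
    """Split each Chinese sentence into a list of single-character tokens."""
    return [list(s) for s in sentences]

raw_x = ['今天天气怎么样', '播放一首歌']
raw_y = ['weather', 'music']

train_x = to_char_tokens(raw_x)
train_y = raw_y
print(train_x[1])  # ['播', '放', '一', '首', '歌']
```

This matches the shape of the SMP2018ECDTCorpus data that `load_data` returns, so the rest of the training code works unchanged.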
    

    Two callback functions are used here: eval_callback and tf_board_callback.

    1. eval_callback is Kashgari's built-in callback; it computes precision, recall and F1 during training.
    Its step parameter defaults to 5, i.e., by default the metrics are computed once every 5 epochs.
    2. tf_board_callback writes TensorBoard log files under the current directory.
    Run tensorboard in the background, point it at that directory, and you can open the TensorBoard page to inspect training.
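As a reminder of what eval_callback reports, here is a minimal sketch of precision, recall and F1 for a single class (plain Python using the standard metric definitions; this is illustrative, not Kashgari's actual implementation):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ['a', 'a', 'b', 'b']
y_pred = ['a', 'b', 'b', 'b']
print(precision_recall_f1(y_true, y_pred, 'a'))  # precision=1.0, recall=0.5
```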

    # tensorboard --logdir=logs
    

    Then open http://localhost:6006/ in your browser.

    (screenshot of the TensorBoard page)

    2. Using a Pre-trained Model

    Download a pre-trained model; let's start with the best-known one, the Chinese BERT model.
    Download address:
    https://github.com/google-research/bert
    Find the BERT-Base, Chinese model there.
    #!/usr/bin/env python
    # -*- encoding: utf-8 -*-
    """
    @Author  :   Yang Song
    @Time    :   2020/5/20 15:52 
    """
    from kashgari.corpus import SMP2018ECDTCorpus
    import keras
    import kashgari
    from kashgari.tasks.classification import BiLSTM_Model
    from kashgari.callbacks import EvalCallBack
    
    import logging
    logging.basicConfig(level='DEBUG')
    
    # Load the built-in dataset
    train_x, train_y = SMP2018ECDTCorpus.load_data('train')
    valid_x, valid_y = SMP2018ECDTCorpus.load_data('valid')
    test_x, test_y = SMP2018ECDTCorpus.load_data('test')

    # # You can also use your own dataset, formatted like this:
    # train_x = [['Hello', 'world'], ['Hello', 'Kashgari']]
    # train_y = ['a', 'b']

    # For this quick demo, reuse the training set as validation and test data
    valid_x, valid_y = train_x, train_y
    test_x, test_y = train_x, train_y

    from kashgari.embeddings import BERTEmbedding

    # Point BERTEmbedding at the downloaded pre-trained model directory
    bert_embed = BERTEmbedding('chinese_L-12_H-768_A-12',
                               task=kashgari.CLASSIFICATION,
                               sequence_length=100)
    model = BiLSTM_Model(bert_embed)
    model.fit(train_x, train_y, valid_x, valid_y)
    
    

    chinese_L-12_H-768_A-12 is the directory of the downloaded Chinese BERT model; the current script sits in the 0_YS_TEST directory alongside it.
    Training results:
    Epoch 1/5
    30/30 [==============================] - 32s 1s/step - loss: 1.5468 - acc: 0.6119 - val_loss: 0.6305 - val_acc: 0.8628
    Epoch 2/5
    30/30 [==============================] - 26s 856ms/step - loss: 0.5488 - acc: 0.8767 - val_loss: 0.3349 - val_acc: 0.9335
    Epoch 3/5
    30/30 [==============================] - 26s 850ms/step - loss: 0.2987 - acc: 0.9389 - val_loss: 0.1907 - val_acc: 0.9670
    Epoch 4/5
    30/30 [==============================] - 26s 862ms/step - loss: 0.2027 - acc: 0.9607 - val_loss: 0.1159 - val_acc: 0.9841
    Epoch 5/5
    30/30 [==============================] - 26s 865ms/step - loss: 0.1377 - acc: 0.9761 - val_loss: 0.0758 - val_acc: 0.9947
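The steady climb of val_acc in the log above is easy to track programmatically; a minimal sketch that pulls the val_acc values out of such Keras progress lines (plain-Python regex, not part of Kashgari):

```python
import re

LOG = """\
30/30 [==============================] - 32s 1s/step - loss: 1.5468 - acc: 0.6119 - val_loss: 0.6305 - val_acc: 0.8628
30/30 [==============================] - 26s 865ms/step - loss: 0.1377 - acc: 0.9761 - val_loss: 0.0758 - val_acc: 0.9947
"""

def parse_val_acc(log_text):
    """Extract the val_acc value from each Keras progress line."""
    return [float(m) for m in re.findall(r'val_acc: ([0-9.]+)', log_text)]

print(parse_val_acc(LOG))  # [0.8628, 0.9947]
```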
    
