Getting Started with TensorFlow + NLP

Author: Joshua_精东 | Published: 2019-03-12 00:16

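    Work calls for it, so I'm digging into TF and NLP (natural language processing).

    Step 1 Set up a local TensorFlow development environment

    First, install TensorFlow. No TensorFlow, no fun.

    Straight to the point!

    Google's TF site -> Install (all kinds of installation methods; look for the simplest one 🔍) -> Docker install (Docker really is loved by everyone)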

    docker pull tensorflow/tensorflow

    Once the image is pulled, run it:

    docker run -it tensorflow/tensorflow bash

    root@e7b70c1079df:/# python --version
    Python 2.7.12
    

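    Installation succeeded.

    Here is the official command for running local code inside the container. I'm saving it here so it's easy to copy later, since I'll use it a lot!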

    docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow python ./script.py
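To make the flags in that command easier to remember, here is a minimal sketch that assembles the same argument list in Python and annotates each part. The helper name `docker_run_command` and its parameters are my own, invented for illustration; only the `docker run` flags themselves come from the command above.

```python
import os
import shlex


def docker_run_command(image, script, host_dir=None, container_dir="/tmp"):
    """Build the argument list for running a local script inside a container.

    Hypothetical helper: it mirrors the command quoted in the post.
    """
    host_dir = host_dir or os.getcwd()  # same role as $PWD in the shell
    return [
        "docker", "run",
        "-it",                                           # interactive, with a terminal
        "--rm",                                          # delete the container on exit
        "-v", "{}:{}".format(host_dir, container_dir),   # mount the host directory
        "-w", container_dir,                             # working directory in the container
        image,
        "python", script,
    ]


if __name__ == "__main__":
    cmd = docker_run_command("tensorflow/tensorflow", "./script.py")
    # Print a copy-pasteable shell command.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to `subprocess.call(cmd)` would launch the container from Python, assuming Docker is installed on the host.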

    I'll look at the rest when I have time. At this point, TF on Python 2.7 is up and running on my Mac.

    Step 2 Download the expert's NLP source code

    https://github.com/dennybritz/cnn-text-classification-tf

    This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.

    It is slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in Tensorflow.

    Requirements

    • Python 3
    • Tensorflow > 0.12
    • Numpy

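    Damn... I missed that it needs Python 3. Time to find a different image - -!
    docker pull tensorflow/tensorflow:latest-py3-jupyter
    I found two tags, latest-py3 and latest-py3-jupyter. I picked the one with Jupyter, just in case ✌️
    Once the image is pulled, repeat the command from earlier:
    docker run -it tensorflow/tensorflow:latest-py3-jupyter bash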

    root@cb55072b95e5:/tf# python --version
    Python 3.5.2
    

    Perfect ✌️😜
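`python --version` only confirms the interpreter, not that TensorFlow can be imported. A small sketch of a fuller check follows; the function name `describe_environment` is hypothetical, and the code sticks to `str`-free formatting tricks so it should also run on the Python 3.5 shown above.

```python
import importlib
import platform


def describe_environment():
    """Return the Python version and the TensorFlow version, if importable."""
    info = {"python": platform.python_version(), "tensorflow": None}
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow is missing, e.g. when run outside the container.
        return info
    info["tensorflow"] = getattr(tf, "__version__", "unknown")
    return info


if __name__ == "__main__":
    env = describe_environment()
    print("Python:", env["python"])
    print("TensorFlow:", env["tensorflow"] or "not installed")
```

Run inside the container, it should report both versions; on a host without TensorFlow it reports "not installed" instead of crashing.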

    Step 3 Time to take it for a spin

    Open the downloaded source code in an editor (I use PyCharm), then go to the project root and run:
    docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow:latest-py3-jupyter bash
    Take a look inside the container:

    root@88ef18eae92c:/tmp# ll
    total 48
    drwxr-xr-x 11 root root   352 Mar 11 16:34 ./
    drwxr-xr-x  1 root root  4096 Mar 11 16:34 ../
    -rwxr-xr-x  1 root root   870 Jul 20  2018 .gitignore*
    drwxr-xr-x  6 root root   192 Mar 11 16:32 .idea/
    -rwxr-xr-x  1 root root 11357 Jul 20  2018 LICENSE*
    -rwxr-xr-x  1 root root  2280 Jul 20  2018 README.md*
    drwxr-xr-x  3 root root    96 Jul 20  2018 data/
    -rwxr-xr-x  1 root root  2472 Jul 20  2018 data_helpers.py*
    -rwxr-xr-x  1 root root  3738 Jul 20  2018 eval.py*
    -rwxr-xr-x  1 root root  3776 Jul 20  2018 text_cnn.py*
    -rwxr-xr-x  1 root root  9073 Jul 20  2018 train.py*
    

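    At this point the local code, the Docker image, and the local editor are all wired together, and the fun can begin ✌️

    Following the README.md, run the train script and look at its help output: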

    root@88ef18eae92c:/tmp# ./train.py --help
    
           USAGE: ./train.py [flags]
    flags:
    
    ./train.py:
      --[no]allow_soft_placement: Allow device soft device placement
        (default: 'true')
      --batch_size: Batch Size (default: 64)
        (default: '64')
        (an integer)
      --checkpoint_every: Save model after this many steps (default: 100)
        (default: '100')
        (an integer)
      --dev_sample_percentage: Percentage of the training data to use for validation
        (default: '0.1')
        (a number)
      --dropout_keep_prob: Dropout keep probability (default: 0.5)
        (default: '0.5')
        (a number)
      --embedding_dim: Dimensionality of character embedding (default: 128)
        (default: '128')
        (an integer)
      --evaluate_every: Evaluate model on dev set after this many steps (default: 100)
        (default: '100')
        (an integer)
      --filter_sizes: Comma-separated filter sizes (default: '3,4,5')
        (default: '3,4,5')
      --l2_reg_lambda: L2 regularization lambda (default: 0.0)
        (default: '0.0')
        (a number)
      --[no]log_device_placement: Log placement of ops on devices
        (default: 'false')
      --negative_data_file: Data source for the negative data.
        (default: './data/rt-polaritydata/rt-polarity.neg')
      --num_checkpoints: Number of checkpoints to store (default: 5)
        (default: '5')
        (an integer)
      --num_epochs: Number of training epochs (default: 200)
        (default: '200')
        (an integer)
      --num_filters: Number of filters per filter size (default: 128)
        (default: '128')
        (an integer)
      --positive_data_file: Data source for the positive data.
        (default: './data/rt-polaritydata/rt-polarity.pos')
    
    Try --helpfull to get a list of all flags.
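A few of these flags are easier to read with numbers attached. The sketch below shows how `--filter_sizes`, `--dev_sample_percentage`, `--batch_size`, and `--num_epochs` could translate into concrete quantities. The helper names and the dataset size of 10000 are assumptions made for illustration, and the split logic is my reading of the flag descriptions; check train.py for what the script actually does.

```python
def parse_filter_sizes(value):
    """Turn the comma-separated flag value, e.g. '3,4,5', into integers."""
    return [int(size) for size in value.split(",")]


def split_sizes(num_examples, dev_sample_percentage):
    """Return (train_size, dev_size) when a fraction is held out for validation."""
    num_dev = int(dev_sample_percentage * float(num_examples))
    return num_examples - num_dev, num_dev


def batches_per_epoch(num_train, batch_size):
    """Number of batches needed to cover the training set once (ceiling division)."""
    return (num_train + batch_size - 1) // batch_size


if __name__ == "__main__":
    NUM_EXAMPLES = 10000  # hypothetical dataset size, not taken from the post

    sizes = parse_filter_sizes("3,4,5")          # default of --filter_sizes
    train, dev = split_sizes(NUM_EXAMPLES, 0.1)  # default of --dev_sample_percentage
    per_epoch = batches_per_epoch(train, 64)     # default of --batch_size
    total_steps = per_epoch * 200                # default of --num_epochs

    print("filter sizes:", sizes)
    print("train/dev examples:", train, dev)
    print("batches per epoch:", per_epoch)
    print("total training steps:", total_steps)
```

This also explains why the step counter in a training log can climb into the thousands: one step is one batch, not one epoch.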
    

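    A small thrill: the help text printed out nicely 😄
    I skimmed it, but... couldn't make sense of it. Let's just try training first:
    ./train.py
    It ran without a hitch. In a word: "solid".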

    Evaluation:
    2019-03-11T16:54:25.745084: step 10300, loss 3.65937, acc 0.711069
    
    Saved model checkpoint to /tmp/runs/1552322335/checkpoints/model-10300
    
    2019-03-11T16:54:26.545635: step 10301, loss 0.000296011, acc 1
    2019-03-11T16:54:26.776015: step 10302, loss 5.95611e-05, acc 1
    2019-03-11T16:54:26.850238: step 10303, loss 0.00182802, acc 1
    2019-03-11T16:54:26.932814: step 10304, loss 1.41934e-05, acc 1
    2019-03-11T16:54:27.001730: step 10305, loss 0.00106164, acc 1
    2019-03-11T16:54:27.072143: step 10306, loss 0.00159799, acc 1
    2019-03-11T16:54:27.141833: step 10307, loss 0.000124719, acc 1
    2019-03-11T16:54:27.216436: step 10308, loss 3.5929e-06, acc 1
    2019-03-11T16:54:27.289951: step 10309, loss 1.2785e-05, acc 1
    2019-03-11T16:54:27.362660: step 10310, loss 0.00844685, acc 1
    2019-03-11T16:54:27.430708: step 10311, loss 0.000167686, acc 1
    2019-03-11T16:54:27.502281: step 10312, loss 0.000110473, acc 1
    2019-03-11T16:54:27.572092: step 10313, loss 0.000771175, acc 1
    2019-03-11T16:54:27.644654: step 10314, loss 3.88898e-06, acc 1
    2019-03-11T16:54:27.714136: step 10315, loss 0.000124581, acc 1
    2019-03-11T16:54:27.781347: step 10316, loss 5.31748e-06, acc 1
    2019-03-11T16:54:27.855259: step 10317, loss 0.000178186, acc 1
    2019-03-11T16:54:27.925776: step 10318, loss 1.3183e-05, acc 1
    2019-03-11T16:54:28.001957: step 10319, loss 0.000173645, acc 1
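In this excerpt the training batches reach a loss near zero and an accuracy of 1, while the evaluation line at step 10300 reports a loss of 3.65937 and an accuracy of about 0.71. A gap like that usually suggests overfitting, though one excerpt is not proof. To track the numbers over a whole run, the log lines can be parsed with a regular expression. This is a sketch that assumes every metric line follows the exact format shown above; `parse_log_line` is a hypothetical helper, not part of the repository.

```python
import re

# Matches lines such as:
# 2019-03-11T16:54:25.745084: step 10300, loss 3.65937, acc 0.711069
LOG_LINE = re.compile(
    r"^(?P<time>\S+): step (?P<step>\d+), loss (?P<loss>\S+), acc (?P<acc>\S+)$"
)


def parse_log_line(line):
    """Parse one metric line into a dict, or return None for other lines."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "time": match.group("time"),
        "step": int(match.group("step")),
        "loss": float(match.group("loss")),  # float() also accepts 5.95611e-05
        "acc": float(match.group("acc")),
    }


if __name__ == "__main__":
    sample = [
        "Evaluation:",
        "2019-03-11T16:54:25.745084: step 10300, loss 3.65937, acc 0.711069",
        "2019-03-11T16:54:26.545635: step 10301, loss 0.000296011, acc 1",
    ]
    for raw in sample:
        print(parse_log_line(raw))
```

Lines that are not metric lines, such as "Evaluation:" or the checkpoint message, come back as `None` and can be filtered out.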
    

    Step 4 Read the masters' papers

    "Convolutional Neural Networks for Sentence Classification"
    Author:
    Yoon Kim
    New York University

    https://arxiv.org/pdf/1408.5882.pdf

    "A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional
    Neural Networks for Sentence Classification"
    Authors:
    Ye Zhang
    Dept. of Computer Science
    University of Texas at Austin

    Byron C. Wallace
    iSchool
    University of Texas at Austin

    https://arxiv.org/pdf/1510.03820.pdf

    See you next time!

