1 Keras
Model-building steps: compile, fit, evaluate, predict; input from NumPy arrays or tf.data Datasets
Advanced methods: functional API, model subclassing, custom layers, callbacks
save and restore, eager execution, distribution
Import tf.keras
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow import keras
tf.keras may lag behind the latest standalone Keras release
By default tf.keras saves weights in the TensorFlow checkpoint format, not HDF5 (pass save_format='h5' to use HDF5)
Build a simple model
- Sequential model
from tensorflow.keras import layers

model = tf.keras.Sequential()
# Adds a densely-connected layer with 64 units to the model:
model.add(layers.Dense(64, activation='relu'))
# Add another:
model.add(layers.Dense(64, activation='relu'))
# Add a softmax layer with 10 output units:
model.add(layers.Dense(10, activation='softmax'))
- These layers can be configured with: activation (the activation function), kernel_initializer and bias_initializer (weight initialization schemes), kernel_regularizer and bias_regularizer (regularization schemes)
Train and evaluate
- numpy输入
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
optimizer: how training proceeds; loss: the objective function to minimize; metrics: what to monitor during training
import numpy as np
data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))
val_data = np.random.random((100, 32))
val_labels = np.random.random((100, 10))
model.fit(data, labels, epochs=10, batch_size=32,
          validation_data=(val_data, val_labels))
epochs: number of training epochs (one epoch is one full pass over the input data); batch_size: how many samples are used per training step
validation_data: the loss and metrics on this validation set are reported at the end of every epoch
- Dataset input
# Instantiates a toy dataset instance:
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32)
dataset = dataset.repeat()
# Don't forget to specify `steps_per_epoch` when calling `fit` on a dataset.
model.fit(dataset, epochs=10, steps_per_epoch=30)
Use Dataset.from_tensor_slices() together with dataset.batch() and dataset.repeat()
steps_per_epoch: the number of training steps the model runs before moving on to the next epoch
- Evaluate and predict
data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))
model.evaluate(data, labels, batch_size=32)
model.evaluate(dataset, steps=30)
result = model.predict(data, batch_size=32)
print(result.shape)
Build advanced models
- Functional API
inputs = tf.keras.Input(shape=(32,)) # Returns a placeholder tensor
# A layer instance is callable on a tensor, and returns a tensor.
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
predictions = layers.Dense(10, activation='softmax')(x)
# Instantiate the model given inputs and outputs.
model = tf.keras.Model(inputs=inputs, outputs=predictions)
# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Trains for 5 epochs
model.fit(data, labels, batch_size=32, epochs=5)
- Model subclassing
You can define your own model by subclassing tf.keras.Model and implementing call() (see the sketch below)
- Custom layers
You can define your own layers by subclassing tf.keras.layers.Layer
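A minimal sketch of model subclassing (a custom layer follows the same pattern by subclassing tf.keras.layers.Layer); the layer sizes and class name here are illustrative:
class MyModel(tf.keras.Model):

  def __init__(self, num_classes=10):
    super(MyModel, self).__init__(name='my_model')
    # Define layers in the constructor.
    self.dense_1 = layers.Dense(32, activation='relu')
    self.dense_2 = layers.Dense(num_classes, activation='softmax')

  def call(self, inputs):
    # Define the forward pass using the layers defined above.
    x = self.dense_1(inputs)
    return self.dense_2(x)

model = MyModel(num_classes=10)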
Callbacks
Four built-in callbacks: ModelCheckpoint, LearningRateScheduler, EarlyStopping, TensorBoard
https://www.tensorflow.org/guide/summaries_and_tensorboard
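A short sketch of passing callbacks to fit(); the monitored metric, patience value and log directory are illustrative:
callbacks = [
  # Stop training when `val_loss` has not improved for 2 consecutive epochs.
  tf.keras.callbacks.EarlyStopping(patience=2, monitor='val_loss'),
  # Write TensorBoard logs to ./logs.
  tf.keras.callbacks.TensorBoard(log_dir='./logs')
]
model.fit(data, labels, batch_size=32, epochs=5, callbacks=callbacks,
          validation_data=(val_data, val_labels))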
Save and restore
weights only
config (architecture) only
entire model
# Create a trivial model
model = tf.keras.Sequential([
  layers.Dense(64, activation='relu', input_shape=(32,)),
  layers.Dense(10, activation='softmax')
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels, batch_size=32, epochs=5)
# Save entire model to a HDF5 file
model.save('my_model.h5')
# Recreate the exact same model, including weights and optimizer.
model = tf.keras.models.load_model('my_model.h5')
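Besides the entire model, the weights and the configuration can be saved separately; a minimal sketch (file paths are illustrative):
# Save and load the weights only (TensorFlow checkpoint format by default).
model.save_weights('./weights/my_model')
model.load_weights('./weights/my_model')

# Save and recreate the architecture only (no weights, no optimizer state).
json_string = model.to_json()
fresh_model = tf.keras.models.model_from_json(json_string)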
Eager execution
Eager execution is especially useful for debugging model subclassing and custom layers
Distribution
- Estimators
Convert a Keras model to an Estimator with tf.keras.estimator.model_to_estimator
- Multiple GPUs
Use tf.contrib.distribute.MirroredStrategy
Three steps (sketched below):
- Convert the Keras model to an Estimator
- Build the input pipeline with tf.data
- Set the distribution strategy via tf.estimator.RunConfig
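A rough sketch of the three steps, reusing the model, data and labels defined earlier; the model_dir and step counts are illustrative:
# 1. Convert the Keras model to an Estimator, passing a RunConfig that carries
#    the distribution strategy.
strategy = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)
keras_estimator = tf.keras.estimator.model_to_estimator(
    keras_model=model, config=config, model_dir='/tmp/model_dir')

# 2. Build the input pipeline with tf.data inside an input_fn.
def input_fn():
  dataset = tf.data.Dataset.from_tensor_slices((data, labels))
  return dataset.repeat().batch(32)

# 3. Train with the distributed configuration.
keras_estimator.train(input_fn=input_fn, steps=10)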
2 Eager Execution
Basic
- setup
tf.enable_eager_execution()
tf.executing_eagerly()
TensorFlow math operations convert Python objects and NumPy arrays to tf.Tensor objects
a = tf.constant([[1, 2],
                 [3, 4]])
# Obtain numpy value from a tensor:
print(a.numpy())
# => [[1 2]
#     [3 4]]
- Dynamic control flow
def fizzbuzz(max_num):
  counter = tf.constant(0)
  max_num = tf.convert_to_tensor(max_num)
  for num in range(1, max_num.numpy()+1):
    num = tf.constant(num)
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num.numpy())
    counter += 1
Eager training
- Computing gradients
- Train a model
- Variables and optimizers
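A minimal sketch covering the items above, assuming eager execution is enabled: compute a gradient with tf.GradientTape and apply it to a tf.Variable with a tf.train optimizer (the toy loss and learning rate are illustrative):
w = tf.Variable(5.0)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

with tf.GradientTape() as tape:
  loss = w * w                       # toy loss; d(loss)/dw = 2 * w
grads = tape.gradient(loss, [w])
optimizer.apply_gradients(zip(grads, [w]))
print(w.numpy())                     # ==> 4.0  (5.0 - 0.1 * 10.0)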
Use objects for state
Advanced automatic differentiation topics
Performance
Work with graphs
3 Importing Data
tf.data.Dataset and tf.data.Iterator
source, transform, iterator, consume, save
read numpy, tfrecord, text, csv
preprocess: parse tf.Example (TFRecord), decode and resize images, arbitrary Python logic
batching and padding and multiple epochs and randomly shuffling
intro to Estimators training, evaluation, prediction, export for serving
3.1 Intro
Using tf.data
Two parts: tf.data.Dataset and tf.data.Iterator
3.2 Basic mechanics
Define a source
tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices()
Alternatively, if your input data are on disk in the recommended TFRecord format, you can construct a tf.data.TFRecordDataset
Transform
Transform the Dataset with methods such as Dataset.map() and Dataset.batch()
Iterator
Dataset.make_one_shot_iterator()
Initialize with Iterator.initializer
Get the next element with Iterator.get_next()
Dataset structure
A dataset contains elements
Each element contains one or more tensor objects called components
Each component has a tf.DType and a tf.TensorShape
Use Dataset.output_types and Dataset.output_shapes to inspect the types and shapes of each component of a dataset element
https://www.jianshu.com/p/1da2648c0962
Using dictionary mapping
dataset = tf.data.Dataset.from_tensor_slices(
    {"a": tf.random_uniform([4]),
     "b": tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)})
print(dataset.output_types)   # ==> "{'a': tf.float32, 'b': tf.int32}"
print(dataset.output_shapes)  # ==> "{'a': (), 'b': (100,)}"
Dataset Transformation:
Dataset.map(), Dataset.flat_map(), Dataset.filter()
https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map
http://www.feiguyunai.com/index.php/2017/12/25/pyhtonai-ml-dataprocess-datasetapi/
https://www.leiphone.com/news/201711/zV7yM5W1dFrzs8W5.html
Creating an iterator
one-shot:
dataset = tf.data.Dataset.range(100)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
for i in range(100):
  value = sess.run(next_element)
  assert i == value
initializable:
max_value = tf.placeholder(tf.int64, shape=[])
dataset = tf.data.Dataset.range(max_value)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
# Initialize an iterator over a dataset with 10 elements.
sess.run(iterator.initializer, feed_dict={max_value: 10})
for i in range(10):
  value = sess.run(next_element)
  assert i == value

# Initialize the same iterator over a dataset with 100 elements.
sess.run(iterator.initializer, feed_dict={max_value: 100})
for i in range(100):
  value = sess.run(next_element)
  assert i == value
reinitializable:
# Define training and validation datasets with the same structure.
training_dataset = tf.data.Dataset.range(100).map(
    lambda x: x + tf.random_uniform([], -10, 10, tf.int64))
validation_dataset = tf.data.Dataset.range(50)
# A reinitializable iterator is defined by its structure. We could use the
# `output_types` and `output_shapes` properties of either `training_dataset`
# or `validation_dataset` here, because they are compatible.
iterator = tf.data.Iterator.from_structure(training_dataset.output_types,
                                           training_dataset.output_shapes)
next_element = iterator.get_next()
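The reinitializable iterator is then bound to a concrete dataset with make_initializer(); roughly (epoch count is illustrative):
training_init_op = iterator.make_initializer(training_dataset)
validation_init_op = iterator.make_initializer(validation_dataset)

# Alternate: one pass over the training data, then one over the validation data.
for _ in range(20):
  sess.run(training_init_op)
  for _ in range(100):
    sess.run(next_element)
  sess.run(validation_init_op)
  for _ in range(50):
    sess.run(next_element)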
feedable:
# Define training and validation datasets with the same structure.
training_dataset = tf.data.Dataset.range(100).map(
    lambda x: x + tf.random_uniform([], -10, 10, tf.int64)).repeat()
validation_dataset = tf.data.Dataset.range(50)
# A feedable iterator is defined by a handle placeholder and its structure. We
# could use the `output_types` and `output_shapes` properties of either
# `training_dataset` or `validation_dataset` here, because they have
# identical structure.
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, training_dataset.output_types, training_dataset.output_shapes)
next_element = iterator.get_next()
# You can use feedable iterators with a variety of different kinds of iterator
# (such as one-shot and initializable iterators).
training_iterator = training_dataset.make_one_shot_iterator()
validation_iterator = validation_dataset.make_initializable_iterator()
# The `Iterator.string_handle()` method returns a tensor that can be evaluated
# and used to feed the `handle` placeholder.
training_handle = sess.run(training_iterator.string_handle())
validation_handle = sess.run(validation_iterator.string_handle())
# Loop forever, alternating between training and validation.
while True:
  # Run 200 steps using the training dataset. Note that the training dataset is
  # infinite, and we resume from where we left off in the previous `while` loop
  # iteration.
  for _ in range(200):
    sess.run(next_element, feed_dict={handle: training_handle})

  # Run one pass over the validation dataset.
  sess.run(validation_iterator.initializer)
  for _ in range(50):
    sess.run(next_element, feed_dict={handle: validation_handle})
https://blog.csdn.net/briblue/article/details/80962728
one-shot: simply emits each element once; no explicit initialization or parameterization
initializable: how elements are produced can be parameterized (via placeholders) at initialization time
reinitializable: the same iterator can be initialized from different datasets with the same structure
feedable (a sort of pipe switcher): a string handle selects which iterator to draw from, so switching between datasets does not require re-initializing and starting over
Consuming values from an iterator
next_element = iterator.get_next()
result = tf.add(next_element, next_element)  # some computation on each element

sess.run(iterator.initializer)
while True:
  try:
    sess.run(result)
  except tf.errors.OutOfRangeError:
    break
Saving iterator state
tf.contrib.data.make_saveable_from_iterator
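A rough sketch of saving iterator state with a Saver, assuming the SAVEABLE_OBJECTS collection mechanism; the checkpoint path is illustrative:
# `iterator` is any iterator whose position should survive a restart.
saveable = tf.contrib.data.make_saveable_from_iterator(iterator)
# Register it so that tf.train.Saver saves/restores the iterator state too.
tf.add_to_collection(tf.GraphKeys.SAVEABLE_OBJECTS, saveable)
saver = tf.train.Saver()

# Later, inside a session:
#   saver.save(sess, '/tmp/iterator_ckpt')
#   saver.restore(sess, '/tmp/iterator_ckpt')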
3.3 Reading input data
Consuming NumPy arrays
# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
features = data["features"]
labels = data["labels"]
# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
Embedding the arrays directly is simple, but the data gets baked into the graph as constants, which can cause memory problems for large arrays
# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
features = data["features"]
labels = data["labels"]
# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]
features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)
dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# [Other transformations on `dataset`...]
dataset = ...
iterator = dataset.make_initializable_iterator()
sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                          labels_placeholder: labels})
Using placeholders is better: the arrays are fed when the iterator is initialized instead of being embedded in the graph
Consuming TFRecord
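A minimal sketch; the file list is fed through a placeholder so the same pipeline can read either training or validation files (file names are placeholders):
filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.repeat()
iterator = dataset.make_initializable_iterator()

# Feed the actual file list when initializing the iterator.
training_filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
sess.run(iterator.initializer, feed_dict={filenames: training_filenames})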
Consuming Text data
Each line of the file becomes one element
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.TextLineDataset(filenames)
Then use flat_map() (with skip() and filter() per file) to drop lines you don't need, such as headers and comments (see the sketch below)
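A sketch of per-file filtering via flat_map(), skipping a header line and dropping comment lines that start with "#":
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.flat_map(
    lambda filename: (
        tf.data.TextLineDataset(filename)
        .skip(1)                          # skip the header row of each file
        .filter(lambda line:              # drop comment lines starting with "#"
                tf.not_equal(tf.substr(line, 0, 1), "#"))))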
Consuming CSV data
# Creates a dataset that reads all of the records from two CSV files, each with
# eight float columns
filenames = ["/var/data/file1.csv", "/var/data/file2.csv"]
record_defaults = [tf.float32] * 8 # Eight required float columns
dataset = tf.data.experimental.CsvDataset(filenames, record_defaults)
You can provide a default value for each column;
you can skip the header row and specify exactly which columns are parsed and which have defaults (see the sketch below)
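A sketch of those options, assuming the header and select_cols arguments of tf.data.experimental.CsvDataset (column indices are illustrative):
# Skip the header row of each file and parse only columns 2 and 4, both floats
# with a default of 0.0 when a field is missing.
record_defaults = [[0.0], [0.0]]
dataset = tf.data.experimental.CsvDataset(
    filenames, record_defaults, header=True, select_cols=[2, 4])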
3.4 Preprocessing data
parse tf.Example
Write a parsing function using TF ops, then apply it with dataset.map(_parse_function)
# Transforms a scalar string `example_proto` into a pair of a scalar string and
# a scalar integer, representing an image and its label, respectively.
def _parse_function(example_proto):
  features = {"image": tf.FixedLenFeature((), tf.string, default_value=""),
              "label": tf.FixedLenFeature((), tf.int64, default_value=0)}
  parsed_features = tf.parse_single_example(example_proto, features)
  return parsed_features["image"], parsed_features["label"]

# Creates a dataset that reads all of the examples from two files, and extracts
# the image and label features.
filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(_parse_function)
When reading this format, each record may contain multiple features
decoding image data and resizing it
Same pattern as above: write a decode-and-resize function and apply it with dataset.map() (see the sketch below)
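A sketch of the decode-and-resize case; the image size and format are illustrative, and image_files / image_labels are placeholders for your own lists of image paths and integer labels:
# Reads an image file, decodes it into a dense tensor, and resizes it to 28x28.
def _parse_image(filename, label):
  image_string = tf.read_file(filename)
  image_decoded = tf.image.decode_jpeg(image_string)
  image_resized = tf.image.resize_images(image_decoded, [28, 28])
  return image_resized, label

dataset = tf.data.Dataset.from_tensor_slices((image_files, image_labels))
dataset = dataset.map(_parse_image)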
Applying arbitrary Python logic
Use tf.py_func() to preprocess with arbitrary Python code or external libraries (see the sketch below)
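A sketch of wrapping plain Python in the pipeline with tf.py_func(); the Python function here is a stand-in for any external-library call:
def _python_upper(s):
  # Arbitrary Python runs here (e.g. an external library); `s` arrives as bytes.
  return s.upper()

dataset = tf.data.Dataset.from_tensor_slices(["apple", "pear"])
dataset = dataset.map(
    lambda s: tf.py_func(_python_upper, [s], tf.string))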
3.5 Batching dataset elements
Simple batching
inc_dataset = tf.data.Dataset.range(100)
dec_dataset = tf.data.Dataset.range(0, -100, -1)
dataset = tf.data.Dataset.zip((inc_dataset, dec_dataset))
batched_dataset = dataset.batch(4)
iterator = batched_dataset.make_one_shot_iterator()
next_element = iterator.get_next()
print(sess.run(next_element)) # ==> ([0, 1, 2, 3], [ 0, -1, -2, -3])
print(sess.run(next_element)) # ==> ([4, 5, 6, 7], [-4, -5, -6, -7])
print(sess.run(next_element)) # ==> ([8, 9, 10, 11], [-8, -9, -10, -11])
batch(n) stacks n consecutive elements into a single element
Batching with padding
dataset = tf.data.Dataset.range(100)
dataset = dataset.map(lambda x: tf.fill([tf.cast(x, tf.int32)], x))
dataset = dataset.padded_batch(4, padded_shapes=(None,))
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
print(sess.run(next_element)) # ==> [[0, 0, 0], [1, 0, 0], [2, 2, 0], [3, 3, 3]]
print(sess.run(next_element)) # ==> [[4, 4, 4, 4, 0, 0, 0],
# [5, 5, 5, 5, 5, 0, 0],
# [6, 6, 6, 6, 6, 6, 0],
# [7, 7, 7, 7, 7, 7, 7]]
If elements have different lengths, padded_batch() pads them to a common length; any dimension left as None in padded_shapes is padded to the maximum size in that batch
3.6 Training workflows
Processing multiple epochs
- Use the repeat() method
filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...)
dataset = dataset.repeat(10)
dataset = dataset.batch(32)
- Alternatively, initialize the iterator at the start of every epoch and catch tf.errors.OutOfRangeError to detect the end of each epoch (sketch below)
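A sketch of the per-epoch initialization pattern, assuming `dataset` was built without repeat(); the epoch count is illustrative:
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

for _ in range(10):                     # 10 epochs
  sess.run(iterator.initializer)
  while True:
    try:
      sess.run(next_element)
    except tf.errors.OutOfRangeError:
      break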
Randomly shuffling input data
https://juejin.im/post/5b855d016fb9a01a1a27d035
shuffle() keeps a candidate buffer of buffer_size elements and draws each output element at random from that buffer; batches are then taken from the shuffled stream
https://stackoverflow.com/questions/46444018/meaning-of-buffer-size-in-dataset-map-dataset-prefetch-and-dataset-shuffle
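A sketch combining shuffle with batching and repeating, reusing the TFRecord filenames from above; the buffer and batch sizes are illustrative:
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.shuffle(buffer_size=10000)  # sample each element from a 10000-element buffer
dataset = dataset.batch(32)
dataset = dataset.repeat()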
Using high-level APIs
- tf.train.MonitoredTrainingSession simplifies many aspects of running TensorFlow in a distributed setting
- An Estimator can consume a Dataset directly; the framework creates and initializes the iterator automatically
4 Introduction to Estimators
Four tasks: training, evaluation, prediction, export for serving
4.1 Advantages of Estimators
Distributed training support and a simplified implementation (graph building, sessions and checkpointing are handled for you)
4.2 Pre-made Estimators
Dataset importing function
Each dataset importing function must return two objects:
- a dictionary in which the keys are feature names and the values are Tensors (or SparseTensors) containing the corresponding feature data
- a Tensor containing one or more labels
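A minimal sketch of such a function; the toy data is illustrative, and the feature names match the feature columns defined below:
import numpy as np

def train_input_fn():
  features = {'population': np.array([10000., 25000., 3000.]),
              'crime_rate': np.array([0.1, 0.3, 0.05]),
              'median_education': np.array([10., 12., 14.])}
  labels = np.array([0, 1, 0])
  dataset = tf.data.Dataset.from_tensor_slices((features, labels))
  dataset = dataset.shuffle(3).repeat().batch(2)
  # Returns the two objects described above: a dict of feature Tensors and a
  # Tensor of labels.
  return dataset.make_one_shot_iterator().get_next()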
Define the feature columns
# Define three numeric feature columns.
population = tf.feature_column.numeric_column('population')
crime_rate = tf.feature_column.numeric_column('crime_rate')
median_education = tf.feature_column.numeric_column('median_education',
                                                    normalizer_fn=lambda x: x - global_education_mean)
Instantiate the Estimator
# Instantiate an estimator, passing the feature columns.
estimator = tf.estimator.LinearClassifier(
    feature_columns=[population, crime_rate, median_education])
Call a training, evaluation, or inference method
# `input_fn` is the function created in Step 1
estimator.train(input_fn=my_training_set, steps=2000)
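Evaluation and inference follow the same pattern; my_eval_set and my_pred_set are placeholders for input functions like the one in Step 1:
# Evaluate on a held-out set; returns a dict of metric name -> value.
metrics = estimator.evaluate(input_fn=my_eval_set)

# Predict; returns a generator yielding one prediction dict per example.
for pred in estimator.predict(input_fn=my_pred_set):
  print(pred)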
4.3 Custom Estimators
A companion document explains how to write the model function.
Recommended workflow:
- First build a baseline model with a suitable pre-made Estimator
- Build and test the overall pipeline with it, checking the integrity and reliability of your data
- If other suitable pre-made Estimators are available, experiment with them to find the best one
- Possibly improve the model further by building your own custom Estimator