1 Keras
Model-building steps: compile, fit, evaluate, predict; input from NumPy arrays or tf.data Datasets
Advanced methods: functional API, model subclassing, custom layers, callbacks
save and restore, eager execution, distribution
Import tf.keras
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow import keras
tf.keras may lag behind the latest standalone Keras release
By default tf.keras saves weights in the TensorFlow checkpoint format, not HDF5 (pass save_format='h5' to use HDF5)
Build a simple model
- Sequential model
from tensorflow.keras import layers

model = tf.keras.Sequential()
# Adds a densely-connected layer with 64 units to the model:
model.add(layers.Dense(64, activation='relu'))
# Add another:
model.add(layers.Dense(64, activation='relu'))
# Add a softmax layer with 10 output units:
model.add(layers.Dense(10, activation='softmax'))
- These layers can be configured with: activation (the activation function), kernel_initializer and bias_initializer (weight initialization schemes), kernel_regularizer and bias_regularizer (regularization schemes)
Train and evaluate
- numpy输入
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
optimizer: how training proceeds; loss: the objective function to minimize; metrics: what to monitor during training
import numpy as np
data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))
val_data = np.random.random((100, 32))
val_labels = np.random.random((100, 10))
model.fit(data, labels, epochs=10, batch_size=32,
          validation_data=(val_data, val_labels))
epochs: number of training epochs (one epoch is one full pass over the input data); batch_size: how many samples are used per training step
validation_data: the loss and metrics on this validation set are reported at the end of every epoch
- Dataset input
# Instantiates a toy dataset instance:
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32)
dataset = dataset.repeat()
# Don't forget to specify `steps_per_epoch` when calling `fit` on a dataset.
model.fit(dataset, epochs=10, steps_per_epoch=30)
Use Dataset.from_tensor_slices() together with dataset.batch() and dataset.repeat()
steps_per_epoch: the number of training steps the model runs before moving on to the next epoch
- Evaluate and predict
data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))
model.evaluate(data, labels, batch_size=32)
model.evaluate(dataset, steps=30)
result = model.predict(data, batch_size=32)
print(result.shape)
Build advanced models
- Functional API
inputs = tf.keras.Input(shape=(32,)) # Returns a placeholder tensor
# A layer instance is callable on a tensor, and returns a tensor.
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
predictions = layers.Dense(10, activation='softmax')(x)
# Instantiate the model given inputs and outputs.
model = tf.keras.Model(inputs=inputs, outputs=predictions)
# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Trains for 5 epochs
model.fit(data, labels, batch_size=32, epochs=5)
- Model subclassing
You can define your own model by subclassing tf.keras.Model and implementing call() (see the sketch below)
- Custom layers
You can define your own layers by subclassing tf.keras.layers.Layer
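A minimal sketch of model subclassing (a custom layer follows the same pattern by subclassing tf.keras.layers.Layer); the layer sizes and class name here are illustrative:
class MyModel(tf.keras.Model):

  def __init__(self, num_classes=10):
    super(MyModel, self).__init__(name='my_model')
    # Define layers in the constructor.
    self.dense_1 = layers.Dense(32, activation='relu')
    self.dense_2 = layers.Dense(num_classes, activation='softmax')

  def call(self, inputs):
    # Define the forward pass using the layers defined above.
    x = self.dense_1(inputs)
    return self.dense_2(x)

model = MyModel(num_classes=10)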
Callbacks
Four built-in callbacks: ModelCheckpoint, LearningRateScheduler, EarlyStopping, TensorBoard
https://www.tensorflow.org/guide/summaries_and_tensorboard
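A short sketch of passing callbacks to fit(); the monitored metric, patience value and log directory are illustrative:
callbacks = [
  # Stop training when `val_loss` has not improved for 2 consecutive epochs.
  tf.keras.callbacks.EarlyStopping(patience=2, monitor='val_loss'),
  # Write TensorBoard logs to ./logs.
  tf.keras.callbacks.TensorBoard(log_dir='./logs')
]
model.fit(data, labels, batch_size=32, epochs=5, callbacks=callbacks,
          validation_data=(val_data, val_labels))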
Save and restore
weights only
config (architecture) only
entire model
# Create a trivial model
model = tf.keras.Sequential([
  layers.Dense(64, activation='relu', input_shape=(32,)),
  layers.Dense(10, activation='softmax')
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels, batch_size=32, epochs=5)
# Save entire model to a HDF5 file
model.save('my_model.h5')
# Recreate the exact same model, including weights and optimizer.
model = tf.keras.models.load_model('my_model.h5')
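Besides the entire model, the weights and the configuration can be saved separately; a minimal sketch (file paths are illustrative):
# Save and load the weights only (TensorFlow checkpoint format by default).
model.save_weights('./weights/my_model')
model.load_weights('./weights/my_model')

# Save and recreate the architecture only (no weights, no optimizer state).
json_string = model.to_json()
fresh_model = tf.keras.models.model_from_json(json_string)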
Eager execution
Eager execution is especially useful for debugging model subclassing and custom layers
Distribution
- Estimators
Convert a Keras model to an Estimator with tf.keras.estimator.model_to_estimator
- Multiple GPUs
Use tf.contrib.distribute.MirroredStrategy
Three steps (sketched below):
- Convert the Keras model to an Estimator
- Build the input pipeline with tf.data
- Set the distribution strategy via tf.estimator.RunConfig
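A rough sketch of the three steps, reusing the model, data and labels defined earlier; the model_dir and step counts are illustrative:
# 1. Convert the Keras model to an Estimator, passing a RunConfig that carries
#    the distribution strategy.
strategy = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)
keras_estimator = tf.keras.estimator.model_to_estimator(
    keras_model=model, config=config, model_dir='/tmp/model_dir')

# 2. Build the input pipeline with tf.data inside an input_fn.
def input_fn():
  dataset = tf.data.Dataset.from_tensor_slices((data, labels))
  return dataset.repeat().batch(32)

# 3. Train with the distributed configuration.
keras_estimator.train(input_fn=input_fn, steps=10)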
2 Eager Execution
Basic
- setup
tf.enable_eager_execution()
tf.executing_eagerly()
TensorFlow math operations convert Python objects and NumPy arrays to tf.Tensor objects
a = tf.constant([[1, 2],
                 [3, 4]])
# Obtain numpy value from a tensor:
print(a.numpy())
# => [[1 2]
#     [3 4]]
- Dynamic control flow
def fizzbuzz(max_num):
  counter = tf.constant(0)
  max_num = tf.convert_to_tensor(max_num)
  for num in range(1, max_num.numpy()+1):
    num = tf.constant(num)
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num.numpy())
    counter += 1
Eager training
- Computing gradients
- Train a model
- Variables and optimizers
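A minimal sketch covering the items above, assuming eager execution is enabled: compute a gradient with tf.GradientTape and apply it to a tf.Variable with a tf.train optimizer (the toy loss and learning rate are illustrative):
w = tf.Variable(5.0)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

with tf.GradientTape() as tape:
  loss = w * w                       # toy loss; d(loss)/dw = 2 * w
grads = tape.gradient(loss, [w])
optimizer.apply_gradients(zip(grads, [w]))
print(w.numpy())                     # ==> 4.0  (5.0 - 0.1 * 10.0)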
Use objects for state
Advanced automatic differentiation topics
Performance
Work with graphs
3 Importing Data
tf.data.Dataset and tf.data.Iterator
source, transform, iterator, consume, save
read numpy, tfrecord, text, csv
preprocess: parse tf.Example (TFRecord), decode and resize images, arbitrary Python logic
batching and padding and multiple epochs and randomly shuffling
intro to Estimators training, evaluation, prediction, export for serving
3.1 Intro
Using tf.data
Two parts: tf.data.Dataset and tf.data.Iterator
3.2 Basic mechanics
Define a source
tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices()
Alternatively, if your input data are on disk in the recommended TFRecord format, you can construct a tf.data.TFRecordDataset
Transform
Transform the Dataset with methods such as Dataset.map() and Dataset.batch()
Iterator
Dataset.make_one_shot_iterator()
Initialize with Iterator.initializer
Get the next element with Iterator.get_next()
Dataset structure
A dataset contains elements
Each element contains one or more tensor objects called components
Each component has a tf.DType and a tf.TensorShape
Use Dataset.output_types and Dataset.output_shapes to inspect the types and shapes of each component of a dataset element
https://www.jianshu.com/p/1da2648c0962
Using dictionary mapping
dataset = tf.data.Dataset.from_tensor_slices(
    {"a": tf.random_uniform([4]),
     "b": tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)})
print(dataset.output_types)   # ==> "{'a': tf.float32, 'b': tf.int32}"
print(dataset.output_shapes)  # ==> "{'a': (), 'b': (100,)}"
Dataset Transformation:
Dataset.map(), Dataset.flat_map(), Dataset.filter()
https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map
http://www.feiguyunai.com/index.php/2017/12/25/pyhtonai-ml-dataprocess-datasetapi/
https://www.leiphone.com/news/201711/zV7yM5W1dFrzs8W5.html
Creating an iterator
one-shot:
dataset = tf.data.Dataset.range(100)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
for i in range(100):
  value = sess.run(next_element)
  assert i == value
initializable:
max_value = tf.placeholder(tf.int64, shape=[])
dataset = tf.data.Dataset.range(max_value)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
# Initialize an iterator over a dataset with 10 elements.
sess.run(iterator.initializer, feed_dict={max_value: 10})
for i in range(10):
  value = sess.run(next_element)
  assert i == value

# Initialize the same iterator over a dataset with 100 elements.
sess.run(iterator.initializer, feed_dict={max_value: 100})
for i in range(100):
  value = sess.run(next_element)
  assert i == value
reinitializable:
# Define training and validation datasets with the same structure.
training_dataset = tf.data.Dataset.range(100).map(
    lambda x: x + tf.random_uniform([], -10, 10, tf.int64))
validation_dataset = tf.data.Dataset.range(50)
# A reinitializable iterator is defined by its structure. We could use the
# `output_types` and `output_shapes` properties of either `training_dataset`
# or `validation_dataset` here, because they are compatible.
iterator = tf.data.Iterator.from_structure(training_dataset.output_types,
                                           training_dataset.output_shapes)
next_element = iterator.get_next()
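The reinitializable iterator is then bound to a concrete dataset with make_initializer(); roughly (epoch count is illustrative):
training_init_op = iterator.make_initializer(training_dataset)
validation_init_op = iterator.make_initializer(validation_dataset)

# Alternate: one pass over the training data, then one over the validation data.
for _ in range(20):
  sess.run(training_init_op)
  for _ in range(100):
    sess.run(next_element)
  sess.run(validation_init_op)
  for _ in range(50):
    sess.run(next_element)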
feedable:
# Define training and validation datasets with the same structure.
training_dataset = tf.data.Dataset.range(100).map(
    lambda x: x + tf.random_uniform([], -10, 10, tf.int64)).repeat()
validation_dataset = tf.data.Dataset.range(50)
# A feedable iterator is defined by a handle placeholder and its structure. We
# could use the `output_types` and `output_shapes` properties of either
# `training_dataset` or `validation_dataset` here, because they have
# identical structure.
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, training_dataset.output_types, training_dataset.output_shapes)
next_element = iterator.get_next()
# You can use feedable iterators with a variety of different kinds of iterator
# (such as one-shot and initializable iterators).
training_iterator = training_dataset.make_one_shot_iterator()
validation_iterator = validation_dataset.make_initializable_iterator()
# The `Iterator.string_handle()` method returns a tensor that can be evaluated
# and used to feed the `handle` placeholder.
training_handle = sess.run(training_iterator.string_handle())
validation_handle = sess.run(validation_iterator.string_handle())
# Loop forever, alternating between training and validation.
while True:
  # Run 200 steps using the training dataset. Note that the training dataset is
  # infinite, and we resume from where we left off in the previous `while` loop
  # iteration.
  for _ in range(200):
    sess.run(next_element, feed_dict={handle: training_handle})

  # Run one pass over the validation dataset.
  sess.run(validation_iterator.initializer)
  for _ in range(50):
    sess.run(next_element, feed_dict={handle: validation_handle})
https://blog.csdn.net/briblue/article/details/80962728
one-shot: simply emits each element once; no explicit initialization or parameterization
initializable: how elements are produced can be parameterized (via placeholders) at initialization time
reinitializable: the same iterator can be initialized from different datasets with the same structure
feedable (a sort of pipe switcher): a string handle selects which iterator to draw from, so switching between datasets does not require re-initializing and starting over
Consuming values from an iterator
next_element = iterator.get_next()
result = tf.add(next_element, next_element)  # some computation on each element

sess.run(iterator.initializer)
while True:
  try:
    sess.run(result)
  except tf.errors.OutOfRangeError:
    break
Saving iterator state
tf.contrib.data.make_saveable_from_iterator
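A rough sketch of saving iterator state with a Saver, assuming the SAVEABLE_OBJECTS collection mechanism; the checkpoint path is illustrative:
# `iterator` is any iterator whose position should survive a restart.
saveable = tf.contrib.data.make_saveable_from_iterator(iterator)
# Register it so that tf.train.Saver saves/restores the iterator state too.
tf.add_to_collection(tf.GraphKeys.SAVEABLE_OBJECTS, saveable)
saver = tf.train.Saver()

# Later, inside a session:
#   saver.save(sess, '/tmp/iterator_ckpt')
#   saver.restore(sess, '/tmp/iterator_ckpt')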
3.3 Reading input data
Consuming NumPy arrays
# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
features = data["features"]
labels = data["labels"]
# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
Embedding the arrays directly is simple, but the data gets baked into the graph as constants, which can cause memory problems for large arrays
# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
features = data["features"]
labels = data["labels"]
# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]
features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)
dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# [Other transformations on `dataset`...]
dataset = ...
iterator = dataset.make_initializable_iterator()
sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                          labels_placeholder: labels})
Using placeholders is better: the arrays are fed when the iterator is initialized instead of being embedded in the graph
Consuming TFRecord
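A minimal sketch; the file list is fed through a placeholder so the same pipeline can read either training or validation files (file names are placeholders):
filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.repeat()
iterator = dataset.make_initializable_iterator()

# Feed the actual file list when initializing the iterator.
training_filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
sess.run(iterator.initializer, feed_dict={filenames: training_filenames})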
Consuming Text data
Each line of the file becomes one element
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.TextLineDataset(filenames)
Then use flat_map() (with skip() and filter() per file) to drop lines you don't need, such as headers and comments (see the sketch below)
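A sketch of per-file filtering via flat_map(), skipping a header line and dropping comment lines that start with "#":
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.flat_map(
    lambda filename: (
        tf.data.TextLineDataset(filename)
        .skip(1)                          # skip the header row of each file
        .filter(lambda line:              # drop comment lines starting with "#"
                tf.not_equal(tf.substr(line, 0, 1), "#"))))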
Consuming CSV data
# Creates a dataset that reads all of the records from two CSV files, each with
# eight float columns
filenames = ["/var/data/file1.csv", "/var/data/file2.csv"]
record_defaults = [tf.float32] * 8 # Eight required float columns
dataset = tf.data.experimental.CsvDataset(filenames, record_defaults)
You can provide a default value for each column;
you can skip the header row and specify exactly which columns are parsed and which have defaults (see the sketch below)
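A sketch of those options, assuming the header and select_cols arguments of tf.data.experimental.CsvDataset (column indices are illustrative):
# Skip the header row of each file and parse only columns 2 and 4, both floats
# with a default of 0.0 when a field is missing.
record_defaults = [[0.0], [0.0]]
dataset = tf.data.experimental.CsvDataset(
    filenames, record_defaults, header=True, select_cols=[2, 4])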
3.4 Preprocessing data
parse tf.Example
Write a parsing function using TF ops, then apply it with dataset.map(_parse_function)
# Transforms a scalar string `example_proto` into a pair of a scalar string and
# a scalar integer, representing an image and its label, respectively.
def _parse_function(example_proto):
  features = {"image": tf.FixedLenFeature((), tf.string, default_value=""),
              "label": tf.FixedLenFeature((), tf.int64, default_value=0)}
  parsed_features = tf.parse_single_example(example_proto, features)
  return parsed_features["image"], parsed_features["label"]

# Creates a dataset that reads all of the examples from two files, and extracts
# the image and label features.
filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(_parse_function)
When reading this format, each record may contain multiple features
decoding image data and resizing it
Same pattern as above: write a decode-and-resize function and apply it with dataset.map() (see the sketch below)
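A sketch of the decode-and-resize case; the image size and format are illustrative, and image_files / image_labels are placeholders for your own lists of image paths and integer labels:
# Reads an image file, decodes it into a dense tensor, and resizes it to 28x28.
def _parse_image(filename, label):
  image_string = tf.read_file(filename)
  image_decoded = tf.image.decode_jpeg(image_string)
  image_resized = tf.image.resize_images(image_decoded, [28, 28])
  return image_resized, label

dataset = tf.data.Dataset.from_tensor_slices((image_files, image_labels))
dataset = dataset.map(_parse_image)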
Applying arbitrary Python logic
Use tf.py_func() to preprocess with arbitrary Python code or external libraries (see the sketch below)
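A sketch of wrapping plain Python in the pipeline with tf.py_func(); the Python function here is a stand-in for any external-library call:
def _python_upper(s):
  # Arbitrary Python runs here (e.g. an external library); `s` arrives as bytes.
  return s.upper()

dataset = tf.data.Dataset.from_tensor_slices(["apple", "pear"])
dataset = dataset.map(
    lambda s: tf.py_func(_python_upper, [s], tf.string))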
3.5 Batching dataset elements
Simple batching
inc_dataset = tf.data.Dataset.range(100)
dec_dataset = tf.data.Dataset.range(0, -100, -1)
dataset = tf.data.Dataset.zip((inc_dataset, dec_dataset))
batched_dataset = dataset.batch(4)
iterator = batched_dataset.make_one_shot_iterator()
next_element = iterator.get_next()
print(sess.run(next_element)) # ==> ([0, 1, 2, 3], [ 0, -1, -2, -3])
print(sess.run(next_element)) # ==> ([4, 5, 6, 7], [-4, -5, -6, -7])
print(sess.run(next_element)) # ==> ([8, 9, 10, 11], [-8, -9, -10, -11])
batch(n) stacks n consecutive elements into a single element
Batching with padding
dataset = tf.data.Dataset.range(100)
dataset = dataset.map(lambda x: tf.fill([tf.cast(x, tf.int32)], x))
dataset = dataset.padded_batch(4, padded_shapes=(None,))
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
print(sess.run(next_element)) # ==> [[0, 0, 0], [1, 0, 0], [2, 2, 0], [3, 3, 3]]
print(sess.run(next_element)) # ==> [[4, 4, 4, 4, 0, 0, 0],
# [5, 5, 5, 5, 5, 0, 0],
# [6, 6, 6, 6, 6, 6, 0],
# [7, 7, 7, 7, 7, 7, 7]]
If elements have different lengths, padded_batch() pads them to a common length; any dimension left as None in padded_shapes is padded to the maximum size in that batch
3.6 Training workflows
Processing multiple epochs
- Use the repeat() method
filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...)
dataset = dataset.repeat(10)
dataset = dataset.batch(32)
- Alternatively, initialize the iterator at the start of every epoch and catch tf.errors.OutOfRangeError to detect the end of each epoch (sketch below)
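A sketch of the per-epoch initialization pattern, assuming `dataset` was built without repeat(); the epoch count is illustrative:
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

for _ in range(10):                     # 10 epochs
  sess.run(iterator.initializer)
  while True:
    try:
      sess.run(next_element)
    except tf.errors.OutOfRangeError:
      break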
Randomly shuffling input data
https://juejin.im/post/5b855d016fb9a01a1a27d035
shuffle() keeps a candidate buffer of buffer_size elements and draws each output element at random from that buffer; batches are then taken from the shuffled stream
https://stackoverflow.com/questions/46444018/meaning-of-buffer-size-in-dataset-map-dataset-prefetch-and-dataset-shuffle
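A sketch combining shuffle with batching and repeating, reusing the TFRecord filenames from above; the buffer and batch sizes are illustrative:
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.shuffle(buffer_size=10000)  # sample each element from a 10000-element buffer
dataset = dataset.batch(32)
dataset = dataset.repeat()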
Using high-level APIs
- tf.train.MonitoredTrainingSession simplifies many aspects of running TensorFlow in a distributed setting
- An Estimator can consume a Dataset directly; the framework creates and initializes the iterator automatically
4 Introduction to Estimators
Four tasks: training, evaluation, prediction, export for serving
4.1 Advantages of Estimators
Distributed training support and a simplified implementation (graph building, sessions and checkpointing are handled for you)
4.2 Pre-made Estimators
Dataset importing function
Each dataset importing function must return two objects:
- a dictionary in which the keys are feature names and the values are Tensors (or SparseTensors) containing the corresponding feature data
- a Tensor containing one or more labels
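A minimal sketch of such a function; the toy data is illustrative, and the feature names match the feature columns defined below:
import numpy as np

def train_input_fn():
  features = {'population': np.array([10000., 25000., 3000.]),
              'crime_rate': np.array([0.1, 0.3, 0.05]),
              'median_education': np.array([10., 12., 14.])}
  labels = np.array([0, 1, 0])
  dataset = tf.data.Dataset.from_tensor_slices((features, labels))
  dataset = dataset.shuffle(3).repeat().batch(2)
  # Returns the two objects described above: a dict of feature Tensors and a
  # Tensor of labels.
  return dataset.make_one_shot_iterator().get_next()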
Define the feature columns
# Define three numeric feature columns.
population = tf.feature_column.numeric_column('population')
crime_rate = tf.feature_column.numeric_column('crime_rate')
median_education = tf.feature_column.numeric_column('median_education',
                                                    normalizer_fn=lambda x: x - global_education_mean)
Instantiate the Estimator
# Instantiate an estimator, passing the feature columns.
estimator = tf.estimator.LinearClassifier(
    feature_columns=[population, crime_rate, median_education])
Call a training, evaluation, or inference method
# `input_fn` is the function created in Step 1
estimator.train(input_fn=my_training_set, steps=2000)
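Evaluation and inference follow the same pattern; my_eval_set and my_pred_set are placeholders for input functions like the one in Step 1:
# Evaluate on a held-out set; returns a dict of metric name -> value.
metrics = estimator.evaluate(input_fn=my_eval_set)

# Predict; returns a generator yielding one prediction dict per example.
for pred in estimator.predict(input_fn=my_pred_set):
  print(pred)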
4.3 Custom Estimators
A companion document explains how to write the model function.
Recommended workflow:
- First build a baseline model with a suitable pre-made Estimator
- Build and test the overall pipeline with it, checking the integrity and reliability of your data
- If other suitable pre-made Estimators are available, experiment with them to find the best one
- Possibly improve the model further by building your own custom Estimator