TensorFlow Serving is an open-source software library for serving machine learning models. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table.
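As an illustration, here is a minimal Python client sketch that queries a served model over the gRPC Predict API. The endpoint, the model name ("my_model"), the pinned version, and the input tensor name ("x") are placeholders, not anything prescribed by TensorFlow Serving:

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # Hypothetical endpoint and model/input names; substitute your own.
    channel = grpc.insecure_channel("localhost:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "my_model"
    request.model_spec.version.value = 1  # optional: pin one version; omit to take the server's default
    request.inputs["x"].CopyFrom(tf.make_tensor_proto([[1.0, 2.0]], dtype=tf.float32))

    response = stub.Predict(request, 5.0)  # 5-second deadline
    print(response.outputs)

Setting model_spec.version is what makes versioned access explicit; leaving it unset asks the server for whichever version its version policy currently exposes.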
Multiple models, or indeed multiple versions of the same model, can be served simultaneously. This flexibility facilitates canarying new versions, non-atomically migrating clients to new models or versions, and A/B testing experimental models.
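As a sketch of how this looks in practice, the ModelServer accepts a config file (via its --model_config_file flag) listing the models and version policies to serve; the names and paths below are placeholders:

    model_config_list {
      config {
        name: "model_a"                # placeholder name
        base_path: "/models/model_a"   # placeholder path
        model_platform: "tensorflow"
        model_version_policy {
          specific { versions: 1 versions: 2 }  # serve v1 and v2 side by side
        }
      }
      config {
        name: "model_b"
        base_path: "/models/model_b"
        model_platform: "tensorflow"
      }
    }

Serving two versions of model_a at once is what enables canarying and gradual migration: clients can pin the old version while the new one is validated, then switch over at their own pace.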
The primary use-case is high-performance production serving, but the same serving infrastructure can also be used in bulk-processing (e.g. map-reduce) jobs to pre-compute inference results or analyze model performance. In both scenarios, GPUs can substantially increase inference throughput. TensorFlow Serving comes with a scheduler that groups individual inference requests into batches for joint execution on a GPU, with configurable latency controls.
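A sketch of those latency controls: when the server is started with --enable_batching, a protobuf text file passed via --batching_parameters_file tunes the scheduler. The values below are illustrative, not recommendations:

    max_batch_size { value: 32 }          # most requests fused into one GPU execution
    batch_timeout_micros { value: 5000 }  # longest a request waits for its batch to fill
    num_batch_threads { value: 4 }        # parallelism for processing batches
    max_enqueued_batches { value: 100 }   # queue bound, providing backpressure

batch_timeout_micros is the main latency knob: a larger value yields fuller batches (higher throughput) at the cost of added tail latency.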
TensorFlow Serving has out-of-the-box support for TensorFlow models (naturally), but at its core it manages arbitrary versioned items (servables) with pass-through to their native APIs. In addition to trained TensorFlow models, servables can include other assets needed for inference such as embeddings, vocabularies and feature transformation configs, or even non-TensorFlow-based machine learning models.
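For instance, here is a minimal sketch of exporting a trained model into the version-numbered directory layout the ModelServer watches; the model and paths are hypothetical:

    import tensorflow as tf

    # Hypothetical toy servable, mirroring the classic half-plus-two demo.
    class HalfPlusTwo(tf.Module):
        @tf.function(input_signature=[tf.TensorSpec([None, 2], tf.float32)])
        def __call__(self, x):
            return 0.5 * x + 2.0

    module = HalfPlusTwo()
    # The server watches the base path (/models/my_model) and loads numeric
    # version subdirectories; exporting to .../2 later publishes version 2
    # alongside version 1. Vocabularies and similar assets saved with the
    # model end up in the SavedModel's assets/ subdirectory.
    tf.saved_model.save(module, "/models/my_model/1", signatures=module.__call__)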
More tutorials:
http://www.tensorflownews.com/2017/08/09/google-tensorflow-serving-library/