
Tengine Notes

Author: forest_66f5 | Published 2019-06-21 13:42

# Installation

## Install protobuf

https://github.com/protocolbuffers/protobuf/blob/master/src/README.md

## Install OpenBLAS

Reference: https://blog.csdn.net/sinat_24143931/article/details/78692622

git clone git://github.com/xianyi/OpenBLAS

sudo make FC=gfortran    (if gfortran is not installed, run sudo apt-get install gfortran)
sudo make install

Adding the library install path /opt/OpenBLAS/lib to /etc/ld.so.conf and running ldconfig ---- did NOT work

ln /opt/OpenBLAS/lib/libopenblas_haswellp-r0.3.7.dev.so /usr/lib/libopenblas.so ---- this worked

cp /opt/OpenBLAS/include/* /usr/local/include/
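Before moving on, it is worth checking that the copied headers and the libopenblas.so symlink are actually usable. The snippet below is my own sanity check, not part of the original notes; it only relies on the standard cblas interface that OpenBLAS ships, and the suggested build command is an assumption.

```cpp
// Build (assumption): g++ check_blas.cpp -lopenblas -o check_blas
#include <cstdio>
#include <cblas.h>

int main() {
    // Row-major 2x2 matrices: C = 1.0 * A * B + 0.0 * C
    float A[4] = {1, 2, 3, 4};
    float B[4] = {5, 6, 7, 8};
    float C[4] = {0, 0, 0, 0};

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,        /* M, N, K       */
                1.0f, A, 2,     /* alpha, A, lda */
                B, 2,           /* B, ldb        */
                0.0f, C, 2);    /* beta, C, ldc  */

    // Expected output: 19 22 43 50
    printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```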

## Install Tengine (ARM)

https://github.com/OAID/Tengine/blob/master/doc/install.md

cd examples

mkdir build

Replace CMakeLists.txt, then run cmake:

/data/edge/tengine/tengine/examples/build# cmake .. -DTENGINE_DIR=/data/edge/tengine/tengine -DCMAKE_BUILD_TYPE=Debug --trace

cd ../ && make

    doc: https://github.com/OAID/Tengine/blob/master/doc/operator_ir.md 

### Tengine is composed of six modules:

core / operator / serializer / executor / driver / wrapper (a usage sketch tying these together follows the per-module notes below)

### core: provides the basic components and functionalities of the system.

### operator: defines the schema of operators, such as convolution, relu, pooling, etc. See the supported operator list below.

### serializer: loads a saved model. The serializer framework is extensible to support different formats, including customized ones. Caffe/ONNX/TensorFlow/MXNet and Tengine models can be loaded directly by Tengine.

### executor: implements the code to run graphs and operators. The current version provides a highly optimized implementation for multiple A72 cores.

### driver: is the adapter for the real H/W and provides services to the device executor through the HAL API. A single driver can create multiple devices.

### wrapper: provides API wrappers for different frameworks. Both the Caffe API wrapper and the TensorFlow API wrapper work now.
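From the application's point of view, the modules come together in a handful of calls: the serializer loads the model into a graph, and prerun/run hand it to the executor and the device provided by the driver. The sketch below is based on the C API used under Tengine's examples/ directory; the function names follow tengine_c_api.h as I recall it and the model file name "squeezenet.tmfile" is a placeholder, so treat it as an outline rather than a version-exact example.

```cpp
#include <cstdio>
#include <vector>
#include "tengine_c_api.h"

int main() {
    if (init_tengine() < 0)                       // core: runtime init
        return 1;

    // serializer: load a Tengine-format model into an in-memory graph
    graph_t graph = create_graph(nullptr, "tengine", "squeezenet.tmfile");
    if (graph == nullptr)
        return 1;

    // bind a 1x3x224x224 float input buffer to the graph's first input tensor
    std::vector<float> input(1 * 3 * 224 * 224, 0.0f);
    tensor_t input_tensor = get_graph_input_tensor(graph, 0, 0);
    int dims[4] = {1, 3, 224, 224};
    set_tensor_shape(input_tensor, dims, 4);
    set_tensor_buffer(input_tensor, input.data(), input.size() * sizeof(float));

    // executor/driver: allocate device resources, then run the graph
    prerun_graph(graph);
    run_graph(graph, 1);

    tensor_t output_tensor = get_graph_output_tensor(graph, 0, 0);
    float* output = (float*)get_tensor_buffer(output_tensor);
    printf("first output value: %f\n", output ? output[0] : 0.0f);

    postrun_graph(graph);
    destroy_graph(graph);
    release_tengine();
    return 0;
}
```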


## Supported Operator List

    BatchNorm Concat ConstOp Convolution Deconvolution Detection_output Dropout Eltwise Flatten Fully_connected Input_op LRN LSTM Normalize Permute Pooling Priorbox PReLu Region Resize Reorg Reshape ReLu RPN Roi_pooling Scale Slice Softmax 

### Build the examples

    export TENGINE_LOG_LEVEL=7 

    export DEBUG_G=1 

    /data/edge/tengine/tengine/examples/mtcnn/build# cmake .. -DTENGINE_DIR=/data/edge/tengine/tengine --trace  


add_definitions(-DCONFIG_LEGACY_API)
add_definitions(-Wno-unused-command-line-argument)
add_definitions(-Wall)
add_definitions(-fPIC)
add_definitions(-g)
add_definitions(-O3)
add_definitions(-funroll-loops)
add_definitions(-Wno-overloaded-virtual)
add_definitions(-Wno-deprecated-register)
add_compile_options($<$<COMPILE_LANGUAGE:CXX>:-std=c++11>)


### Four convolution implementations, chosen by priority (see the sketch after this list)

#### OpenBLAS version: ./executor/operator/common/blas/conv_2d_blas.cpp: NodeOpsRegistryManager::RegisterOPImplementor("common", "Convolution", ConvolutionImpl::SelectFunc,

#### Assembly generic convolution version: ./executor/operator/arm64/conv/conv_2d_fast.cpp: NodeOpsRegistryManager::RegisterOPImplementor("arm64", "Convolution", conv_fast::SelectFunc,

#### Assembly depthwise convolution version: ./executor/operator/arm64/conv/conv_2d_dw.cpp: NodeOpsRegistryManager::RegisterOPImplementor("arm64", "Convolution", conv_2d_dw::SelectFunc,

#### Plain C (CPU reference) version: ./executor/operator/ref/ref_convolution.cpp: NodeOpsRegistryManager::RegisterOPImplementor(REF_REGISTRY_NAME, "Convolution", RefConvolutionOps::SelectFunc,
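All four registrations go through NodeOpsRegistryManager::RegisterOPImplementor, each supplying a SelectFunc plus a priority. The standalone sketch below is a simplified model of that mechanism, not the actual Tengine code: FakeNode and the concrete priority values are made up for illustration; only the idea that the highest-priority implementor whose SelectFunc accepts the node wins is taken from the registrations above.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

struct FakeNode {            // stands in for Tengine's Node
    int group;               // group == input channels -> depthwise convolution
    int input_channels;
};

struct Implementor {
    std::string name;
    std::function<bool(const FakeNode&)> select;   // like XXX::SelectFunc
    int priority;                                  // larger value preferred in this sketch
};

int main() {
    std::vector<Implementor> registry = {
        {"ref_convolution (plain C)",  [](const FakeNode&) { return true; },  0},
        {"conv_2d_blas (OpenBLAS)",    [](const FakeNode&) { return true; }, 10},
        {"conv_2d_fast (arm64 asm)",   [](const FakeNode&) { return true; }, 20},
        {"conv_2d_dw (depthwise asm)", [](const FakeNode& n) {
             return n.group > 1 && n.group == n.input_channels; }, 30},
    };

    FakeNode normal_conv{1, 64};
    FakeNode depthwise_conv{64, 64};

    auto pick = [&](const FakeNode& node) {
        const Implementor* best = nullptr;
        for (const auto& impl : registry)
            if (impl.select(node) && (!best || impl.priority > best->priority))
                best = &impl;
        return best ? best->name : std::string("none");
    };

    printf("normal conv    -> %s\n", pick(normal_conv).c_str());
    printf("depthwise conv -> %s\n", pick(depthwise_conv).c_str());
    return 0;
}
```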


## Importing models from other frontend formats

The serializer module loads a whole model file stored on disk and creates a Tengine in-memory IR, the StaticGraph (i.e. a model in another format, such as a TensorFlow model, is turned into the internal in-memory StaticGraph representation). The serializer module can also store a StaticGraph to disk in a specific format; however, the current version of this document describes the loading process, which is more important than the storing process.
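Assuming only the two virtual methods quoted in the Load Interface section below (GetFileNum / LoadModel), a custom format loader can be outlined roughly as follows. This is a sketch with stand-in types: the real Tengine Serializer base class has more members (saving, version handling, etc.) and the real StaticGraph is defined in the core module.

```cpp
#include <string>
#include <vector>

struct StaticGraph {};   // stand-in for Tengine's in-memory IR from the core module

class MyFormatSerializer {
public:
    // How many files this format consists of (e.g. Caffe needs prototxt + caffemodel).
    unsigned int GetFileNum(void) { return 1; }

    // Parse the files in file_list and populate static_graph with nodes/tensors.
    bool LoadModel(const std::vector<std::string>& file_list, StaticGraph* static_graph) {
        if (file_list.size() != GetFileNum() || static_graph == nullptr)
            return false;
        // ... read file_list[0], create StaticGraph nodes, tensors and constants ...
        return true;
    }
};

int main() {
    MyFormatSerializer loader;
    StaticGraph graph;
    std::vector<std::string> files = {"model.myformat"};   // hypothetical file name
    return loader.LoadModel(files, &graph) ? 0 : 1;
}
```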

### Frontend serialization

Load interface:

unsigned int GetFileNum(void);    // returns how many files the model consists of
bool LoadModel(const std::vector<std::string>& file_list, StaticGraph* static_graph);    // converts the model files into a StaticGraph

### GatherV2 op

gather is used to fetch the values at selected positions of a tensor.
https://baijiahao.baidu.com/s?id=1602069319915188130&wfr=spider&for=pc

def embedding_lookup(
    params,
    ids,
    partition_strategy="mod",
    name=None,
    validate_indices=True,  # pylint: disable=unused-argument
    max_norm=None):
  """Looks up `ids` in a list of embedding tensors.

  This function is used to perform parallel lookups on the list of tensors
  in `params`. It is a generalization of @{tf.gather}, where `params` is
  interpreted as a partitioning of a large embedding tensor. `params` may be
  a `PartitionedVariable` as returned by using `tf.get_variable()` with a
  partitioner.

  If `len(params) > 1`, each element `id` of `ids` is partitioned between the
  elements of `params` according to the `partition_strategy`. In all
  strategies, if the id space does not evenly divide the number of partitions,
  each of the first `(max_id + 1) % len(params)` partitions will be assigned
  one more id.

  If `partition_strategy` is `"mod"`, we assign each id to partition
  `p = id % len(params)`. For instance, 13 ids are split across 5 partitions
  as: `[[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]]`

  If `partition_strategy` is `"div"`, we assign ids to partitions in a
  contiguous manner. In this case, 13 ids are split across 5 partitions as:
  `[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]]`

  The results of the lookup are concatenated into a dense tensor. The returned
  tensor has shape `shape(ids) + shape(params)[1:]`.

  Args:
    params: A single tensor representing the complete embedding tensor, or a
      list of P tensors all of same shape except for the first dimension,
      representing sharded embedding tensors. Alternatively, a
      `PartitionedVariable`, created by partitioning along dimension 0. Each
      element must be appropriately sized for the given `partition_strategy`.
    ids: A `Tensor` with type `int32` or `int64` containing the ids to be
      looked up in `params`.
    partition_strategy: A string specifying the partitioning strategy, relevant
      if `len(params) > 1`. Currently `"div"` and `"mod"` are supported.
      Default is `"mod"`.
    name: A name for the operation (optional).
    validate_indices: DEPRECATED. If this operation is assigned to CPU, values
      in `indices` are always validated to be within range. If assigned to GPU,
      out-of-bound indices result in safe but unspecified behavior, which may
      include raising an error.
    max_norm: If provided, embedding values are l2-normalized to the value of
      max_norm.

  Returns:
    A `Tensor` with the same type as the tensors in `params`.

  Raises:
    ValueError: If `params` is empty.
  """
  return _embedding_lookup_and_transform(
      params=params,
      ids=ids,
      partition_strategy=partition_strategy,
      name=name,
      max_norm=max_norm,
      transform_fn=None)
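The "mod"/"div" partition examples in the docstring are easy to verify by hand. The standalone snippet below is my own addition, unrelated to Tengine; it simply reproduces the two partitionings of 13 ids across 5 partitions described above.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int num_ids = 13;
    const int num_parts = 5;

    std::vector<std::vector<int>> mod_parts(num_parts), div_parts(num_parts);

    // "mod": id goes to partition p = id % num_parts.
    for (int id = 0; id < num_ids; ++id)
        mod_parts[id % num_parts].push_back(id);

    // "div": ids are assigned contiguously; the first (num_ids % num_parts)
    // partitions each get one extra id.
    int base = num_ids / num_parts, extra = num_ids % num_parts, next = 0;
    for (int p = 0; p < num_parts; ++p) {
        int count = base + (p < extra ? 1 : 0);
        for (int i = 0; i < count; ++i)
            div_parts[p].push_back(next++);
    }

    // Expected: mod -> [[0,5,10],[1,6,11],[2,7,12],[3,8],[4,9]]
    //           div -> [[0,1,2],[3,4,5],[6,7,8],[9,10],[11,12]]
    for (int p = 0; p < num_parts; ++p) {
        printf("mod[%d]:", p);
        for (int id : mod_parts[p]) printf(" %d", id);
        printf("   div[%d]:", p);
        for (int id : div_parts[p]) printf(" %d", id);
        printf("\n");
    }
    return 0;
}
```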


### Currently supported TensorFlow ops:

p_tf->RegisterOpLoadMethod("AvgPool", op_load_t(LoadPool));
p_tf->RegisterOpLoadMethod("MaxPool", op_load_t(LoadPool));
p_tf->RegisterOpLoadMethod("Conv2D", op_load_t(LoadConv2D));
p_tf->RegisterOpLoadMethod("DepthwiseConv2dNative", op_load_t(LoadConv2D));
p_tf->RegisterOpLoadMethod("FusedBatchNorm", op_load_t(LoadBatchNorm));
p_tf->RegisterOpLoadMethod("Relu6", op_load_t(LoadRelu6));
p_tf->RegisterOpLoadMethod("Relu", op_load_t(LoadRelu));
p_tf->RegisterOpLoadMethod("Softmax", op_load_t(LoadSoftmax));
p_tf->RegisterOpLoadMethod("ConcatV2", op_load_t(LoadConcat));
p_tf->RegisterOpLoadMethod("Add", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("Sub", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("Mul", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("Minimum", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("Rsqrt", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("ResizeNearestNeighbor", op_load_t(LoadResize));
p_tf->RegisterOpLoadMethod("ComposedBN", op_load_t(LoadComposedBN));
p_tf->RegisterOpLoadMethod("Reshape", op_load_t(LoadReshape));
p_tf->RegisterOpLoadMethod("MatMul", op_load_t(LoadGemm));
p_tf->RegisterOpLoadMethod("AddN", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("FIFOQueueV2", op_load_t(LoadFIFOQueue));
p_tf->RegisterOpLoadMethod("Mean", op_load_t(LoadMean));
p_tf->RegisterOpLoadMethod("DecodeWav", op_load_t(LoadGeneric));
p_tf->RegisterOpLoadMethod("AudioSpectrogram", op_load_t(LoadGeneric));
p_tf->RegisterOpLoadMethod("Mfcc", op_load_t(LoadGeneric));
p_tf->RegisterOpLoadMethod("LSTM", op_load_t(LoadLSTM));
p_tf->RegisterOpLoadMethod("RNN", op_load_t(LoadRNN));
p_tf->RegisterOpLoadMethod("GRU", op_load_t(LoadGRU));
p_tf->RegisterOpLoadMethod("StridedSlice", op_load_t(LoadStridedSlice));
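All of these registrations follow one pattern: a TensorFlow op name is mapped to a loader callback of type op_load_t. The sketch below shows what adding one more op could look like; the real op_load_t signature in tf_serializer.cpp takes Tengine-internal types (TFNode, StaticGraph, ...), so the stub types and the LoadGatherV2 loader here are placeholders, not the actual Tengine code.

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <string>

struct TFNodeStub {};      // placeholder for the serializer's per-node structure
struct StaticGraphStub {}; // placeholder for Tengine's StaticGraph IR

using op_load_t = std::function<bool(TFNodeStub&, StaticGraphStub&)>;

class TFSerializerStub {
public:
    void RegisterOpLoadMethod(const std::string& op_name, const op_load_t& loader) {
        loaders_[op_name] = loader;
    }
    bool LoadNode(const std::string& op_name, TFNodeStub& node, StaticGraphStub& graph) {
        auto it = loaders_.find(op_name);
        if (it == loaders_.end()) {
            printf("op %s is not supported by the TF serializer\n", op_name.c_str());
            return false;
        }
        return it->second(node, graph);
    }
private:
    std::map<std::string, op_load_t> loaders_;
};

// A hypothetical loader for a new op, mirroring LoadConv2D / LoadEltwise above.
static bool LoadGatherV2(TFNodeStub& /*node*/, StaticGraphStub& /*graph*/) {
    // create the Tengine static op and copy attributes from the TF node ...
    return true;
}

int main() {
    TFSerializerStub p_tf;
    p_tf.RegisterOpLoadMethod("GatherV2", op_load_t(LoadGatherV2));

    TFNodeStub node;
    StaticGraphStub graph;
    p_tf.LoadNode("GatherV2", node, graph);   // dispatches to LoadGatherV2
    p_tf.LoadNode("Unknown", node, graph);    // falls through: op not supported
    return 0;
}
```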
