
Tengine Notes

Author: forest_66f5 | Published 2019-06-21 13:42

# Installation

## Install protobuf

https://github.com/protocolbuffers/protobuf/blob/master/src/README.md

## Install OpenBLAS

Reference: https://blog.csdn.net/sinat_24143931/article/details/78692622

git clone git://github.com/xianyi/OpenBLAS

sudo make FC=gfortran    (if gfortran is not installed, run sudo apt-get install gfortran)
sudo make install

Adding the library install path /opt/OpenBLAS/lib to /etc/ld.so.conf and running ldconfig ---- did NOT work

ln /opt/OpenBLAS/lib/libopenblas_haswellp-r0.3.7.dev.so /usr/lib/libopenblas.so ---- this worked

cp /opt/OpenBLAS/include/* /usr/local/include/
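Before moving on, it is worth checking that the copied headers and the libopenblas.so symlink are actually usable. The snippet below is my own sanity check, not part of the original notes; it only relies on the standard cblas interface that OpenBLAS ships, and the suggested build command is an assumption.

```cpp
// Build (assumption): g++ check_blas.cpp -lopenblas -o check_blas
#include <cstdio>
#include <cblas.h>

int main() {
    // Row-major 2x2 matrices: C = 1.0 * A * B + 0.0 * C
    float A[4] = {1, 2, 3, 4};
    float B[4] = {5, 6, 7, 8};
    float C[4] = {0, 0, 0, 0};

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,        /* M, N, K       */
                1.0f, A, 2,     /* alpha, A, lda */
                B, 2,           /* B, ldb        */
                0.0f, C, 2);    /* beta, C, ldc  */

    // Expected output: 19 22 43 50
    printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```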

## Install Tengine (ARM)

https://github.com/OAID/Tengine/blob/master/doc/install.md

cd examples

mkdir build

Replace CMakeLists.txt, then run cmake:

/data/edge/tengine/tengine/examples/build# cmake .. -DTENGINE_DIR=/data/edge/tengine/tengine -DCMAKE_BUILD_TYPE=Debug --trace

cd ../ && make

    doc: https://github.com/OAID/Tengine/blob/master/doc/operator_ir.md 

### Tengine is composed of six modules:

core / operator / serializer / executor / driver / wrapper (a usage sketch tying these together follows the per-module notes below)

### core: provides the basic components and functionalities of the system.

### operator: defines the schema of operators, such as convolution, relu, pooling, etc. See the supported operator list below.

### serializer: loads a saved model. The serializer framework is extensible to support different formats, including customized ones. Caffe/ONNX/TensorFlow/MXNet and Tengine models can be loaded directly by Tengine.

### executor: implements the code to run graphs and operators. The current version provides a highly optimized implementation for multiple A72 cores.

### driver: is the adapter for the real H/W and provides services to the device executor through the HAL API. A single driver can create multiple devices.

### wrapper: provides API wrappers for different frameworks. Both the Caffe API wrapper and the TensorFlow API wrapper work now.
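From the application's point of view, the modules come together in a handful of calls: the serializer loads the model into a graph, and prerun/run hand it to the executor and the device provided by the driver. The sketch below is based on the C API used under Tengine's examples/ directory; the function names follow tengine_c_api.h as I recall it and the model file name "squeezenet.tmfile" is a placeholder, so treat it as an outline rather than a version-exact example.

```cpp
#include <cstdio>
#include <vector>
#include "tengine_c_api.h"

int main() {
    if (init_tengine() < 0)                       // core: runtime init
        return 1;

    // serializer: load a Tengine-format model into an in-memory graph
    graph_t graph = create_graph(nullptr, "tengine", "squeezenet.tmfile");
    if (graph == nullptr)
        return 1;

    // bind a 1x3x224x224 float input buffer to the graph's first input tensor
    std::vector<float> input(1 * 3 * 224 * 224, 0.0f);
    tensor_t input_tensor = get_graph_input_tensor(graph, 0, 0);
    int dims[4] = {1, 3, 224, 224};
    set_tensor_shape(input_tensor, dims, 4);
    set_tensor_buffer(input_tensor, input.data(), input.size() * sizeof(float));

    // executor/driver: allocate device resources, then run the graph
    prerun_graph(graph);
    run_graph(graph, 1);

    tensor_t output_tensor = get_graph_output_tensor(graph, 0, 0);
    float* output = (float*)get_tensor_buffer(output_tensor);
    printf("first output value: %f\n", output ? output[0] : 0.0f);

    postrun_graph(graph);
    destroy_graph(graph);
    release_tengine();
    return 0;
}
```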


## Supported Operator List

    BatchNorm Concat ConstOp Convolution Deconvolution Detection_output Dropout Eltwise Flatten Fully_connected Input_op LRN LSTM Normalize Permute Pooling Priorbox PReLu Region Resize Reorg Reshape ReLu RPN Roi_pooling Scale Slice Softmax 

### Build the examples

    export TENGINE_LOG_LEVEL=7 

    export DEBUG_G=1 

    /data/edge/tengine/tengine/examples/mtcnn/build# cmake .. -DTENGINE_DIR=/data/edge/tengine/tengine --trace  


add_definitions(-DCONFIG_LEGACY_API)
add_definitions(-Wno-unused-command-line-argument)
add_definitions(-Wall)
add_definitions(-fPIC)
add_definitions(-g)
add_definitions(-O3)
add_definitions(-funroll-loops)
add_definitions(-Wno-overloaded-virtual)
add_definitions(-Wno-deprecated-register)
add_compile_options($<$<COMPILE_LANGUAGE:CXX>:-std=c++11>)


### Four convolution implementations, chosen by priority (see the sketch after this list)

#### OpenBLAS version: ./executor/operator/common/blas/conv_2d_blas.cpp: NodeOpsRegistryManager::RegisterOPImplementor("common", "Convolution", ConvolutionImpl::SelectFunc,

#### Assembly generic convolution version: ./executor/operator/arm64/conv/conv_2d_fast.cpp: NodeOpsRegistryManager::RegisterOPImplementor("arm64", "Convolution", conv_fast::SelectFunc,

#### Assembly depthwise convolution version: ./executor/operator/arm64/conv/conv_2d_dw.cpp: NodeOpsRegistryManager::RegisterOPImplementor("arm64", "Convolution", conv_2d_dw::SelectFunc,

#### Plain C (CPU reference) version: ./executor/operator/ref/ref_convolution.cpp: NodeOpsRegistryManager::RegisterOPImplementor(REF_REGISTRY_NAME, "Convolution", RefConvolutionOps::SelectFunc,
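All four registrations go through NodeOpsRegistryManager::RegisterOPImplementor, each supplying a SelectFunc plus a priority. The standalone sketch below is a simplified model of that mechanism, not the actual Tengine code: FakeNode and the concrete priority values are made up for illustration; only the idea that the highest-priority implementor whose SelectFunc accepts the node wins is taken from the registrations above.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

struct FakeNode {            // stands in for Tengine's Node
    int group;               // group == input channels -> depthwise convolution
    int input_channels;
};

struct Implementor {
    std::string name;
    std::function<bool(const FakeNode&)> select;   // like XXX::SelectFunc
    int priority;                                  // larger value preferred in this sketch
};

int main() {
    std::vector<Implementor> registry = {
        {"ref_convolution (plain C)",  [](const FakeNode&) { return true; },  0},
        {"conv_2d_blas (OpenBLAS)",    [](const FakeNode&) { return true; }, 10},
        {"conv_2d_fast (arm64 asm)",   [](const FakeNode&) { return true; }, 20},
        {"conv_2d_dw (depthwise asm)", [](const FakeNode& n) {
             return n.group > 1 && n.group == n.input_channels; }, 30},
    };

    FakeNode normal_conv{1, 64};
    FakeNode depthwise_conv{64, 64};

    auto pick = [&](const FakeNode& node) {
        const Implementor* best = nullptr;
        for (const auto& impl : registry)
            if (impl.select(node) && (!best || impl.priority > best->priority))
                best = &impl;
        return best ? best->name : std::string("none");
    };

    printf("normal conv    -> %s\n", pick(normal_conv).c_str());
    printf("depthwise conv -> %s\n", pick(depthwise_conv).c_str());
    return 0;
}
```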


## Importing models from other frontend formats

The serializer module loads a whole model file stored on disk and creates a Tengine in-memory IR, the StaticGraph (i.e. a model in another format, such as a TensorFlow model, is turned into the internal in-memory StaticGraph representation). The serializer module can also store a StaticGraph to disk in a specific format; however, the current version of this document describes the loading process, which is more important than the storing process.
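Assuming only the two virtual methods quoted in the Load Interface section below (GetFileNum / LoadModel), a custom format loader can be outlined roughly as follows. This is a sketch with stand-in types: the real Tengine Serializer base class has more members (saving, version handling, etc.) and the real StaticGraph is defined in the core module.

```cpp
#include <string>
#include <vector>

struct StaticGraph {};   // stand-in for Tengine's in-memory IR from the core module

class MyFormatSerializer {
public:
    // How many files this format consists of (e.g. Caffe needs prototxt + caffemodel).
    unsigned int GetFileNum(void) { return 1; }

    // Parse the files in file_list and populate static_graph with nodes/tensors.
    bool LoadModel(const std::vector<std::string>& file_list, StaticGraph* static_graph) {
        if (file_list.size() != GetFileNum() || static_graph == nullptr)
            return false;
        // ... read file_list[0], create StaticGraph nodes, tensors and constants ...
        return true;
    }
};

int main() {
    MyFormatSerializer loader;
    StaticGraph graph;
    std::vector<std::string> files = {"model.myformat"};   // hypothetical file name
    return loader.LoadModel(files, &graph) ? 0 : 1;
}
```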

### Frontend serialization

Load interface:

unsigned int GetFileNum(void);    // returns how many files the model consists of
bool LoadModel(const std::vector<std::string>& file_list, StaticGraph* static_graph);    // converts the model files into a StaticGraph

### GatherV2 op

gather is used to fetch the values at selected positions of a tensor.
https://baijiahao.baidu.com/s?id=1602069319915188130&wfr=spider&for=pc

def embedding_lookup(
    params,
    ids,
    partition_strategy="mod",
    name=None,
    validate_indices=True,  # pylint: disable=unused-argument
    max_norm=None):
  """Looks up `ids` in a list of embedding tensors.

  This function is used to perform parallel lookups on the list of tensors
  in `params`. It is a generalization of @{tf.gather}, where `params` is
  interpreted as a partitioning of a large embedding tensor. `params` may be
  a `PartitionedVariable` as returned by using `tf.get_variable()` with a
  partitioner.

  If `len(params) > 1`, each element `id` of `ids` is partitioned between the
  elements of `params` according to the `partition_strategy`. In all
  strategies, if the id space does not evenly divide the number of partitions,
  each of the first `(max_id + 1) % len(params)` partitions will be assigned
  one more id.

  If `partition_strategy` is `"mod"`, we assign each id to partition
  `p = id % len(params)`. For instance, 13 ids are split across 5 partitions
  as: `[[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]]`

  If `partition_strategy` is `"div"`, we assign ids to partitions in a
  contiguous manner. In this case, 13 ids are split across 5 partitions as:
  `[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]]`

  The results of the lookup are concatenated into a dense tensor. The returned
  tensor has shape `shape(ids) + shape(params)[1:]`.

  Args:
    params: A single tensor representing the complete embedding tensor, or a
      list of P tensors all of same shape except for the first dimension,
      representing sharded embedding tensors. Alternatively, a
      `PartitionedVariable`, created by partitioning along dimension 0. Each
      element must be appropriately sized for the given `partition_strategy`.
    ids: A `Tensor` with type `int32` or `int64` containing the ids to be
      looked up in `params`.
    partition_strategy: A string specifying the partitioning strategy, relevant
      if `len(params) > 1`. Currently `"div"` and `"mod"` are supported.
      Default is `"mod"`.
    name: A name for the operation (optional).
    validate_indices: DEPRECATED. If this operation is assigned to CPU, values
      in `indices` are always validated to be within range. If assigned to GPU,
      out-of-bound indices result in safe but unspecified behavior, which may
      include raising an error.
    max_norm: If provided, embedding values are l2-normalized to the value of
      max_norm.

  Returns:
    A `Tensor` with the same type as the tensors in `params`.

  Raises:
    ValueError: If `params` is empty.
  """
  return _embedding_lookup_and_transform(
      params=params,
      ids=ids,
      partition_strategy=partition_strategy,
      name=name,
      max_norm=max_norm,
      transform_fn=None)
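The "mod"/"div" partition examples in the docstring are easy to verify by hand. The standalone snippet below is my own addition, unrelated to Tengine; it simply reproduces the two partitionings of 13 ids across 5 partitions described above.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int num_ids = 13;
    const int num_parts = 5;

    std::vector<std::vector<int>> mod_parts(num_parts), div_parts(num_parts);

    // "mod": id goes to partition p = id % num_parts.
    for (int id = 0; id < num_ids; ++id)
        mod_parts[id % num_parts].push_back(id);

    // "div": ids are assigned contiguously; the first (num_ids % num_parts)
    // partitions each get one extra id.
    int base = num_ids / num_parts, extra = num_ids % num_parts, next = 0;
    for (int p = 0; p < num_parts; ++p) {
        int count = base + (p < extra ? 1 : 0);
        for (int i = 0; i < count; ++i)
            div_parts[p].push_back(next++);
    }

    // Expected: mod -> [[0,5,10],[1,6,11],[2,7,12],[3,8],[4,9]]
    //           div -> [[0,1,2],[3,4,5],[6,7,8],[9,10],[11,12]]
    for (int p = 0; p < num_parts; ++p) {
        printf("mod[%d]:", p);
        for (int id : mod_parts[p]) printf(" %d", id);
        printf("   div[%d]:", p);
        for (int id : div_parts[p]) printf(" %d", id);
        printf("\n");
    }
    return 0;
}
```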


### Currently supported TensorFlow ops:

p_tf->RegisterOpLoadMethod("AvgPool", op_load_t(LoadPool));
p_tf->RegisterOpLoadMethod("MaxPool", op_load_t(LoadPool));
p_tf->RegisterOpLoadMethod("Conv2D", op_load_t(LoadConv2D));
p_tf->RegisterOpLoadMethod("DepthwiseConv2dNative", op_load_t(LoadConv2D));
p_tf->RegisterOpLoadMethod("FusedBatchNorm", op_load_t(LoadBatchNorm));
p_tf->RegisterOpLoadMethod("Relu6", op_load_t(LoadRelu6));
p_tf->RegisterOpLoadMethod("Relu", op_load_t(LoadRelu));
p_tf->RegisterOpLoadMethod("Softmax", op_load_t(LoadSoftmax));
p_tf->RegisterOpLoadMethod("ConcatV2", op_load_t(LoadConcat));
p_tf->RegisterOpLoadMethod("Add", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("Sub", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("Mul", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("Minimum", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("Rsqrt", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("ResizeNearestNeighbor", op_load_t(LoadResize));
p_tf->RegisterOpLoadMethod("ComposedBN", op_load_t(LoadComposedBN));
p_tf->RegisterOpLoadMethod("Reshape", op_load_t(LoadReshape));
p_tf->RegisterOpLoadMethod("MatMul", op_load_t(LoadGemm));
p_tf->RegisterOpLoadMethod("AddN", op_load_t(LoadEltwise));
p_tf->RegisterOpLoadMethod("FIFOQueueV2", op_load_t(LoadFIFOQueue));
p_tf->RegisterOpLoadMethod("Mean", op_load_t(LoadMean));
p_tf->RegisterOpLoadMethod("DecodeWav", op_load_t(LoadGeneric));
p_tf->RegisterOpLoadMethod("AudioSpectrogram", op_load_t(LoadGeneric));
p_tf->RegisterOpLoadMethod("Mfcc", op_load_t(LoadGeneric));
p_tf->RegisterOpLoadMethod("LSTM", op_load_t(LoadLSTM));
p_tf->RegisterOpLoadMethod("RNN", op_load_t(LoadRNN));
p_tf->RegisterOpLoadMethod("GRU", op_load_t(LoadGRU));
p_tf->RegisterOpLoadMethod("StridedSlice", op_load_t(LoadStridedSlice));
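All of these registrations follow one pattern: a TensorFlow op name is mapped to a loader callback of type op_load_t. The sketch below shows what adding one more op could look like; the real op_load_t signature in tf_serializer.cpp takes Tengine-internal types (TFNode, StaticGraph, ...), so the stub types and the LoadGatherV2 loader here are placeholders, not the actual Tengine code.

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <string>

struct TFNodeStub {};      // placeholder for the serializer's per-node structure
struct StaticGraphStub {}; // placeholder for Tengine's StaticGraph IR

using op_load_t = std::function<bool(TFNodeStub&, StaticGraphStub&)>;

class TFSerializerStub {
public:
    void RegisterOpLoadMethod(const std::string& op_name, const op_load_t& loader) {
        loaders_[op_name] = loader;
    }
    bool LoadNode(const std::string& op_name, TFNodeStub& node, StaticGraphStub& graph) {
        auto it = loaders_.find(op_name);
        if (it == loaders_.end()) {
            printf("op %s is not supported by the TF serializer\n", op_name.c_str());
            return false;
        }
        return it->second(node, graph);
    }
private:
    std::map<std::string, op_load_t> loaders_;
};

// A hypothetical loader for a new op, mirroring LoadConv2D / LoadEltwise above.
static bool LoadGatherV2(TFNodeStub& /*node*/, StaticGraphStub& /*graph*/) {
    // create the Tengine static op and copy attributes from the TF node ...
    return true;
}

int main() {
    TFSerializerStub p_tf;
    p_tf.RegisterOpLoadMethod("GatherV2", op_load_t(LoadGatherV2));

    TFNodeStub node;
    StaticGraphStub graph;
    p_tf.LoadNode("GatherV2", node, graph);   // dispatches to LoadGatherV2
    p_tf.LoadNode("Unknown", node, graph);    // falls through: op not supported
    return 0;
}
```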
