TX2 using TensorFlow buglist


Author: 童年雅趣 | Published 2019-02-21 12:01
    1. tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
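      See the related Jetson forum topic:
      "Error when validating Keras-Yolo3!"
      Cause:
      TensorFlow needs a large amount of memory to run, so more memory has to be made available to it.
      Fix:
      Option 1: In the Python script yolo_video.py, add "config.gpu_options.allow_growth = True".
      Option 2: Free the memory currently held by Ubuntu (as root): $ echo 3 > /proc/sys/vm/drop_caches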
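Option 2 can also be scripted. One common pitfall is that `sudo echo 3 > /proc/sys/vm/drop_caches` fails, because the redirection is performed by the unprivileged shell rather than by `sudo`; the write has to happen inside a root process. The following is a minimal Python sketch of that step, not something from the original post: the function name `drop_caches` and its `path` parameter are illustrative. Writing 3 asks the kernel to drop the page cache plus dentries and inodes, and only clean (already synced) caches are released.

```python
import os

DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(path=DROP_CACHES_PATH, level=3):
    """Ask the kernel to drop clean caches (the real path needs root).

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    `path` is a parameter only so the function can be tried on a scratch file.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # flush dirty pages first so more of the cache can be dropped
    with open(path, "w") as f:
        f.write("%d\n" % level)


if __name__ == "__main__":
    try:
        drop_caches()
        print("caches dropped")
    except OSError as exc:
        # PermissionError is a subclass of OSError.
        print("could not drop caches (root required?):", exc)
```

Run it as root (for example `sudo python3 script.py`) shortly before starting the TensorFlow job.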
    nvidia@tegra-ubuntu:~/work/d.keras/keras-yolo3$ python3 yolo_video.py --input ../../algorithm/alexey_darknet/data/Autobahn.mp4
    
    Using TensorFlow backend.
    2019-02-21 03:57:43.908835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
    2019-02-21 03:57:43.909000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
    name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
    pciBusID: 0000:00:00.0
    totalMemory: 7.67GiB freeMemory: 4.85GiB
    2019-02-21 03:57:43.909055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
    2019-02-21 03:57:44.775649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-02-21 03:57:44.775753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
    2019-02-21 03:57:44.775792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
    2019-02-21 03:57:44.775971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4458 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
    2019-02-21 03:57:45.287109: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
        stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
        stream_executor::StreamExecutor::SynchronizeAllActivity()
        tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
    *** End stack trace ***
    
    2019-02-21 03:57:45.526617: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
        stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
        stream_executor::StreamExecutor::SynchronizeAllActivity()
        tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
    *** End stack trace ***
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
        return fn(*args)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
        options, feed_dict, fetch_list, target_list, run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 70, in generate
        self.yolo_model = load_model(model_path, compile=False)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 419, in load_model
        model = _deserialize_model(f, custom_objects, compile)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 225, in _deserialize_model
        model = model_from_config(model_config, custom_objects=custom_objects)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 458, in model_from_config
        return deserialize(config, custom_objects=custom_objects)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/__init__.py", line 55, in deserialize
        printable_module_name='layer')
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
        list(custom_objects.items())))
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/network.py", line 1032, in from_config
        process_node(layer, node_data)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/network.py", line 991, in process_node
        layer(unpack_singleton(input_tensors), **kwargs)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in __call__
        output = self.call(inputs, **kwargs)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/normalization.py", line 185, in call
        epsilon=self.epsilon)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1858, in normalize_batch_in_training
        if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 292, in _has_nchw_support
        gpus_available = len(_get_available_gpus()) > 0
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 278, in _get_available_gpus
        _LOCAL_DEVICES = get_session().list_devices()
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 199, in get_session
        [tf.is_variable_initialized(v) for v in candidate_vars])
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
        run_metadata_ptr)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
        run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
        return fn(*args)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
        options, feed_dict, fetch_list, target_list, run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "yolo_video.py", line 75, in <module>
        detect_video(YOLO(**vars(FLAGS)), FLAGS.input, FLAGS.output)
      File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 45, in __init__
        self.boxes, self.scores, self.classes = self.generate()
      File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 73, in generate
        if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes)
      File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/model.py", line 72, in yolo_body
        darknet = Model(inputs, darknet_body(inputs))
      File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/model.py", line 48, in darknet_body
        x = DarknetConv2D_BN_Leaky(32, (3,3))(x)
      File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/utils.py", line 16, in <lambda>
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
      File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/utils.py", line 16, in <lambda>
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in __call__
        output = self.call(inputs, **kwargs)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/normalization.py", line 185, in call
        epsilon=self.epsilon)
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1858, in normalize_batch_in_training
        if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 292, in _has_nchw_support
        gpus_available = len(_get_available_gpus()) > 0
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 278, in _get_available_gpus
        _LOCAL_DEVICES = get_session().list_devices()
      File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 199, in get_session
        [tf.is_variable_initialized(v) for v in candidate_vars])
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
        run_metadata_ptr)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
        run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
    
    
    nvidia@tegra-ubuntu:~/work/d.keras/keras-yolo3$ vim yolo_video.py 
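    > import tensorflow as tf
    > from keras.backend.tensorflow_backend import set_session
    > # Let TensorFlow grow its GPU allocation on demand instead of reserving it all at startup.
    > config = tf.ConfigProto()
    > config.gpu_options.allow_growth = True
    > # Hand the configured session to Keras before the model is built.
    > sess = tf.Session(config=config)
    > set_session(sess)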
    

    Environment and result:
    NVIDIA Jetson TX2
    Python 3.5.2 + Keras 2.2.4 + tensorflow 1.9.0
    Keras-Yolov3 now runs and analyzes the video Autobahn.mp4, but the frame rate is very low, only 1-2 fps.

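    1. "ARM64 does not support NUMA" (harmless, does not affect execution)
      The TX2's ARM64 architecture does not support NUMA, so TensorFlow prints this message at startup:
      tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
      (TensorFlow comes in GPU and non-GPU builds.)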
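The NUMA line is logged at INFO level (the `I` after the timestamp in the full log line), so it can be hidden without changing TensorFlow itself. The sketch below is an assumption-laden illustration rather than part of the original post: it relies on the standard `TF_CPP_MIN_LOG_LEVEL` environment variable and on `tf.test.is_built_with_cuda()` to tell a GPU build from a CPU-only one. The variable has to be set before TensorFlow is imported.

```python
import os

# Must be set before TensorFlow is imported.
# "0" shows everything, "1" hides INFO, "2" also hides WARNING, "3" also hides ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed in this environment

if tf is not None:
    # True for a GPU (CUDA) build, False for a CPU-only build.
    print("built with CUDA:", tf.test.is_built_with_cuda())
else:
    print("TensorFlow not installed; nothing to check")
```

Hiding INFO messages also hides the device-discovery lines, so it may be worth leaving the level at "0" while debugging memory problems.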

    2. "Dst tensor is not initialized."
      Cause: out of memory
      Fix:
      Option 1: Free Ubuntu system memory (as root): $ echo 3 > /proc/sys/vm/drop_caches
      Option 2: Add "config.gpu_options.allow_growth = True" to the Python code.
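On the TX2 the CPU and GPU share the same physical DRAM, so memory used elsewhere on the system reduces what TensorFlow can allocate. Before launching a job it can help to check how much memory the kernel reports as available. A minimal sketch, assuming the standard Linux `/proc/meminfo` format; the helper names `parse_meminfo` and `available_mib` are illustrative and not from the original post.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their integer values (kB for sizes)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_mib(text):
    """MemAvailable in MiB, falling back to MemFree if the field is missing."""
    info = parse_meminfo(text)
    kb = info.get("MemAvailable", info.get("MemFree", 0))
    return kb / 1024.0


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print("available: %.0f MiB" % available_mib(f.read()))
    except OSError as exc:
        print("could not read /proc/meminfo:", exc)
```

Comparing the value before and after dropping caches gives a rough idea of how much Option 1 actually recovered.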

    2019-02-22 07:52:01.459910: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 142.79MiB
    2019-02-22 07:52:01.459965: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats: 
    Limit:                   513175552
    InUse:                   149731072
    MaxInUse:                163384064
    NumAllocs:                      64
    MaxAllocSize:             67108864
    
    2019-02-22 07:52:01.460006: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ****xx***__****__________________****************__**__***********************______________________
    Traceback (most recent call last):
      File "scripts/models_to_frozen_graphs.py", line 64, in <module>
        sess=tf_sess
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1752, in restore
        {self.saver_def.filename_tensor_name: save_path})
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
        run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
         [[Node: save/RestoreV2/_55 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_60_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
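The allocator statistics in this log are given in bytes, which makes them easy to misread. Converting them shows that the allocator limit was only about 489 MiB, with about 143 MiB in use (matching the "142.79MiB" on the first log line), which is consistent with the out-of-memory diagnosis. The following is a small sketch of that arithmetic; the helper name `parse_bfc_stats` is illustrative, and the numbers are copied from the log above.

```python
import re

# Stats block copied from the bfc_allocator log above.
STATS = """\
Limit:                   513175552
InUse:                   149731072
MaxInUse:                163384064
NumAllocs:                      64
MaxAllocSize:             67108864
"""

MIB = 1024 * 1024


def parse_bfc_stats(text):
    """Parse 'Name:  <integer>' lines into a dict of ints."""
    pattern = r"^[ \t]*(\w+):[ \t]+(\d+)[ \t]*$"
    return {m.group(1): int(m.group(2))
            for m in re.finditer(pattern, text, re.MULTILINE)}


stats = parse_bfc_stats(STATS)
# NumAllocs is a count, not a size, so it is left out of the conversion.
for key in ("Limit", "InUse", "MaxInUse", "MaxAllocSize"):
    print("%-13s %8.2f MiB" % (key, stats[key] / MIB))
print("in use: %.0f%% of limit" % (100.0 * stats["InUse"] / stats["Limit"]))
```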
    
