- tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
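See the related Jetson forum topic.
Keras-Yolo3 fails during validation!
Cause:
TensorFlow needs a large amount of memory at runtime, so more memory must be made available to it.
Solution:
Option 1: In the Python script yolo_video.py, add "config.gpu_options.allow_growth = True"
Option 2: Free the memory currently held by Ubuntu: $ echo 3 > /proc/sys/vm/drop_caches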
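The cache-dropping command above can also be issued from Python, for example at the start of a launcher script. The following is a minimal sketch, not part of keras-yolo3: the helper name `drop_caches` and its `path` parameter are assumptions made for illustration, and writing to the real `/proc/sys/vm/drop_caches` requires root.

```python
import os


def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    """Ask the Linux kernel to drop clean caches (hypothetical helper).

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    The real path is only writable by root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # Flush dirty pages first: only clean cache entries can be dropped.
    os.sync()
    with open(path, "w") as f:
        f.write("%d\n" % level)
    return level
```

When using the shell form instead, keep in mind that the `>` redirect is performed by the calling shell, so the shell itself has to run as root; prefixing only `echo` with `sudo` is not enough.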
nvidia@tegra-ubuntu:~/work/d.keras/keras-yolo3$ python3 yolo_video.py --input ../../algorithm/alexey_darknet/data/Autobahn.mp4
Using TensorFlow backend.
2019-02-21 03:57:43.908835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-02-21 03:57:43.909000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.85GiB
2019-02-21 03:57:43.909055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-02-21 03:57:44.775649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-21 03:57:44.775753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-02-21 03:57:44.775792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-02-21 03:57:44.775971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4458 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-02-21 03:57:45.287109: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***
2019-02-21 03:57:45.526617: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 70, in generate
self.yolo_model = load_model(model_path, compile=False)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 225, in _deserialize_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/saving.py", line 458, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/__init__.py", line 55, in deserialize
printable_module_name='layer')
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
list(custom_objects.items())))
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/network.py", line 1032, in from_config
process_node(layer, node_data)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/network.py", line 991, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/normalization.py", line 185, in call
epsilon=self.epsilon)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1858, in normalize_batch_in_training
if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 292, in _has_nchw_support
gpus_available = len(_get_available_gpus()) > 0
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 278, in _get_available_gpus
_LOCAL_DEVICES = get_session().list_devices()
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 199, in get_session
[tf.is_variable_initialized(v) for v in candidate_vars])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "yolo_video.py", line 75, in <module>
detect_video(YOLO(**vars(FLAGS)), FLAGS.input, FLAGS.output)
File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 45, in __init__
self.boxes, self.scores, self.classes = self.generate()
File "/home/nvidia/work/d.keras/keras-yolo3/yolo.py", line 73, in generate
if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes)
File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/model.py", line 72, in yolo_body
darknet = Model(inputs, darknet_body(inputs))
File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/model.py", line 48, in darknet_body
x = DarknetConv2D_BN_Leaky(32, (3,3))(x)
File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/utils.py", line 16, in <lambda>
return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
File "/home/nvidia/work/d.keras/keras-yolo3/yolo3/utils.py", line 16, in <lambda>
return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/layers/normalization.py", line 185, in call
epsilon=self.epsilon)
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1858, in normalize_batch_in_training
if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 292, in _has_nchw_support
gpus_available = len(_get_available_gpus()) > 0
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 278, in _get_available_gpus
_LOCAL_DEVICES = get_session().list_devices()
File "/home/nvidia/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 199, in get_session
[tf.is_variable_initialized(v) for v in candidate_vars])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
nvidia@tegra-ubuntu:~/work/d.keras/keras-yolo3$ vim yolo_video.py
> import tensorflow as tf
> from keras.backend.tensorflow_backend import set_session
> config = tf.ConfigProto()
> config.gpu_options.allow_growth = True
> sess = tf.Session(config=config)
> set_session(sess)
Runtime environment and result:
NVIDIA Jetson TX2
Python 3.5.2 + Keras 2.2.4 + TensorFlow 1.9.0
Keras-Yolov3 runs and analyzes the video Autobahn.mp4, but the frame rate is very low, only 1-2 fps.
- "ARM64 does not support NUMA" (does not affect execution)
The TX2's ARM64 architecture does not support NUMA, so TensorFlow prints this message at runtime:
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
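The message is logged at INFO level (the `I` prefix in the full log line), so it can be hidden if it is distracting. A minimal sketch, assuming the standard `TF_CPP_MIN_LOG_LEVEL` environment variable is honored by the installed TensorFlow build; the helper name `quiet_tensorflow_logs` is made up for this example.

```python
import os


def quiet_tensorflow_logs(level="1"):
    """Set TF_CPP_MIN_LOG_LEVEL; call this before `import tensorflow`."""
    # "0" shows everything, "1" hides INFO, "2" also hides WARNING,
    # "3" also hides ERROR.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level
    return os.environ["TF_CPP_MIN_LOG_LEVEL"]


quiet_tensorflow_logs("1")
# import tensorflow as tf  # import only after the variable is set
```

Level "1" is used here so that the `E` lines, such as the CUDA synchronization errors, remain visible.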
(TensorFlow is distributed in GPU and CPU-only builds.)
- "Dst tensor is not initialized."
Cause: insufficient memory
Solution:
Option 1: Free Ubuntu system memory (as root): $ echo 3 > /proc/sys/vm/drop_caches
Option 2: In the Python code, add "config.gpu_options.allow_growth = True"
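Because the TX2's integrated GPU shares physical RAM with the CPU, it can help to check how much system memory is available before starting TensorFlow. The sketch below reads the `MemAvailable` field from `/proc/meminfo`; the function names are hypothetical and the sample numbers are made up for illustration.

```python
def mem_available_mib(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text in MiB, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    1234567 kB"
            return int(line.split()[1]) / 1024.0
    return None


def read_mem_available_mib(path="/proc/meminfo"):
    """Read the live value on a Linux system."""
    with open(path) as f:
        return mem_available_mib(f.read())


# Made-up sample, not taken from the TX2 in this post.
sample = "MemTotal:        8000000 kB\nMemAvailable:    2048000 kB\n"
print("%.1f MiB available" % mem_available_mib(sample))
```

If the reported value is low, freeing caches or closing other processes before launching the script is worth trying first.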
2019-02-22 07:52:01.459910: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 142.79MiB
2019-02-22 07:52:01.459965: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats:
Limit: 513175552
InUse: 149731072
MaxInUse: 163384064
NumAllocs: 64
MaxAllocSize: 67108864
2019-02-22 07:52:01.460006: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ****xx***__****__________________****************__**__***********************______________________
Traceback (most recent call last):
File "scripts/models_to_frozen_graphs.py", line 64, in <module>
sess=tf_sess
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1752, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: save/RestoreV2/_55 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_60_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
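The allocator statistics in the log above are given in bytes. Converting them to MiB makes the shortage easier to see; the sketch below parses the `Stats:` lines copied from this log. The helper names `parse_bfc_stats` and `to_mib` are assumptions made for this example.

```python
import re

# Copied from the bfc_allocator "Stats:" dump above.
LOG = """\
Limit: 513175552
InUse: 149731072
MaxInUse: 163384064
NumAllocs: 64
MaxAllocSize: 67108864
"""


def parse_bfc_stats(text):
    """Collect 'Name: integer' pairs from a BFC allocator stats dump."""
    pairs = re.findall(r"^(\w+):[ \t]+(\d+)[ \t]*$", text, re.M)
    return {name: int(value) for name, value in pairs}


def to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / (1024.0 * 1024.0)


stats = parse_bfc_stats(LOG)
for key in ("Limit", "InUse", "MaxInUse", "MaxAllocSize"):
    print("%-12s %8.2f MiB" % (key, to_mib(stats[key])))
```

`InUse` comes out at 142.79 MiB, matching the "Sum Total of in-use chunks" line, while `Limit` is only about 489 MiB. TensorFlow therefore had roughly half a GiB to work with in this run, which is consistent with the out-of-memory cause given above.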