[Windows] Mask R-CNN Pitfall Notes

Author: sprainkle | Published: 2019-02-24 21:44

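    Note: the problems described here are errors that can occur on Windows; I do not know whether they occur in other environments. I did not run into anything similar on Mac OS earlier, possibly because the TensorFlow build installed there has no GPU acceleration.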

    1. Problems caused by insufficient GPU memory

    1.1 failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED

    2019-02-24 20:34:03.563275: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
    2019-02-24 20:34:03.563666: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
    2019-02-24 20:34:04.486945: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
    2019-02-24 20:34:04.487252: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:389] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
    2019-02-24 20:34:04.487533: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
    2019-02-24 20:34:04.487769: F C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
    

    Solution

    # Global setting
    import os
    os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
    


    1.2 OOM when allocating tensor with shape

    2019-02-24 20:41:58.028372: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.08GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
    2019-02-24 20:42:12.927144: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 256.00MiB.  Current allocation summary follows.
    2019-02-24 20:42:12.927448: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:627] Bin (256):  Total Chunks: 253, Chunks in use: 245. 63.3KiB allocated for chunks. 61.3KiB in use in bin. 11.1KiB client-requested in use in bin.
    2019-02-24 20:42:12.927720: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:627] Bin (512):  Total Chunks: 44, Chunks in use: 42. 22.5KiB allocated for chunks. 21.0KiB in use in bin. 21.0KiB client-requested in use in bin.
    ... (many more lines omitted)
    ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2,256,256,512]
         [[Node: training/SGD/gradients/rpn_model/rpn_bbox_pred/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, _class=["loc:@rpn_model/rpn_bbox_pred/convolution"], data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/rpn_model/rpn_bbox_pred/convolution_grad/ShapeN, rpn_bbox_pred/kernel/read, training/SGD/gradients/rpn_model/lambda_3/Reshape_grad/Reshape)]]
         [[Node: training/SGD/gradients/mrcnn_mask_conv1/convolution_grad/Conv2DBackpropInput/_4653 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7160_training/SGD/gradients/mrcnn_mask_conv1/convolution_grad/Conv2DBackpropInput", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
    
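    Both "ran out of memory" and "OOM when allocating tensor with shape" are reported because the graphics card has run out of memory (VRAM). There are two ways to fix this:
    • Reduce the number of images processed per GPU
    • Resize the input images, i.e. use smaller images so that less GPU memory is consumed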
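To see why both options help, it is worth checking the size of the tensor named in the error. The sketch below assumes the shape is laid out as [batch, height, width, channels] with 32-bit floats (the log shows DT_FLOAT and NHWC for the failing node); the helper name `tensor_mib` and the two smaller shapes are illustrative only.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # DT_FLOAT in the log is a 32-bit float


def tensor_mib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in MiB of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element / 2**20


# Shape taken from the ResourceExhaustedError above: [2, 256, 256, 512]
failing = tensor_mib((2, 256, 256, 512))

# Hypothetical variants: one image per batch, or half the height and width.
one_image = tensor_mib((1, 256, 256, 512))
half_size = tensor_mib((2, 128, 128, 512))

print(failing, one_image, half_size)  # 256.0 128.0 64.0
```

The first value matches the 256.00MiB allocation that failed in the log. Halving the batch halves the tensor, and halving both spatial dimensions cuts it to a quarter.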

    Solution

    # Assumes the Matterport Mask_RCNN package layout (mrcnn/config.py).
    from mrcnn.config import Config
    
    
    class ShapeConfig(Config):
        """Configuration for training on the toy dataset.
        Derives from the base Config class and overrides some values.
        """
        # Give the configuration a recognizable name
        NAME = "shape"
    
        # A 12GB GPU can typically fit two images of 1024x1024px.
        # Use 1 here to reduce memory use on a smaller GPU.
        IMAGES_PER_GPU = 1
    
        # Input image resizing: a small square input lowers memory use further
        IMAGE_MIN_DIM = IMAGE_MAX_DIM = 128
        ...
    

    Lowering either one of these settings is usually enough; you can of course lower both.

    The two settings are documented as follows:

        # Number of images to train with on each GPU. A 12GB GPU can typically
        # handle 2 images of 1024x1024px.
        # Adjust based on your GPU memory and image sizes. Use the highest
        # number that your GPU can handle for best performance.
        IMAGES_PER_GPU = 2
    
        # Input image resizing
        # Generally, use the "square" resizing mode for training and predicting
        # and it should work well in most cases. In this mode, images are scaled
        # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
        # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
        # padded with zeros to make it a square so multiple images can be put
        # in one batch.
        # Available resizing modes:
        # none:   No resizing or padding. Return the image unchanged.
        # square: Resize and pad with zeros to get a square image
        #         of size [max_dim, max_dim].
        # pad64:  Pads width and height with zeros to make them multiples of 64.
        #         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
        #         up before padding. IMAGE_MAX_DIM is ignored in this mode.
        #         The multiple of 64 is needed to ensure smooth scaling of feature
        #         maps up and down the 6 levels of the FPN pyramid (2**6=64).
        # crop:   Picks random crops from the image. First, scales the image based
        #         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
        #         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
        #         IMAGE_MAX_DIM is not used in this mode.
        IMAGE_RESIZE_MODE = "square"
        IMAGE_MIN_DIM = 800
        IMAGE_MAX_DIM = 1024
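The "square" mode described in these comments can be sketched in a few lines. This is my reading of the comment text, not the library's code: the function name `square_resize_plan` is hypothetical, and it assumes the image is never shrunk just to meet IMAGE_MIN_DIM and is shrunk only when the long side would exceed IMAGE_MAX_DIM.

```python
def square_resize_plan(height, width, min_dim, max_dim):
    """Return (scale, resized_hw, padded_hw) for the "square" mode as described."""
    # Scale up so the short side reaches min_dim (assumed: never scale down here).
    scale = max(1.0, min_dim / min(height, width))
    # Cap the scale so the long side does not exceed max_dim.
    if round(max(height, width) * scale) > max_dim:
        scale = max_dim / max(height, width)
    resized = (round(height * scale), round(width * scale))
    # Zero padding then fills the image out to max_dim x max_dim.
    return scale, resized, (max_dim, max_dim)


# A 600x800 image with the default settings, then with the reduced 128 settings.
default_plan = square_resize_plan(600, 800, min_dim=800, max_dim=1024)
small_plan = square_resize_plan(600, 800, min_dim=128, max_dim=128)
print(default_plan[1], default_plan[2])  # (768, 1024) (1024, 1024)
print(small_plan[1], small_plan[2])      # (96, 128) (128, 128)
```

Under these assumptions, the padded input drops from 1024x1024 to 128x128 pixels, which is why setting IMAGE_MIN_DIM and IMAGE_MAX_DIM to 128 saves so much GPU memory.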
    

    References:
    mask_rcnn code walkthrough: config.py
    OOM when allocating tensor

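    1.3 Another failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED

    I did hit this error, but I cannot reproduce it at the moment, so there is no screenshot of my own. The log below is copied from someone else's report and looks like this: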

    E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 
    W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support 
    E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 
    W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
    

    This error is also caused by insufficient GPU memory.

    Solution: see the GitHub comment referenced below.

    Reference:
    https://github.com/tensorflow/tensorflow/issues/7072#issuecomment-422488354
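One commonly used mitigation for GPU allocation failures in TensorFlow 1.x is to let the process allocate GPU memory on demand rather than reserving almost all of it at startup. It may or may not be exactly what the comment above describes. The sketch assumes TensorFlow 1.x with standalone Keras; the function name `limit_gpu_memory` and its `fraction` parameter are hypothetical.

```python
def limit_gpu_memory(fraction=None):
    """Create a TF 1.x session that allocates GPU memory on demand.

    If fraction is given (e.g. 0.7), also cap this process at that share
    of the total GPU memory.
    """
    # Imported lazily so this sketch can be loaded without a GPU build installed.
    import tensorflow as tf
    import keras.backend as K

    config = tf.ConfigProto()
    # Grow the allocation as needed instead of grabbing nearly all VRAM up front.
    config.gpu_options.allow_growth = True
    if fraction is not None:
        config.gpu_options.per_process_gpu_memory_fraction = fraction
    session = tf.Session(config=config)
    # Make Keras use this session for all subsequent model building and training.
    K.set_session(session)
    return session
```

Calling `limit_gpu_memory()` once, before the model is built, should be enough. Whether it resolves the error depends on how much memory other processes are already holding on the same GPU.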

    2. unexpected keyword argument 'keep_dims'

    Error when setting up Mask-RCNN with TensorFlow: tf.reduce_mean got an unexpected keyword argument 'keep_dims'
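Newer TensorFlow releases spell this argument `keepdims`; the older `keep_dims` spelling was deprecated and later removed. So the error usually means the calling code and the installed TensorFlow disagree on the name. One fix is to rename the argument at the call site. Where the call site cannot be edited, a small adapter can translate the old name. The sketch below shows that idea on a stand-in function, not on TensorFlow itself; `accept_keep_dims` and the local `reduce_mean` are hypothetical names.

```python
import functools


def accept_keep_dims(func):
    """Wrap func so a legacy keep_dims=... argument is passed on as keepdims=..."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if "keep_dims" in kwargs:
            kwargs["keepdims"] = kwargs.pop("keep_dims")
        return func(*args, **kwargs)
    return wrapper


# Stand-in for a function that only understands the newer spelling.
def reduce_mean(values, keepdims=False):
    mean = sum(values) / len(values)
    return [mean] if keepdims else mean


legacy_reduce_mean = accept_keep_dims(reduce_mean)
print(legacy_reduce_mean([1, 2, 3], keep_dims=True))  # [2.0]
```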

    Article link: https://www.haomeiwen.com/subject/ywnbyqtx.html