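tf.image.crop_and_resize is mainly used in preprocessing and for cropping box regions out of the RoI masks (RoI_Mask), as in the following excerpt:
# box_ids gives, for each box, the index of the image (here, the mask) it is cropped from; this is the standard usage pattern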
box_ids = tf.range(0, tf.shape(roi_masks)[0])
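# The mask associated with each RoI has MINI_MASK_SHAPE [56, 56]; once the RoI region is cropped out of it, the crop is resized to MASK_SHAPE [28, 28]
# boxes and roi_masks correspond one-to-one along dimension 0. A mask is looked up for every RoI, so the same mask may appear more than once (tf.gather() allows repeated indices)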
masks = tf.image.crop_and_resize(tf.cast(roi_masks, tf.float32), boxes,
                                 box_ids,
                                 config.MASK_SHAPE)
# Remove the extra dimension from masks. It had to be added beforehand because crop_and_resize expects a 4-D input
masks = tf.squeeze(masks, axis=3)
# Threshold mask pixels at 0.5 to have GT masks be 0 or 1 to use with
# binary cross entropy loss. The floats are rounded to 0. or 1.; note the dtype is still float32, not int32
masks = tf.round(masks)
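One detail of the rounding step is worth checking. `tf.round` is documented as rounding halves to the nearest even value, so a mask value of exactly 0.5 becomes 0 rather than 1. Python's built-in `round()` follows the same rule, which makes a dependency-free illustration possible. The helper name `binarize` below is hypothetical and only mimics the behaviour on a plain list; it is a sketch, not part of the original code.

```python
def binarize(values):
    """Round soft mask values to 0.0 or 1.0, keeping them as floats."""
    # Built-in round() rounds halves to the nearest even integer,
    # so exactly 0.5 becomes 0 and 1.5 would become 2.
    return [float(round(v)) for v in values]


binary = binarize([0.0, 0.2, 0.5, 0.51, 0.9, 1.0])
print(binary)
```

The values stay floating point after rounding, which matches the note above that the masks remain float32 for use with the binary cross entropy loss.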
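Notes on the function:
- The aspect ratio is ignored: every crop is stretched to crop_size.
- When boxes = [[0, 0, 1, 1]] the whole image is kept, and the result is identical to tf.image.resize_bilinear() or tf.image.resize_nearest_neighbor() (depending on method) with align_corners=True, i.e. a direct corner-aligned resize to the target mask size.
- When a box has y1 > y2 the crop is flipped up-down; when x1 > x2 it is flipped left-right.
- Normalized coordinates are converted back to image coordinates as
y * (image_height - 1)
Pixel indices start at 0, so pay attention to the boundary conditions.
- Parts of a box that fall outside the [0, 1] range are filled with extrapolation_value, which defaults to 0, i.e. a black border.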
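The coordinate rules in the notes above can be made concrete with a small sketch. It computes, for one axis, the source pixel coordinates that each output pixel samples from, assuming the corner-aligned mapping `y * (image_height - 1)` described above. The function name `sample_positions` and its arguments are hypothetical, and positions outside the image are reported as `None` to mark where `extrapolation_value` would be used.

```python
def sample_positions(c1, c2, image_size, crop_size):
    """Source coordinates sampled along one axis of a crop.

    c1, c2 are the normalized box coordinates on this axis (e.g. y1, y2).
    A normalized value c maps to c * (image_size - 1).
    """
    if crop_size > 1:
        # Spread crop_size samples evenly from c1 to c2 (corner aligned).
        step = (c2 - c1) * (image_size - 1) / (crop_size - 1)
        coords = [c1 * (image_size - 1) + i * step for i in range(crop_size)]
    else:
        # Assumption: a single output pixel samples the box centre.
        coords = [0.5 * (c1 + c2) * (image_size - 1)]
    # None marks samples that fall outside the image.
    return [c if 0 <= c <= image_size - 1 else None for c in coords]


full = sample_positions(0, 1, image_size=5, crop_size=3)
flipped = sample_positions(1, 0, image_size=5, crop_size=3)
outside = sample_positions(0, 1.5, image_size=5, crop_size=3)
```

Here `full` covers pixels 0 to 4, `flipped` visits the same pixels in reverse order, and the last entry of `outside` lies beyond the image border.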
@tf_export('image.crop_and_resize')
def crop_and_resize(image, boxes, box_ind, crop_size, method="bilinear", extrapolation_value=0, name=None):
r"""Extracts crops from the input image tensor and resizes them.
Extracts crops from the input image tensor and resizes them using bilinear
sampling or nearest neighbor sampling (possibly with aspect ratio change) to a
common output size specified by `crop_size`. This is more general than the
`crop_to_bounding_box` op which extracts a fixed size slice from the input image
and does not allow resizing or aspect ratio change.
Returns a tensor with `crops` from the input `image` at positions defined at the
bounding box locations in `boxes`. The cropped boxes are all resized (with
bilinear or nearest neighbor interpolation) to a fixed
`size = [crop_height, crop_width]`. The result is a 4-D tensor
`[num_boxes, crop_height, crop_width, depth]`. The resizing is corner aligned.
In particular, if `boxes = [[0, 0, 1, 1]]`, the method will give identical
results to using `tf.image.resize_bilinear()` or
`tf.image.resize_nearest_neighbor()`(depends on the `method` argument) with
`align_corners=True`.
Args:
image: A `Tensor`. Must be one of the following types: `uint8`, `uint16`, `int8`, `int16`, `int32`, `int64`, `half`, `float32`, `float64`.
A 4-D tensor of shape `[batch, image_height, image_width, depth]`.
Both `image_height` and `image_width` need to be positive.
boxes: A `Tensor` of type `float32`.
A 2-D tensor of shape `[num_boxes, 4]`. The `i`-th row of the tensor
specifies the coordinates of a box in the `box_ind[i]` image and is specified
in normalized coordinates `[y1, x1, y2, x2]`. A normalized coordinate value of
`y` is mapped to the image coordinate at `y * (image_height - 1)`, so as the
`[0, 1]` interval of normalized image height is mapped to
`[0, image_height - 1]` in image height coordinates. We do allow `y1` > `y2`, in
which case the sampled crop is an up-down flipped version of the original
image. The width dimension is treated similarly. Normalized coordinates
outside the `[0, 1]` range are allowed, in which case we use
`extrapolation_value` to extrapolate the input image values.
box_ind: A `Tensor` of type `int32`.
A 1-D tensor of shape `[num_boxes]` with int32 values in `[0, batch)`.
The value of `box_ind[i]` specifies the image that the `i`-th box refers to.
crop_size: A `Tensor` of type `int32`.
A 1-D tensor of 2 elements, `size = [crop_height, crop_width]`. All
cropped image patches are resized to this size. The aspect ratio of the image
content is not preserved. Both `crop_height` and `crop_width` need to be
positive.
method: An optional `string` from: `"bilinear", "nearest"`. Defaults to `"bilinear"`.
A string specifying the sampling method for resizing. It can be either
`"bilinear"` or `"nearest"` and default to `"bilinear"`. Currently two sampling
methods are supported: Bilinear and Nearest Neighbor.
extrapolation_value: An optional `float`. Defaults to `0`.
Value used for extrapolation, when applicable.
name: A name for the operation (optional).
Returns:
A `Tensor` of type `float32`.
"""
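To see how the documented pieces fit together, here is a minimal pure-Python sketch of bilinear crop-and-resize for a single one-channel image stored as a list of rows. It follows the docstring above (corner-aligned mapping, flips when `y1 > y2`, `extrapolation_value` outside the image), but it is an illustration under those assumptions and not TensorFlow's actual kernel. The name `crop_and_resize_2d` is hypothetical, and the handling of a crop dimension of 1 (sampling the box centre) is an assumption.

```python
import math


def crop_and_resize_2d(image, box, crop_size, extrapolation_value=0.0):
    """Bilinear crop-and-resize of one single-channel image (list of rows)."""
    height, width = len(image), len(image[0])
    y1, x1, y2, x2 = box
    crop_h, crop_w = crop_size

    def source_coord(c1, c2, size, i, n):
        # Corner-aligned mapping: normalized c -> c * (size - 1).
        if n > 1:
            return c1 * (size - 1) + i * (c2 - c1) * (size - 1) / (n - 1)
        return 0.5 * (c1 + c2) * (size - 1)

    out = []
    for i in range(crop_h):
        in_y = source_coord(y1, y2, height, i, crop_h)
        row = []
        for j in range(crop_w):
            in_x = source_coord(x1, x2, width, j, crop_w)
            # Samples outside the image get the extrapolation value.
            if not (0 <= in_y <= height - 1 and 0 <= in_x <= width - 1):
                row.append(extrapolation_value)
                continue
            top, left = math.floor(in_y), math.floor(in_x)
            bottom, right = math.ceil(in_y), math.ceil(in_x)
            wy, wx = in_y - top, in_x - left
            # Interpolate along x on the two neighbouring rows, then along y.
            upper = image[top][left] * (1 - wx) + image[top][right] * wx
            lower = image[bottom][left] * (1 - wx) + image[bottom][right] * wx
            row.append(upper * (1 - wy) + lower * wy)
        out.append(row)
    return out


image = [[0, 1],
         [2, 3]]
resized = crop_and_resize_2d(image, [0, 0, 1, 1], (3, 3))
flipped = crop_and_resize_2d(image, [1, 0, 0, 1], (2, 2))
padded = crop_and_resize_2d(image, [0, 0, 2, 2], (2, 2), extrapolation_value=-1.0)
```

With `boxes = [0, 0, 1, 1]` the sketch reduces to a plain corner-aligned resize, the flipped box returns the rows in reverse order, and the oversized box fills the out-of-range samples with the extrapolation value.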