
Video Data Processing Tools

Author: 加油11dd23 | Published 2021-06-13 19:33

A quick summary of pytorchvideo.

I. Features supported by existing video processing tools

1. Video preprocessing

(1) Video decoding

Video data is stored in encoded (compressed) form, so to get at the raw frames we first have to decode the video.

(2) Converting the decoded frames to a tensor (including normalization); a minimal sketch of both steps follows below.
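As a sketch of both steps, the snippet below decodes a short clip with pytorchvideo's EncodedVideo and converts it to a normalized float tensor. The file path, clip length, and the mean/std values are placeholder assumptions, not part of the original article.

import torch
from pytorchvideo.data.encoded_video import EncodedVideo

# Decode a clip from an encoded video file ("my_video.mp4" is a placeholder).
video = EncodedVideo.from_path("my_video.mp4")
clip = video.get_clip(start_sec=0.0, end_sec=2.0)
frames = clip["video"]  # (C, T, H, W) tensor of RGB frames

# Convert to a normalized float tensor (example mean/std values).
frames = frames / 255.0
mean = torch.tensor([0.45, 0.45, 0.45]).view(3, 1, 1, 1)
std = torch.tensor([0.225, 0.225, 0.225]).view(3, 1, 1, 1)
frames = (frames - mean) / std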

2. Deep learning

(1) Loading a deep learning model

(2) Training

(3) Prediction

(4) Acceleration techniques

(5) Model deployment (a minimal load-and-predict sketch follows this list)
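A minimal sketch of (1)-(3): load a pretrained model from the PyTorchVideo model zoo via torch.hub and run a prediction on a dummy clip. The model name and the (8, 224, 224) input shape are assumptions taken from the slow_r50 recipe.

import torch

# Load a pretrained video classification model from the model zoo.
model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)
model.eval()

# Dummy clip: (batch, channel, time, height, width).
dummy_clip = torch.randn(1, 3, 8, 224, 224)
with torch.no_grad():
    logits = model(dummy_clip)
pred_class = logits.argmax(dim=-1)  # predicted Kinetics-400 class index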

II. pytorchvideo modules

1. data

2. accelerator

3. layers

4. models

5. transforms

(1) Data augmentation

(2) Sampling frames from a video for analysis

(3) Resizing

(4) Rescaling videos

(5) Random cropping

(6) MixUp / CutMix

• In classification, MixUp mainly makes the model more robust at distinguishing very similar classes. When training batch by batch, a cat image simply carries the label "cat" and a dog image the label "dog", and the network learns to separate them from these clean examples alone. In practice, if a sample turns up that looks like both a cat and a dog but is actually a dog, the network can get confused. MixUp therefore trains on pairs: it blends the pixels of two (or even more) samples and blends their labels in the same proportion (see the sketch after these bullets).
• CutMix instead cuts an object out of one image and pastes it onto another. How well does this augmentation work? For classification it seems to bring some gains, but the way the cut is made appears to matter: if the resulting image shows only a cat's head and a dog's tail (or the reverse), what should it be classified as?
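A minimal, generic MixUp sketch in plain PyTorch (not pytorchvideo's own implementation); the alpha value and the one-hot label format are assumptions for illustration:

import torch

def mixup(videos: torch.Tensor, labels_onehot: torch.Tensor, alpha: float = 0.4):
    # Sample the mixing weight from a Beta(alpha, alpha) distribution.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Pair each sample with another sample from the same batch.
    perm = torch.randperm(videos.size(0))
    mixed_videos = lam * videos + (1.0 - lam) * videos[perm]
    # Mix the one-hot labels in the same proportion.
    mixed_labels = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_videos, mixed_labels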

(7) Converting a video from dtype uint8 to dtype float32. (A pipeline sketch combining transforms (2)-(7) follows.)
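The sketch below combines transforms (2)-(7) into one pipeline in the style of the pytorchvideo tutorials. The frame count, sizes, and normalization values are example choices, and the exact class set may differ across pytorchvideo versions.

import torch
from torchvision.transforms import Compose, RandomCrop
from pytorchvideo.transforms import (
    ConvertUint8ToFloat,
    Normalize,
    RandomShortSideScale,
    UniformTemporalSubsample,
)

train_transform = Compose([
    UniformTemporalSubsample(8),                       # (2) sample 8 frames
    ConvertUint8ToFloat(),                             # (7) uint8 -> float32
    Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
    RandomShortSideScale(min_size=256, max_size=320),  # (3)/(4) rescale
    RandomCrop(224),                                   # (5) random crop
])

clip = torch.randint(0, 256, (3, 16, 240, 320), dtype=torch.uint8)
out = train_transform(clip)  # float tensor of shape (3, 8, 224, 224)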

6. accelerator

(1) Introduction

This part walks through:

• Basics of efficient blocks in PytorchVideo/Accelerator;

• Design, train and deploy a model composed of efficient blocks for mobile CPU.

Efficient blocks are provided under:

• pytorchvideo/layers/accelerator/<target_device> (for simple layers);

• pytorchvideo/models/accelerator/<target_device> (for complex modules such as residual blocks).

For a target device, we benchmark the efficiency of basic network components and provide a collection of efficient blocks under the two paths above. Inference with a model built from the corresponding efficient blocks is guaranteed to be efficient on the target device.

Each efficient block module is an instance of nn.Module and has two forms: an original form (for training) and a deploy form (for inference). In its original form, an efficient block behaves exactly like the corresponding vanilla nn.Module in both the forward and the backward pass. Users can freely mix and match efficient blocks for the same target device to build up their own model. Once the model is built and trained, each efficient block in it can be converted into deploy form. The conversion applies graph and kernel optimizations to each efficient block; a block in deploy form is arithmetically equivalent to its original form but much more efficient at inference time.

(2) Building a model

The example model below consists of:

    • One conv3d head layer with 5x1x1 kernel followed by ReLU activation;
    • One residual block with squeeze-excite;
    • One average pool and fully connected layer as final output.
    #############################
    #1. First, let's import efficient blocks.
    #############################
    
    import torch.nn as nn
    from pytorchvideo.layers.accelerator.mobile_cpu.activation_functions import (
        supported_act_functions,
    )
    from pytorchvideo.layers.accelerator.mobile_cpu.convolutions import (
        Conv3d5x1x1BnAct,
    )
    from pytorchvideo.models.accelerator.mobile_cpu.residual_blocks import (
        X3dBottleneckBlock,
    )
    from pytorchvideo.layers.accelerator.mobile_cpu.pool import AdaptiveAvgPool3dOutSize1
    from pytorchvideo.layers.accelerator.mobile_cpu.fully_connected import FullyConnected
    
    #####################################
    #2. Then we can build a model using those efficient blocks.
#####################################
    class MyNet(nn.Module):
        def __init__(
            self,
            in_channel=3,  # input channel of first 5x1x1 layer
            residual_block_channel=24,  # input channel of residual block
            expansion_ratio=3, # expansion ratio of residual block
            num_classes=4, # final output classes
        ):
            super().__init__()
            # s1 - 5x1x1 conv3d layer
            self.s1 = Conv3d5x1x1BnAct(
                in_channel,
                residual_block_channel,
                bias=False,
                groups=1,
                use_bn=False,
            )
            # s2 - residual block
            mid_channel = int(residual_block_channel * expansion_ratio)
            self.s2 = X3dBottleneckBlock(
                    in_channels=residual_block_channel,
                    mid_channels=mid_channel,
                    out_channels=residual_block_channel,
                    use_residual=True,
                    spatial_stride=1,
                    se_ratio=0.0625,
                    act_functions=("relu", "swish", "relu"),
                    use_bn=(True, True, True),
                )
            # Average pool and fully connected layer
            self.avg_pool = AdaptiveAvgPool3dOutSize1()
            self.projection = FullyConnected(residual_block_channel, num_classes, bias=True)
            self.act = supported_act_functions['relu']()
    
        def forward(self, x):
            x = self.s1(x)
            x = self.s2(x)
            x = self.avg_pool(x)
            # (N, C, T, H, W) -> (N, T, H, W, C).
            x = x.permute((0, 2, 3, 4, 1))
            x = self.projection(x)
            # Performs fully convolutional inference.
            if not self.training:
                x = self.act(x)
                x = x.mean([1, 2, 3])
            x = x.view(x.shape[0], -1)
    
            return x
    
    
    ##############################
    # 3. We can instantiate MyNet and its efficient blocks will be in original form.
    ##############################
    net_inst = MyNet()
    print(net_inst)
#############################################
# 4. Train the model.
# We can now train the model with our dataset/optimizer. Here we skip the
# training step and just leave the weights at their initial values.
#############################################

#############################################
# 5. Deploy the model. First of all, let's convert the model into deploy form.
# To do that, we use the convert_to_deployable_form utility and provide an
# example input tensor to the model. Note that once the model is converted
# into deploy form, the input size should be the same as the example input
# tensor size used during conversion.
#############################################
    import torch
    from pytorchvideo.accelerator.deployment.mobile_cpu.utils.model_conversion import (
        convert_to_deployable_form,
    )
    input_blob_size = (1, 3, 4, 6, 6)
    input_tensor = torch.randn(input_blob_size)
    net_inst_deploy = convert_to_deployable_form(net_inst, input_tensor)
    print(net_inst_deploy)
    
    
################################
# Let's check whether the network after conversion is arithmetically
# equivalent. We expect the outputs to be very close before/after conversion,
# with some small difference due to numeric noise from floating point
# operations.
################################
    net_inst.eval()
    out_ref = net_inst(input_tensor)
    out = net_inst_deploy(input_tensor)
    
    max_err = float(torch.max(torch.abs(out_ref - out)))
    print(f"max error is {max_err}")
    
    
    
#########################
# Next we have two options: either deploy the floating point model, or
# quantize the model into int8 and then deploy.
# Let's first assume we want to deploy the floating point model. In this case,
# all we need to do is export a jit trace and then apply optimize_for_mobile
# for the final optimization.
#########################
    from torch.utils.mobile_optimizer import (
        optimize_for_mobile,
    )
    traced_model = torch.jit.trace(net_inst_deploy, input_tensor, strict=False)
    traced_model_opt = optimize_for_mobile(traced_model)
# Here we can save traced_model_opt to a JIT file using traced_model_opt.save(<file_path>)
    
    
    
    
#########################
# Alternatively, we may also want to deploy a quantized model. Efficient
# blocks are quantization-friendly by design - just wrap the model in deploy
# form with QuantStub/DeQuantStub and it is ready for PyTorch eager mode
# quantization.
#########################
    # Wrapper class for adding QuantStub/DeQuantStub.
    class quant_stub_wrapper(nn.Module):
        def __init__(self, module_in):
            super().__init__()
            self.quant = torch.quantization.QuantStub()
            self.model = module_in
            self.dequant = torch.quantization.DeQuantStub()
        def forward(self, x):
            x = self.quant(x)
            x = self.model(x)
            x = self.dequant(x)
            return x
    
    net_inst_quant_stub_wrapper = quant_stub_wrapper(net_inst_deploy)
    ###################
    # Preparation step of quantization. Fusion has been done for efficient blocks automatically during 
    # convert_to_deployable_form, so we can just proceed to torch.quantization.prepare
    ###################
    net_inst_quant_stub_wrapper.qconfig = torch.quantization.default_qconfig
    net_inst_quant_stub_wrapper_prepared = torch.quantization.prepare(net_inst_quant_stub_wrapper)
    ####################
    # Calibration and quantization. After preparation we will do calibration of quantization by feeding 
    # calibration dataset (skipped here) and then do quantization.
    ####################
    net_inst_quant_stub_wrapper_quantized = torch.quantization.convert(net_inst_quant_stub_wrapper_prepared)
    #######################
    # Then we can export trace of int8 model and deploy on mobile devices.
    #######################
    traced_model_int8 = torch.jit.trace(net_inst_quant_stub_wrapper_quantized, input_tensor, strict=False)
    traced_model_int8_opt = optimize_for_mobile(traced_model_int8)
# Here we can save traced_model_int8_opt to a JIT file using traced_model_int8_opt.save(<file_path>)
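# Usage sketch (an assumption, not part of the original tutorial): a saved
# TorchScript file can be loaded back with torch.jit.load and run with the
# same input size. "model_int8.pt" is a placeholder file name.
traced_model_int8_opt.save("model_int8.pt")
loaded = torch.jit.load("model_int8.pt")
out_int8 = loaded(input_tensor)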
    
