slowfast mmaction2 ava2.1数据集制作训练

作者: dillqq | 来源:发表于2023-06-08 10:39 被阅读0次

洛杉矶房价预测
yolo3训练自己的数据
TensorFlow2.X结合OpenCV = 手势识别
深度学习验证码识别---验证码图片生成
CS231N学习记录
Yolov3：训练自己的模型 Part2：训练
TensorFlow（二）制做自己的数据集
caffe数据集格式转换—图像格式到LMDB/LEVELDB
神经网络优化1
2.封装kNN算法之数据分割

参考这个 slowfast博客改的，因为这个博客只针对单个30秒视频，并不是把一个视频裁剪为多个30秒视频，并且在windows上运行会报错，本文主要是针对遇到的问题对代码进行修改。具体如下。
首先文件摆放位置：

文件摆放

首先准备视频将视频放在./ava/videos里面，将其裁剪为30秒一个视频存放到./ava/videos_cut，以数字命名。代码如下，不再使用ffmpeg命令，而是使用ffmpeg包，使用pip install ffmpeg-python安装。

文件video2img.py

import os
import shutil
from tqdm import tqdm

start = 0
seconds = 30

video_path = './ava/videos'
labelframes_path = './ava/labelframes'
rawframes_path = './ava/rawframes'
cut_videos_sh_path = './cut_videos.sh'



fps = 30
raw_frames = seconds * fps

with open(cut_videos_sh_path, 'r') as f:
    sh = f.read()
sh = sh.replace(sh[sh.find('    ffmpeg'):],
                f'    ffmpeg -ss {start} -t {seconds} -i "${{video}}" -r 30 -strict experimental "${{out_name}}"\n  fi\ndone\n')
with open(cut_videos_sh_path, 'w') as f:
    f.write(sh)

os.makedirs(labelframes_path, exist_ok=True)
video_ids = [video_id[:-4] for video_id in os.listdir(video_path)]
for video_id in tqdm(video_ids):
    for img_id in range(2 * fps + 1, (seconds - 2) * 30, fps):
        shutil.copyfile(os.path.join(rawframes_path, video_id, '1_' + format(img_id, '05d') + '.jpg'),
                        os.path.join(labelframes_path, video_id + '_' + format(start + img_id // 30, '05d') + '.jpg'))

执行这个后，你的videos_cut文件夹下面就会生成多个30秒的视频

对裁剪视频进行抽帧

代码如下：
extract_rgb_frames_ffmpeg.sh

IN_DATA_DIR="./ava/videos_cut"
OUT_DATA_DIR="./ava/rawframes"
if [[ ! -d "${OUT_DATA_DIR}" ]]; then
  echo "${OUT_DATA_DIR} doesn't exist. Creating it.";
  mkdir -p ${OUT_DATA_DIR}
fi
for video in $(ls -A1 -U ${IN_DATA_DIR}/*)
do
  video_name=${video##*/}
  if [[ $video_name = *".webm" ]]; then
    video_name=${video_name::-5}
  else
    video_name=${video_name::-4}
  fi
#  out_video_dir=${OUT_DATA_DIR}/${video_name}
  out_video_dir=${OUT_DATA_DIR}/${video_name}
#  echo $out_video_dir
#  echo $video_name
  mkdir -p "${out_video_dir}"
#  out_name="${out_video_dir}/${out_video_dir}_%05d.jpg"
  out_name="${out_video_dir}/${video_name}_%05d.jpg"
#  echo $out_name
  ffmpeg -i "${video}" -r 30 -q:v 1 "${out_name}"
done

执行之后rawframes下面应该是每个视频的抽帧，如下所示。

image.png

因为在slow fast 中是1秒抽30帧图片，目的是用来训练，据说因为slowfast在slow通道里1秒会采集到15帧，在fast通道里1秒会采集到2帧。所以我们的打标文件要按照这个来。

raw2label.py代码如下：

import os
import shutil
from tqdm import tqdm
fps = 30
seconds = 30
start =0
video_path = './ava/videos_cut'
labelframes_path = './ava/labelframes'
rawframes_path = './ava/rawframes'
for video_id in os.listdir(video_path):
    print(video_id)
video_ids = [video_id[:-4] for video_id in os.listdir(video_path)]
for video_id in tqdm(video_ids):
    print(video_id)
    # 从61帧到840帧，间隔30帧取一次
    for img_id in range(2 * fps + 1, (seconds - 2) * 30, fps):
        shutil.copyfile(os.path.join(rawframes_path, video_id, video_id+'_' + format(img_id, '05d') + '.jpg'),
                        os.path.join(labelframes_path, video_id + '_' + format(start + img_id // 30, '05d') + '.jpg'))

这样labelframes中文件夹就有图片。
然后via标注工具下载地址，对labelframes文件进行标注。将标注文件保存后，就可以执行最后一步，将via转换为ava数据集。

via转换为ava数据集

via2ava.py代码如下：

"""
Theme:ava format data transformer
author:Hongbo Jiang
time:2022/3/14/1:51:51
description:

    这是一个数据格式转换器，根据mmaction2的ava数据格式转换规则将来自网站:
    https://www.robots.ox.ac.uk/~vgg/software/via/app/via_video_annotator.html
    的、标注好的、视频理解类型的csv文件转换为mmaction2指定的数据格式。
    转换规则：
        # AVA Annotation Explained
        In this section, we explain the annotation format of AVA in details:
        ```
        mmaction2
        ├── data
        │   ├── ava
        │   │   ├── annotations
        │   │   |   ├── ava_dense_proposals_train.FAIR.recall_93.9.pkl
        │   │   |   ├── ava_dense_proposals_val.FAIR.recall_93.9.pkl
        │   │   |   ├── ava_dense_proposals_test.FAIR.recall_93.9.pkl
        │   │   |   ├── ava_train_v2.1.csv
        │   │   |   ├── ava_val_v2.1.csv
        │   │   |   ├── ava_train_excluded_timestamps_v2.1.csv
        │   │   |   ├── ava_val_excluded_timestamps_v2.1.csv
        │   │   |   ├── ava_action_list_v2.1.pbtxt
        ```
        ## The proposals generated by human detectors
        In the annotation folder, `ava_dense_proposals_[train/val/test].FAIR.recall_93.9.pkl` are human proposals generated by a human detector. They are used in training, validation and testing respectively. Take `ava_dense_proposals_train.FAIR.recall_93.9.pkl` as an example. It is a dictionary of size 203626. The key consists of the `videoID` and the `timestamp`. For example, the key `-5KQ66BBWC4,0902` means the values are the detection results for the frame at the $$902_{nd}$$ second in the video `-5KQ66BBWC4`. The values in the dictionary are numpy arrays with shape $$N \times 5$$ , $$N$$ is the number of detected human bounding boxes in the corresponding frame. The format of bounding box is $$[x_1, y_1, x_2, y_2, score], 0 \le x_1, y_1, x_2, w_2, score \le 1$$. $$(x_1, y_1)$$ indicates the top-left corner of the bounding box, $$(x_2, y_2)$$ indicates the bottom-right corner of the bounding box; $$(0, 0)$$ indicates the top-left corner of the image, while $$(1, 1)$$ indicates the bottom-right corner of the image.
        ## The ground-truth labels for spatio-temporal action detection
        In the annotation folder, `ava_[train/val]_v[2.1/2.2].csv` are ground-truth labels for spatio-temporal action detection, which are used during training & validation. Take `ava_train_v2.1.csv` as an example, it is a csv file with 837318 lines, each line is the annotation for a human instance in one frame. For example, the first line in `ava_train_v2.1.csv` is `'-5KQ66BBWC4,0902,0.077,0.151,0.283,0.811,80,1'`: the first two items `-5KQ66BBWC4` and `0902` indicate that it corresponds to the $$902_{nd}$$ second in the video `-5KQ66BBWC4`. The next four items ($$[0.077(x_1), 0.151(y_1), 0.283(x_2), 0.811(y_2)]$$) indicates the location of the bounding box, the bbox format is the same as human proposals. The next item `80` is the action label. The last item `1` is the ID of this bounding box.
        ## Excluded timestamps
        `ava_[train/val]_excludes_timestamps_v[2.1/2.2].csv` contains excluded timestamps which are not used during training or validation. The format is `video_id, second_idx` .
        ## Label map
        `ava_action_list_v[2.1/2.2]_for_activitynet_[2018/2019].pbtxt` contains the label map of the AVA dataset, which maps the action name to the label index.
"""

import csv
import os
from distutils.log import info
import pickle
from matplotlib.pyplot import contour, show
import numpy as np
import cv2
from sklearn.utils import shuffle


def transformer(origin_csv_path, frame_image_dir,
                train_output_pkl_path, train_output_csv_path,
                valid_output_pkl_path, valid_output_csv_path,
                exclude_train_output_csv_path, exclude_valid_output_csv_path,
                out_action_list, out_labelmap_path, dataset_percent=0.9):
    """
    输入：
    origin_csv_path:从网站导出的csv文件路径。
    frame_image_dir:以"视频名_第n秒.jpg"格式命名的图片，这些图片是通过逐秒读取的。
    output_pkl_path:输出pkl文件路径
    output_csv_path:输出csv文件路径
    out_labelmap_path:输出labelmap.txt文件路径
    dataset_percent:训练集和测试集分割

    输出:无

    """

    # -----------------------------------------------------------------------------------------------
    get_label_map(origin_csv_path, out_action_list, out_labelmap_path)
    # -----------------------------------------------------------------------------------------------
    information_array = [[], [], []]
    # 读取输入csv文件的位置信息段落
    with open(origin_csv_path, 'r') as csvfile:
        count = 0
        content = csv.reader(csvfile)
        for line in content:
            # print(line)
            if count >= 10:
                frame_image_name = eval(line[1])[0]  # str
                # print(line[-2])
                location_info = eval(line[4])[1:]  # list
                action_list = list(eval(line[5]).values())[0].split(',')
                action_list = [int(x) for x in action_list]  # list
                information_array[0].append(frame_image_name)
                information_array[1].append(location_info)
                information_array[2].append(action_list)
            count += 1
    # 将：对应帧图片名字、物体位置信息、动作种类信息汇总为一个信息数组
    information_array = np.array(information_array, dtype=object).transpose()
    # information_array = np.array(information_array)
    # -----------------------------------------------------------------------------------------------
    num_train = int(dataset_percent * len(information_array))
    train_info_array = information_array[:num_train]
    valid_info_array = information_array[num_train:]
    get_pkl_csv(train_info_array, train_output_pkl_path, train_output_csv_path, exclude_train_output_csv_path,
                frame_image_dir)
    get_pkl_csv(valid_info_array, valid_output_pkl_path, valid_output_csv_path, exclude_valid_output_csv_path,
                frame_image_dir)


def get_label_map(origin_csv_path, out_action_list, out_labelmap_path):
    classes_list = 0
    classes_content = ""
    labelmap_strings = ""
    # 提取出csv中的第9行的行为下标
    with open(origin_csv_path, 'r') as csvfile:
        count = 0
        content = csv.reader(csvfile)
        for line in content:
            if count == 8:
                classes_list = line
                break
            count += 1
    # 截取种类字典段落
    st = 0
    ed = 0
    for i in range(len(classes_list)):
        if classes_list[i].startswith('options'):
            st = i
        if classes_list[i].startswith('default_option_id'):
            ed = i
    for i in range(st, ed):
        if i == st:
            classes_content = classes_content + classes_list[i][len('options:'):] + ','
        else:
            classes_content = classes_content + classes_list[i] + ','
    classes_dict = eval(classes_content)[0]
    # 写入labelmap.txt文件
    with open(out_action_list, 'w') as f:  # 写入action_list文件
        for v, k in classes_dict.items():
            labelmap_strings = labelmap_strings + "label {{\n  name: \"{}\"\n  label_id: {}\n  label_type: PERSON_MOVEMENT\n}}\n".format(
                k, int(v) + 1)
        f.write(labelmap_strings)
    labelmap_strings = ""
    with open(out_labelmap_path, 'w') as f:  # 写入label_map文件
        for v, k in classes_dict.items():
            labelmap_strings = labelmap_strings + "{}: {}\n".format(int(v) + 1, k)
        f.write(labelmap_strings)


def get_pkl_csv(information_array, output_pkl_path, output_csv_path, exclude_output_csv_path, frame_image_dir):
    # 在遍历之前需要对我们的字典进行初始化
    pkl_data = dict()  # 存储pkl键值对信的字典(其值为普通list)
    csv_data = []  # 存储导出csv文件的2d数组
    read_data = {}  # 存储pkl键值对的字典(方便字典的值化为numpy数组)

    for i in range(len(information_array)):
        img_name = information_array[i][0]
        # -------------------------------------------------------------------------------------------
        video_name, frame_name = '_'.join(img_name.split('_')[:-1]), format(int(img_name.split('_')[-1][:-4]),
                                                                            '04d')  # 我的格式是"视频名称_帧名称"，格式不同可自行更改
        # -------------------------------------------------------------------------------------------
        pkl_key = video_name + ',' + frame_name
        pkl_data[pkl_key] = []
    # 遍历所有的图片进行信息读取并写入pkl数据
    for i in range(len(information_array)):
        img_name = information_array[i][0]
        # -------------------------------------------------------------------------------------------
        video_name, frame_name = '_'.join(img_name.split('_')[:-1]), str(
            int(img_name.split('_')[-1][:-4]))  # 我的格式是"视频名称_帧名称"，格式不同可自行更改
        # -------------------------------------------------------------------------------------------
        imgpath = frame_image_dir + '/' + img_name
        location_list = information_array[i][1]
        action_info = information_array[i][2]
        image_array = cv2.imread(imgpath)
        h, w = image_array.shape[:2]
        # 进行归一化
        location_list[0] /= w
        location_list[1] /= h
        location_list[2] /= w
        location_list[3] /= h
        location_list[2] = location_list[2] + location_list[0]
        location_list[3] = location_list[3] + location_list[1]
        # 置信度置为1
        # 组装pkl数据

        for kind_idx in action_info:
            csv_info = [video_name, frame_name, *location_list, kind_idx + 1, 1]
            csv_data.append(csv_info)

        location_list = location_list + [1]
        pkl_key = video_name + ',' + format(int(frame_name), '04d')
        pkl_value = location_list
        pkl_data[pkl_key].append(pkl_value)

    for k, v in pkl_data.items():
        read_data[k] = np.array(v)

    with open(output_pkl_path, 'wb') as f:  # 写入pkl文件
        pickle.dump(read_data, f)

    with open(output_csv_path, 'w', newline='') as f:  # 写入csv文件, 设定参数newline=''可以不换行。
        f_csv = csv.writer(f)
        f_csv.writerows(csv_data)

    with open(exclude_output_csv_path, 'w', newline='') as f:  # 写入csv文件, 设定参数newline=''可以不换行。
        f_csv = csv.writer(f)
        f_csv.writerows([])


def showpkl(pkl_path):
    with open(pkl_path, 'rb') as f:
        content = pickle.load(f)
    return content


def showcsv(csv_path):
    output = []
    with open(csv_path, 'r') as f:
        content = csv.reader(f)
        for line in content:
            output.append(line)
    return output


def showlabelmap(labelmap_path):
    classes_dict = dict()
    with open(labelmap_path, 'r') as f:
        content = (f.read().split('\n'))[:-1]
        for item in content:
            mid_idx = -1
            for i in range(len(item)):
                if item[i] == ":":
                    mid_idx = i
            classes_dict[item[:mid_idx]] = item[mid_idx + 1:]
    return classes_dict


os.makedirs('./ava/annotations', exist_ok=True)
transformer("./Unnamed-VIA Project13Jul2022_16h01m30s_export.csv", './ava/labelframes',
            './ava/annotations/ava_dense_proposals_train.FAIR.recall_93.9.pkl', './ava/annotations/ava_train_v2.1.csv',
            './ava/annotations/ava_dense_proposals_val.FAIR.recall_93.9.pkl', './ava/annotations/ava_val_v2.1.csv',
            './ava/annotations/ava_train_excluded_timestamps_v2.1.csv',
            './ava/annotations/ava_val_excluded_timestamps_v2.1.csv',
            './ava/annotations/ava_action_list_v2.1.pbtxt', './ava/annotations/labelmap.txt', 0.9)
print(showpkl('./ava/annotations/ava_dense_proposals_train.FAIR.recall_93.9.pkl'))
print(showcsv('././ava/annotations/ava_train_v2.1.csv'))
print(showlabelmap('././ava/annotations/labelmap.txt'))

这样在annotations中就有ava2.1的数据样本了。

洛杉矶房价预测
制作训练集、评测集交叉验证数据有限，发挥数据本来的效率数据的训练集合评测集的矛盾a. 如果用更多的数据去训练...
yolo3训练自己的数据
训练自己的数据主要分为以下几个步骤： A.数据集制作１.制作ＶＯＣ格式的xml文件工具：labelImg ...
TensorFlow2.X结合OpenCV = 手势识别
先显示下部分数据集图片构建模型进行训练数据集地址（可参考如何用TensorFlow2.0制作自己的数据集） i...
深度学习验证码识别---验证码图片生成
在做验证码识别之前，需要做数据准备，即验证码图片，作为后续模型训练与验证的训练数据集和测试数据集。验证码图片在制作...
CS231N学习记录
数据集：训练集+验证集+测试集交叉验证：当训练数据太小时，为了更好地利用数据，那么将训练数据集划分成n份，其中n...
Yolov3：训练自己的模型 Part2：训练
上一篇文章介绍了数据集的制作，本篇文章将介绍yolo训练的流程及参数设置。划分训练集在ImageSets/Ma...
TensorFlow（二）制做自己的数据集
1.数据集格式 2. 数据集制作代码参考资料 [1] TensorFlow 制作自己的TFRecord数据集读...
caffe数据集格式转换—图像格式到LMDB/LEVELDB
使用caffe的图像分类模型来训练自己的数据集时，数据集如何制作是一个问题。我们通常收集到的是图像数据(如.jp...
神经网络优化1
数据划分数据集分类通常会将数据集分层三类：训练集（Training Sets）：采用训练集进行训练时，通过改...
2.封装kNN算法之数据分割
训练数据集与测试数据集当我们拿到一组数据之后，通常我们需要把数据分割成两部分，即训练数据集和测试数据集。训练数据...