An Introduction to Person ReID Datasets


Author: 北小卡 | Published 2018-07-05 13:59
| dataset | #cameras | train #identities | train #imgs | test #identities | query #imgs | gallery #imgs |
| --- | --- | --- | --- | --- | --- | --- |
| Market-1501 | 6 | 751 | 12,936 | 750 | 3,368 | 19,732 |
| DukeMTMC-reID | 8 | 702 | 16,522 | 702 | 2,228 | 17,661 (702 + 408 distractor) |
| CUHK03-NP (detected) | 2 | 767 | 7,365 | 700 | 1,400 | 5,332 |
| MSMT17 | 15 | 1,041 | 32,621 | 3,060 | 11,659 | 82,161 |

The Market-1501 dataset is annotated using the following rules. For each detected bounding box to be annotated, we manually draw a ground truth bounding box that contains the pedestrian. Then, for the detected and hand-drawn bounding boxes, we calculate the ratio of the overlapping area to the union area. If the ratio is larger than 50%, the DPM bounding box is marked as "good"; if the ratio is smaller than 20%, the bounding box is marked as "distractor"; otherwise, it is marked as "junk", meaning that this image has zero influence on the re-identification accuracy.
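The good/distractor/junk rule above is easy to express in code. Here is a minimal sketch; the `(x1, y1, x2, y2)` box format and the function names are my own for illustration, not part of the dataset tooling:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_detection(detected, hand_drawn):
    """Market-1501 annotation rule: IoU > 50% is 'good',
    IoU < 20% is 'distractor', everything in between is 'junk'."""
    ratio = iou(detected, hand_drawn)
    if ratio > 0.5:
        return "good"
    if ratio < 0.2:
        return "distractor"
    return "junk"
```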

Naming Rule of the bboxes
In bbox "0001_c1s1_001051_00.jpg", "c1" is the first camera (there are 6 cameras in total).

"s1" is sequence 1 of camera 1. Here, a sequence was defined automatically by the camera. We suppose that the camera cannot store a whole video that is quite large, so it splits the video into equally large sequences. Two sequences, namely, "c1s1" and "c2s1" do not happen exactly at the same time. This is mainly because the starting time of the 6 cameras are not exactly the same (it takes time to turn on them). But, "c1s1" and "c2s1" are roughly at the same time period.

"001051" is the 1051th frame in the sequence "c1s1". The frame rate is 25 frames per sec.

As for the last two digits: remember that we use the DPM detector. For identity "0001", there may be multiple detected bounding boxes in frame "c1s1_001051"; in other words, a pedestrian in the image may have several bboxes produced by DPM. So "00" means that this bounding box is the first one among them.
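Putting the naming rule together, a file name can be parsed with a small regular expression. This is an illustrative helper, not an official parser; note that junk boxes in the test folder use identity "-1" and distractors "0000", hence the optional minus sign:

```python
import re

# "0001_c1s1_001051_00.jpg" -> identity, camera, sequence, frame, detection index.
MARKET_RE = re.compile(r"(-?\d+)_c(\d)s(\d)_(\d+)_(\d+)\.jpg")

def parse_market_name(name):
    m = MARKET_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Market-1501 file name: {name}")
    pid, cam, seq, frame, det = (int(g) for g in m.groups())
    return {
        "pid": pid,      # person identity (-1 = junk, 0 = distractor)
        "cam": cam,      # camera id, 1..6
        "seq": seq,      # sequence within that camera
        "frame": frame,  # frame index in the sequence (25 fps)
        "det": det,      # index among this person's DPM boxes in the frame
    }
```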

The package contains the following five folders.

  1. "bounding_box_test". There are 19,732 images in this folder used for testing.
  2. "bounding_box_train". There are 12,936 images in this folder used for training.
  3. "query". There are 750 identities. We randomly select one query image for each camera. So the maximum number of query images is 6 for an identity. In total, there are 3,368 query images in this folder.
  4. "gt_query". This folder contains the ground truth annotations. For each query, the relevant images are marked as "good" or "junk". "junk" has zero impact on search accuracy. "junk" images also include those in the same camera with the query.
  5. "gt_bbox". We also provide the hand-drawn bounding boxes. They are used to judge whether a DPM bounding box is good.

We have also released 500k extra bboxes as distractors: the Market-1501+500k dataset.


The original DukeMTMC dataset contains 85-minute high-resolution videos from 8 different cameras. Hand-drawn pedestrian bounding boxes are available.
We crop pedestrian images from the videos every 120 frames, yielding 36,411 bounding boxes with IDs in total. There are 1,404 identities appearing in more than two cameras and 408 identities (distractor IDs) who appear in only one camera. We randomly select 702 IDs as the training set and the remaining 702 IDs as the testing set. In the testing set, we pick one query image for each ID in each camera and put the remaining images in the gallery.
As a result, we get 16,522 training images of 702 identities, 2,228 query images of the other 702 identities, and 17,661 gallery images (702 IDs + 408 distractor IDs).

Naming Rule of the images In bbox "0005_c2_f0046985.jpg", "0005" is the identity. "c2" means the image from Camera 2. "f0046985" is the 46985th frame in the video of Camera 2.
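This naming rule can likewise be parsed mechanically; a small illustrative helper, not official code:

```python
import re

# "0005_c2_f0046985.jpg" -> identity 5, camera 2, frame 46985.
DUKE_RE = re.compile(r"(\d+)_c(\d+)_f(\d+)\.jpg")

def parse_duke_name(name):
    m = DUKE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a DukeMTMC-reID file name: {name}")
    pid, cam, frame = (int(g) for g in m.groups())
    return pid, cam, frame
```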

  • CUHK01-CUHK03 (Year: 2013–2014)
    CUHK01 has few identities and few images. Naming example: 0006004.png, i.e., the 4th image of identity 0006.
    CUHK02:

The five pairs of camera views are P1-P5. Cam1 and Cam2 just indicate two different cameras rather than a globally unique camera id.
P1: 971 identities
P2: 306 identities
P3: 107 identities
P4: 193 identities
P5: 239 identities
The first three digits of the image names are used to match the identities between the two cameras in each setting.

    CUHK03:

The data is stored in the MATLAB MAT file "cuhk-03.mat". 1,467 identities are collected from 5 different pairs of camera views. "cuhk-03.mat" contains three cells:
  • "detected": bounding boxes estimated by a pedestrian detector
  • "labeled": bounding boxes labeled by humans
  • "testsets": the testing protocols

CUHK03-NP

|  | Labeled | Detected |
| --- | --- | --- |
| #Training | 7,368 | 7,365 |
| #Query | 1,400 | 1,400 |
| #Gallery | 5,328 | 5,332 |

To collect a large-scale person re-identification dataset, MSMT17, we utilize a 15-camera network deployed on campus. This network contains 12 outdoor cameras and 3 indoor cameras. We select 4 days with different weather conditions in a month for video collection. For each day, 3 hours of video taken in the morning, at noon, and in the afternoon, respectively, are selected for pedestrian detection and annotation. The final raw video set contains 180 hours of video across 12 outdoor cameras, 3 indoor cameras, and 12 time slots. Faster R-CNN is used for pedestrian bounding box detection. Three labelers went through the detected bounding boxes and annotated ID labels over 2 months. Finally, 126,441 bounding boxes of 4,101 identities are annotated. Some statistics on MSMT17 are shown in the table above. Compared with existing datasets, the new features of MSMT17 can be summarized as follows:
(1) Larger number of identities, bounding boxes, and cameras.
(2) Complex scenes and backgrounds.
(3) Multiple time slots result in severe lighting changes.
(4) More reliable bounding box detector.


The PRID 2011 dataset consists of images extracted from multiple person trajectories recorded from two different, static surveillance cameras. Images from these cameras contain a viewpoint change and a stark difference in illumination, background, and camera characteristics. Since images are extracted from trajectories, several different poses per person are available in each camera view. We have recorded 475 person trajectories from one view and 856 from the other one, with 245 persons appearing in both views. We have filtered out some heavily occluded persons, persons with fewer than five reliable images in each camera view, as well as corrupted images caused by tracking and annotation errors. This results in the following setup.

Camera view A shows 385 persons, camera view B shows 749 persons. The first 200 persons appear in both camera views, i.e., person 0001 of view A corresponds to person 0001 of view B, person 0002 of view A corresponds to person 0002 of view B, and so on. The remaining persons in each camera view (i.e., persons 0201 to 0385 in view A and persons 0201 to 0749 in view B) complete the gallery set of the corresponding view. Hence, a typical evaluation consists of searching for the first 200 persons of one camera view among all persons of the other view. This means that there are two possible evaluation procedures: either the probe set is drawn from view A and the gallery set from view B (A to B, used in our paper), or vice versa (B to A).
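The A-to-B evaluation can be sketched as a rank-k matching routine over a probe-gallery distance matrix. This is a minimal single-shot illustration (one image per person per view, with hypothetical distances), not the official protocol code:

```python
def cmc_rank(dist, probe_ids, gallery_ids, k=1):
    """Fraction of probes whose true match appears among the k nearest
    gallery entries. dist[i][j] is the distance between probe i and
    gallery j; smaller means more similar."""
    hits = 0
    for i, pid in enumerate(probe_ids):
        # Gallery indices sorted by increasing distance to probe i.
        order = sorted(range(len(gallery_ids)), key=lambda j: dist[i][j])
        top_k = [gallery_ids[j] for j in order[:k]]
        if pid in top_k:
            hits += 1
    return hits / len(probe_ids)
```

In the PRID setup, `probe_ids` would be the 200 shared identities from view A and `gallery_ids` all identities of view B.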


For more about Person Re-id datasets, please refer to Person Re-identification Datasets

Source: https://www.haomeiwen.com/subject/tczjuftx.html