UCF Sports Action Data Set

Tags: motion recognition, action analysis, action recognition, human action recognition, computer vision

Publisher: 数据集市 (Data Mart)

Release date: 2020-09-28

Data size: 1.66 GB

Data Description

The UCF Sports dataset consists of a set of actions collected from various sports that are typically featured on broadcast television channels such as the BBC and ESPN. The video sequences were obtained from a wide range of stock footage websites, including BBC Motion Gallery and GettyImages.

The dataset includes a total of 150 sequences at a resolution of 720 × 480. The collection represents a natural pool of actions featured in a wide range of scenes and viewpoints. By releasing the data set, we hope to encourage further research into action recognition in unconstrained environments. Since its introduction, the dataset has been used for numerous applications, such as action recognition, action localization, and saliency detection.

Dataset Actions

The dataset includes the following 10 actions. The figure above shows a sample frame from each of the ten actions, with the bounding-box annotations of the humans shown in yellow.

Diving (14 videos)
Golf Swing (18 videos)
Kicking (20 videos)
Lifting (6 videos)
Riding Horse (12 videos)
Running (13 videos)
SkateBoarding (12 videos)
Swing-Bench (20 videos)
Swing-Side (13 videos)
Walking (22 videos)
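The per-class counts above can be tallied as a quick consistency check. The Python sketch below (the dictionary name `CLASS_COUNTS` is only illustrative, not a file shipped with the dataset) confirms that the counts add up to the 150 sequences stated earlier and reports how unbalanced the classes are.

```python
# Per-class video counts as listed on this page (the dict name is illustrative only).
CLASS_COUNTS = {
    "Diving": 14,
    "Golf Swing": 18,
    "Kicking": 20,
    "Lifting": 6,
    "Riding Horse": 12,
    "Running": 13,
    "SkateBoarding": 12,
    "Swing-Bench": 20,
    "Swing-Side": 13,
    "Walking": 22,
}

# Total number of sequences and the largest/smallest classes.
total = sum(CLASS_COUNTS.values())
largest = max(CLASS_COUNTS, key=CLASS_COUNTS.get)
smallest = min(CLASS_COUNTS, key=CLASS_COUNTS.get)
imbalance = CLASS_COUNTS[largest] / CLASS_COUNTS[smallest]

print(f"total videos: {total}")
print(f"largest class: {largest} ({CLASS_COUNTS[largest]})")
print(f"smallest class: {smallest} ({CLASS_COUNTS[smallest]})")
print(f"max/min ratio: {imbalance:.2f}")
```

Running it prints a total of 150 videos, with Walking (22 videos) as the largest class and Lifting (6 videos) as the smallest, a max/min ratio of about 3.67.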

Dataset Summary

The following table summarizes the characteristics of the dataset.

Figure: Summary of the characteristics of UCF Sports.

Statistics

The following figure shows the number of clips per action class; the classes do not all contain the same number of clips.

Figure: Number of clips per action class.

The following figure shows the total duration of clips (blue) and the average clip length (green) for each action class. Some actions, such as kicking, are inherently short, whereas walking and running are relatively long and more periodic. Nevertheless, the chart shows that the average clip duration is quite similar across classes, so the duration of a clip alone is not enough to identify the action.

Figure: The total time of video clips for each action class is shown in blue. Average length of clips for each action is shown in green.
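The two quantities in the chart, total duration per class and average clip length per class, can be computed from per-clip frame counts and frame rates. The sketch below assumes each clip is described by a hypothetical `(label, num_frames, fps)` record; the numbers in the example are made-up toy values, not measurements from UCF Sports.

```python
from collections import defaultdict
from typing import Iterable


def duration_stats(clips: Iterable[tuple[str, int, float]]) -> dict[str, tuple[float, float]]:
    """Return {class: (total_seconds, mean_seconds)} from (label, num_frames, fps) records."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for label, num_frames, fps in clips:
        seconds = num_frames / fps  # clip length in seconds
        totals[label] += seconds
        counts[label] += 1
    return {label: (totals[label], totals[label] / counts[label]) for label in totals}


# Toy records, NOT real UCF Sports values: (label, frame count, frames per second).
toy_clips = [
    ("Kicking", 50, 10.0),
    ("Kicking", 70, 10.0),
    ("Walking", 80, 10.0),
    ("Walking", 60, 10.0),
    ("Walking", 70, 10.0),
]

for label, (total_s, mean_s) in sorted(duration_stats(toy_clips).items()):
    print(f"{label}: total {total_s:.1f} s, mean {mean_s:.1f} s")
```

On the toy input, Kicking gets a total of 12.0 s and a mean of 6.0 s, while Walking gets 21.0 s and 7.0 s: the totals differ largely because of the number of clips, whereas the means stay close, mirroring the observation that average duration alone does not separate the classes.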

Recommended Experimental Setup

Action Recognition

Leave-One-Out (LOO) cross-validation scheme: As in [1], it is recommended to evaluate on UCF Sports with a Leave-One-Out (LOO) cross-validation scheme. In each iteration, one video is held out for testing and the model is trained on all of the remaining videos of each action class. This is repeated so that every video is held out exactly once, and the overall accuracy is the average accuracy across all iterations.
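A minimal sketch of the leave-one-out loop, assuming a hypothetical `train_and_predict` callback that trains on the given videos and returns a label for the held-out one. The 1-nearest-neighbour stand-in and the scalar features are toy assumptions for illustration; they are not the Action MACH filter of [1].

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def leave_one_out_accuracy(
    samples: Sequence[X],
    labels: Sequence[str],
    train_and_predict: Callable[[list[X], list[str], X], str],
) -> float:
    """Hold out each video once, train on the rest, and average the per-fold accuracy."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    correct = 0
    for i in range(len(samples)):
        # Training set: every other video, with labels, so a per-class model
        # can use the remaining videos of each action class.
        train_x = [s for j, s in enumerate(samples) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        prediction = train_and_predict(train_x, train_y, samples[i])
        correct += int(prediction == labels[i])  # each fold tests exactly one video
    # With one test video per fold, averaging the fold accuracies equals correct / n.
    return correct / len(samples)


def nearest_neighbour(train_x: list[int], train_y: list[str], query: int) -> str:
    # Hypothetical stand-in classifier: 1-NN on a single scalar feature.
    best = min(range(len(train_x)), key=lambda k: abs(train_x[k] - query))
    return train_y[best]


features = [0, 1, 10, 11, 3]  # toy features, not real video descriptors
labels = ["Kicking", "Kicking", "Walking", "Walking", "Walking"]
acc = leave_one_out_accuracy(features, labels, nearest_neighbour)
print(acc)
```

On this toy data the clip with feature value 3 (labelled Walking) is closest to the Kicking clip at 1 and is misclassified, so the LOO accuracy is 4/5 = 0.8.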

Action Localization

Train/Test Splits: It is recommended to use the train/test splits suggested in [*]. This experimental setup splits the dataset into two unequal parts: two-thirds of the videos for training and one-third for testing. To measure accuracy, an intersection-over-union (IoU) criterion with a given overlap threshold is used to plot ROC curves. For every frame, the IoU is the area of overlap between the predicted bounding box and the ground-truth box divided by the area of their union; this value is then averaged over all frames of the video. A 20% overlap threshold is used for this experiment. The final performance is reported as the area under the ROC curve (AUC) plotted against the overlap threshold, which shows how performance varies as the threshold changes. Per-frame ground-truth bounding boxes are provided with the dataset for computing the overlap.
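The per-frame overlap and its video-level average can be sketched as follows. The box layout `(x1, y1, x2, y2)` in pixel corner coordinates is an assumption (the released annotation files may store boxes differently), and so is the rule that a video counts as localized when its mean IoU is strictly greater than the threshold; the ROC and AUC computation from [*] is not reproduced here.

```python
from typing import Sequence

# Assumed box layout: (x1, y1, x2, y2) corner coordinates in pixels.
Box = tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection area of two axis-aligned boxes divided by the area of their union."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def video_iou(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """Average the per-frame IoU over all frames of one video."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("expected one predicted box per ground-truth frame")
    return sum(box_iou(p, g) for p, g in zip(pred, gt)) / len(gt)


def is_localized(pred: Sequence[Box], gt: Sequence[Box], threshold: float = 0.2) -> bool:
    """Assumed criterion: the mean IoU must exceed the overlap threshold (20% by default)."""
    return video_iou(pred, gt) > threshold


# Toy two-frame example: a perfect match, then a box shifted by half its width.
gt_boxes = [(0, 0, 10, 10), (0, 0, 10, 10)]
pred_boxes = [(0, 0, 10, 10), (5, 0, 15, 10)]
print(round(video_iou(pred_boxes, gt_boxes), 3))  # 0.667
print(is_localized(pred_boxes, gt_boxes))         # True
```

Re-running the check while sweeping `threshold` shows how the outcome depends on the overlap threshold, which is the dependence the AUC-versus-threshold summary is meant to capture.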

[*] Tian Lan, Yang Wang and Greg Mori, Discriminative figure-centric models for joint action localization and recognition, IEEE International Conference on Computer Vision (ICCV), 2011.

Download

The data set can be downloaded by clicking here.

Human gaze annotations can be downloaded by clicking here.

Train/Test splits for Action localization can be downloaded by clicking here.

If you use this data set, please cite the following papers:

[1] Mikel D. Rodriguez, Javed Ahmed, and Mubarak Shah, Action MACH: A Spatio-temporal Maximum Average Correlation Height Filter for Action Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[2] Khurram Soomro and Amir R. Zamir, Action Recognition in Realistic Sports Videos, Computer Vision in Sports. Springer International Publishing, 2014.
