POLIMI-ITW-S: A shopping mall dataset in-the-wild

This page introduces the “POLIMI-ITW-S” dataset.
“POLIMI-ITW-S” contains 37 action classes and 22,164 video samples. The average duration of each clip is about 7 seconds.
The dataset contains RGB videos, 2-D skeletal data, bounding boxes and labels for each sample.
The dataset was recorded with the RGB cameras of two smartphones (1920×1080 pixels, 30 fps), held by two recorders at about 90 cm from the floor.
The 2-D skeletal data and person bounding boxes were generated by OpenPifPaf.
The 2-D skeletal data contains the 2-D coordinates of 17 body joints at each frame.
The recorders imitated a mobile robot, moving around or standing still while looking around, to capture the people performing the actions. We did not mount the camera on a robot in order to avoid the uncommon situations that the presence of a robot could trigger.
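OpenPifPaf predicts the 17 keypoints of the COCO body convention, so each skeleton is a flat list of 51 numbers (x, y, confidence per joint). The sketch below shows how a single per-person record could be decoded; the exact JSON field names of the released files are an assumption here, not taken from the dataset documentation.

```python
# The 17 body joints follow the COCO keypoint convention used by OpenPifPaf.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def parse_person(record):
    """Map a flat keypoint list [x1, y1, c1, x2, y2, c2, ...] to a dict of
    joint name -> (x, y, confidence).
    NOTE: the "keypoints" field name is an assumed schema detail."""
    kps = record["keypoints"]
    return {
        name: (kps[3 * i], kps[3 * i + 1], kps[3 * i + 2])
        for i, name in enumerate(COCO_KEYPOINTS)
    }
```

Adjust the field access to whatever structure the released JSON files actually use.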

1. Action Classes

As shown in the tables below, the actions in the dataset are organized on three levels:

  • General Level: labels describe the action of a single person.
  • Modifier Level: labels describe actions performed by multiple persons together.
  • Aggregate Level: detailed labels describe multiple concurrent actions in a single label.

1.1 General Level Actions (10)

A1: cleaning  A2: crouching  A3: jumping  A4: laying
A5: riding  A6: running  A7: scooter  A8: sitting
A16: standing  A27: walking

1.2 Modifier Level Actions (3)

A9: sittingTogether  A17: standingTogether  A28: walkingTogether

1.3 Aggregate Level Actions (24)

A10: sittingWhileCalling  A11: sittingWhileDrinking  A12: sittingWhileEating  A13: sittingWhileHoldingBabyInArms
A14: sittingWhileTalkingTogether  A15: sittingWhileWatchingPhone  A18: standingWhileCalling  A19: standingWhileDrinking
A20: standingWhileEating  A21: standingWhileHoldingBabyInArms  A22: standingWhileHoldingCart  A23: standingWhileHoldingStroller
A24: standingWhileLookingAtShops  A25: standingWhileTalkingTogether  A26: standingWhileWatchingPhone  A29: walkingWhileCalling
A30: walkingWhileDrinking  A31: walkingWhileEating  A32: walkingWhileHoldingBabyInArms  A33: walkingWhileHoldingCart
A34: walkingWhileHoldingStroller  A35: walkingWhileLookingAtShops  A36: walkingWhileTalkingTogether  A37: walkingWhileWatchingPhone

2. Size of the Dataset

The dataset includes three types of files:

  • RGB videos: the collected RGB videos.
  • 2-D skeletons + bounding boxes + labels: JSON files containing the 2-D skeletons, bounding boxes and labels for each RGB video.
  • pre-processed data: split into “training” (70%) and “test” (30%) sets, stored as “.npy” files for the joint data and “.pkl” files for the labels.
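A minimal sketch of loading one split of the pre-processed data, assuming the common skeleton-dataset convention of one “.npy” joint tensor per split and a pickled (sample_names, labels) pair; the actual file names and array layout of the release may differ.

```python
import pickle
import numpy as np

def load_split(joint_path, label_path):
    """Load one split (training or test) of the pre-processed data.
    ASSUMPTION: the .pkl file stores a (sample_names, labels) pair, a layout
    common to skeleton action datasets; check the released files."""
    joints = np.load(joint_path)  # joint tensor, e.g. (samples, channels, frames, joints, persons)
    with open(label_path, "rb") as f:
        sample_names, labels = pickle.load(f)
    return joints, sample_names, labels
```

Usage would then be, for example, `joints, names, labels = load_split("train_data.npy", "train_label.pkl")` (hypothetical file names).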

The size of each type is shown in the table below:

Data Type                                    POLIMI-ITW-S
RGB videos                                   335 GB
2-D skeletons + bounding boxes + labels      39.4 GB
pre-processed data (.npy and .pkl)           17.7 GB
Total                                        392.1 GB

3. More Information (FAQs and Sample Codes)

We have developed an annotation tool that can be used to visualize the poses, bounding boxes and labels on the video clips.

Here we provide the annotation tool, the data pre-processing script, information about the data, answers to FAQs, sample code to read the data, and the latest published results on our datasets.

4. Samples

Some normalized images from the dataset:
[RGB Clip A]  [RGB Clip B]
[Classified clip 1]  [Classified clip 2]
Links to some sample videos

5. Terms & Conditions of Use

The datasets are released for academic research only, and are free to researchers from educational or research institutes for non-commercial purposes. The use of the dataset is governed by the following terms and conditions:

  • Without the express permission of the AIRLab, any of the following is prohibited: redistribution of this dataset, derivation or generation of a new dataset from it, and commercial use of the dataset in any way or form, either partially or in its entirety.
  • For the sake of privacy, images of the subjects in this dataset may only be used for demonstration in academic publications and presentations.
  • All users of “POLIMI-ITW-S” dataset agree to indemnify, defend and hold harmless, the AIRLab and its officers, employees, and agents, individually and collectively, from any and all losses, expenses, and damages.

6. How to Download Datasets

Creative Commons License

This dataset is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

To obtain access, submit the Request Form and accept the Release Agreement. We will validate your request and grant you access to the dataset.

Robotower

Robotower is a game in which an omnidirectional robot and a human player face each other in a 4×4 m playground with four towers at its corners. The aim of the robot is to knock down the towers; the aim of the human player is to defend them for at least 3 minutes by intercepting the robot’s path. The human player can also press a button on a tower, which lights an LED for every 2.5 seconds of pressing; when all four LEDs are lit, the player has conquered that tower, and the robot can no longer aim at it. Of course, while the player is pressing a button, the robot may aim at another tower.
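The tower-conquering rule described above can be sketched as a small state machine: button-press time accumulates, one LED lights per 2.5 seconds, and four lit LEDs conquer the tower. The class and method names below are illustrative only, not taken from the actual Robotower codebase.

```python
class Tower:
    """Illustrative model of one Robotower tower's button/LED logic."""

    LEDS_TO_CONQUER = 4      # from the text: four LEDs per tower
    SECONDS_PER_LED = 2.5    # from the text: one LED lights every 2.5 s

    def __init__(self):
        self.pressed_time = 0.0  # accumulated button-press time in seconds

    def press(self, dt):
        """Accumulate dt seconds of button pressing; return True once conquered."""
        if not self.conquered:
            self.pressed_time += dt
        return self.conquered

    @property
    def lit_leds(self):
        return min(int(self.pressed_time // self.SECONDS_PER_LED),
                   self.LEDS_TO_CONQUER)

    @property
    def conquered(self):
        return self.lit_leds >= self.LEDS_TO_CONQUER
```

After 10 cumulative seconds of pressing (4 × 2.5 s), the tower is conquered and further presses have no effect.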

The system can estimate the player’s skill online and modify the capabilities of the robot to obtain an even game, in accordance with the theory of flow. It is also possible to trigger deceptive actions, which further reinforce the sensation of playing against a rational agent.

Presentation of the game.