POLIMI-ITW-S: A shopping mall dataset in-the-wild

This page introduces the “POLIMI-ITW-S” dataset.
“POLIMI-ITW-S” contains 37 action classes and 22,164 video samples; the average duration of each clip is about 7 seconds.
The dataset contains RGB videos, 2-D skeletal data, bounding boxes and labels for each sample.
The dataset was recorded with the RGB cameras of two smartphones (1920×1080 pixels, 30 fps), held in the hands of two recorders at about 90 cm from the floor.
The 2-D skeletal data and person bounding boxes were generated by OpenPifPaf.
The 2-D skeletal data contains the 2-D coordinates of 17 body joints at each frame.
The recorders imitated a mobile robot, moving or staying still while looking around to capture the persons performing the actions. We did not mount the camera on a robot, to avoid the uncommon situations that the presence of a robot could trigger.
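The per-frame annotations pair each person's 17 body joints with a bounding box and a label. As a minimal sketch, the snippet below parses one such record, assuming a flat COCO-style keypoint list `[x1, y1, c1, …, x17, y17, c17]` and an `x, y, width, height` box; both the joint ordering and the record layout are illustrative assumptions, not the documented POLIMI-ITW-S schema.

```python
# COCO keypoint order, as used by OpenPifPaf's 17-joint body model.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def parse_frame(record):
    """Split a flat [x1, y1, c1, ..., x17, y17, c17] keypoint list
    into a {joint_name: (x, y, confidence)} dictionary."""
    kps = record["keypoints"]
    assert len(kps) == 3 * len(COCO_KEYPOINTS)
    joints = {
        name: tuple(kps[3 * i: 3 * i + 3])
        for i, name in enumerate(COCO_KEYPOINTS)
    }
    return joints, record["bbox"], record["label"]

# Hypothetical single-person, single-frame annotation record:
frame = {
    "keypoints": [float(v) for v in range(17 * 3)],
    "bbox": [100.0, 200.0, 80.0, 160.0],   # x, y, width, height (assumed)
    "label": "A27: walking",
}
joints, bbox, label = parse_frame(frame)
print(joints["nose"], bbox, label)
```

The confidence value per joint lets downstream code drop low-confidence detections before training.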

1. Action Classes

As shown in the tables below, the actions in the dataset are organized on three levels:
General Level: labels describe a single action.
Modifier Level: labels describe actions performed by multiple persons.
Aggregate Level: detailed labels describe multiple actions in a single label.

1.1 General Level Actions (10)

A1: cleaning    A2: crouching    A3: jumping    A4: laying
A5: riding    A6: running    A7: scooter    A8: sitting
A16: standing    A27: walking

1.2 Modifier Level Actions (3)

A9: sittingTogether    A17: standingTogether    A28: walkingTogether

1.3 Aggregate Level Actions (24)

A10: sittingWhileCalling    A11: sittingWhileDrinking    A12: sittingWhileEating    A13: sittingWhileHoldingBabyInArms
A14: sittingWhileTalkingTogether    A15: sittingWhileWatchingPhone    A18: standingWhileCalling    A19: standingWhileDrinking
A20: standingWhileEating    A21: standingWhileHoldingBabyInArms    A22: standingWhileHoldingCart    A23: standingWhileHoldingStroller
A24: standingWhileLookingAtShops    A25: standingWhileTalkingTogether    A26: standingWhileWatchingPhone    A29: walkingWhileCalling
A30: walkingWhileDrinking    A31: walkingWhileEating    A32: walkingWhileHoldingBabyInArms    A33: walkingWhileHoldingCart
A34: walkingWhileHoldingStroller    A35: walkingWhileLookingAtShops    A36: walkingWhileTalkingTogether    A37: walkingWhileWatchingPhone

2. Size of Datasets

The dataset includes three types of files:

  • RGB videos: collected RGB videos.
  • 2-D skeletons + bounding boxes + labels: JSON format files including 2-D skeletons, bounding boxes and labels for each RGB video.
  • pre-processed data: split into “training” (70%) and “test” (30%) sets, in “.npy” format for joint body data and “.pkl” format for label data.
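The pre-processed split can be loaded with NumPy and the standard pickle module. The sketch below round-trips tiny synthetic data standing in for the real files; the file names and the array layout (samples × frames × joints × coordinates) are assumptions for illustration, not the documented format.

```python
import os
import pickle
import tempfile

import numpy as np

def load_split(joints_path, labels_path):
    """Load one split: a .npy array of joint data and a .pkl of labels."""
    joints = np.load(joints_path)          # assumed shape: (N, T, 17, 2)
    with open(labels_path, "rb") as f:
        labels = pickle.load(f)            # assumed: (sample_names, class_ids)
    return joints, labels

# Round-trip demo with synthetic data in place of the real dataset files:
tmp = tempfile.mkdtemp()
joint_file = os.path.join(tmp, "train_joint_demo.npy")
label_file = os.path.join(tmp, "train_label_demo.pkl")

demo_joints = np.zeros((4, 30, 17, 2), dtype=np.float32)  # 4 clips, 30 frames
demo_labels = (["clip_%d" % i for i in range(4)], [27, 16, 8, 1])
np.save(joint_file, demo_joints)
with open(label_file, "wb") as f:
    pickle.dump(demo_labels, f)

joints, (names, ids) = load_split(joint_file, label_file)
print(joints.shape, names[0], ids[0])
```

Keeping joints in a single dense array makes it straightforward to feed the data to skeleton-based action-recognition models in batches.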

The size of each type is shown in the table below:

RGB videos: 335 GB
2-D skeletons + bounding boxes + labels: 39.4 GB
pre-processed data (.npy and .pkl): 17.7 GB
Total: 392.1 GB

3. More Information (FAQs and Sample Codes)

We have developed an annotation tool that can be used to visualize poses, bounding boxes and labels on the video clips.

We provide the annotation tool, the data pre-processing script, information about the data, answers to FAQs, sample code to read the data, and the latest published results on our datasets here.

4. Samples

Some normalized images from the dataset:
RGB Clip A
RGB Clip B
Classified clip 1
Classified clip 2
Links to some sample videos

5. Terms & Conditions of Use

The datasets are released for academic research only, and are free to researchers from educational or research institutes for non-commercial purposes. The use of the dataset is governed by the following terms and conditions:

  • Without the express permission of the AIRLab, any of the following is prohibited: redistribution; derivation or generation of a new dataset from this dataset; and commercial use of any of these datasets in any way or form, either partially or in their entirety.
  • For the sake of privacy, images of the subjects in these datasets may only be shown in academic publications and presentations.
  • All users of “POLIMI-ITW-S” dataset agree to indemnify, defend and hold harmless, the AIRLab and its officers, employees, and agents, individually and collectively, from any and all losses, expenses, and damages.

6. How to Download Datasets

Creative Commons License

This dataset is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

To obtain access, submit the Request Form and accept the Release Agreement. We will validate your request and grant you access to the dataset.

Interactive Objects

Everyday objects can be animated to augment their functionality and to provide a more engaging environment; it is also interesting to explore new interaction situations. Proper design of shape and interaction is needed to obtain interesting objects, and emotional expression is one aspect worth exploring.

We have developed several such objects:

  • a couple of emotional trash bins that move around, inviting people to discard selected materials through lid movements and sounds;
  • a coat-hanger (IGHOR) that welcomes people entering, asks for their coats, and shows sadness if they keep them on;
  • a naughty fan that comes close and suddenly blasts the person with an air flow;
  • a naughty money saver that has to be chased to be given money;
  • a kind of pillow that reacts with sounds to the way it is touched.

Robotic Art

Robots can be used as artistic media, able to perform and interact with people in artistic representations.

Robot actor

We have taken the first steps toward the development of an autonomous robotic actor, able to participate in a public performance either with a defined role and script or, as a final goal, as an improv actor able to adapt the performance to external stimuli. So far we have developed a robot that can move in classical scenes (e.g., the balcony scene of Romeo and Juliet), selecting the emotional expressions appropriate to the situation, and a framework to define emotional expressions according to the situation and the social setting among characters. The final step, a robotic actor that plays scripted scenes, is under development.

Interactive robotic art

Robots can have different shapes and play different roles in interactive artistic performances. We are exploiting materials such as nets, polyethylene sheets, and polyurethane foams to obtain shapes that are interesting to move in interactive exhibits. Emotional expression is, in this area too, an interesting feature to explore.


METRICS (Metrological Evaluation and Testing of Robots in International CompetitionS) organises challenge-led and industry-relevant competitions in the four Priority Areas (PAs) identified by ICT-09-2019-2020: Healthcare, Infrastructure Inspection and Maintenance (I&M), Agri-Food, and Agile Production.

Within METRICS, AIRLab is in charge of the ACRE (Agri-food Competition for Robot Evaluation) competition, dedicated to benchmarking agricultural robots.


ALMA (Ageing Without Losing Mobility and Autonomy) is a European project focused on supporting the autonomous mobility, navigation, and orientation of the mobility-impaired person (elderly and/or temporarily or permanently disabled person).

The ALMA system is a modular combination of advanced hardware and software technologies into an integrated and modular cost-effective system. AIRLab contributed to ALMA with its Personal Mobility Kit.

Contact: Matteo Matteucci

For additional details: http://www.alma-aal.org/

Jedi Trainer

In Jedi Trainer, a drone flies around a Jedi knight trainee, aiming at training their ability to use the light saber, as in the first episode of the Star Wars saga with Luke Skywalker.

By analyzing the image from its onboard camera, the drone maintains its distance from the player, keeping them always in the image and moving slightly left and right as if looking for the best position to shoot. When appropriate, it makes a sound with its propellers, simulating a laser shot, and the player has to parry it by putting the light saber in front of their chest.

The drone is intrinsically adapting to the style of movement of the player: a more active player will have to face a more active trainer.