Any non-deterministic algorithm needs a dataset to evaluate its performance. There were many options available as far as dataset selection was concerned. Again, however, I am against training something to work well on some provided data when it has to be deployed in a real-time application (this might seem a conceited opinion to many), so I moved on to create my own datasets. To allow any user to expand them, I also created a GUI application that processes Kinect rosbags, extracts the requested channel, preprocesses the frames it finds and saves them to a location on disk (check config.yaml) that is accessible by the system to be trained. The channel is hardcoded in the configuration if anyone wants to play with it; the depth channel usually has a specific topic name if it comes from kinect2_bridge. At the same time, the application allows the user to provide the ground truth for each video stream.
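Depth frames published by kinect2_bridge are usually sensor_msgs/Image messages with encoding `16UC1` (one unsigned 16-bit value per pixel). The sketch below shows, without any ROS dependency, how the raw `data` buffer of such a message can be decoded into rows of depth values. `DepthImageMsg` is a hypothetical stand-in for the ROS message, and `preprocess` (clamping values to 12000) is an assumption for illustration, not the application's actual preprocessing.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class DepthImageMsg:
    """Hypothetical stand-in holding the fields of a ROS sensor_msgs/Image."""
    height: int
    width: int
    encoding: str
    is_bigendian: int
    step: int    # bytes per row; may be larger than width * 2 because of padding
    data: bytes


def decode_depth(msg: DepthImageMsg) -> List[List[int]]:
    """Decode a 16UC1 image into a list of rows of unsigned 16-bit values."""
    if msg.encoding != "16UC1":
        raise ValueError(f"unsupported encoding: {msg.encoding}")
    # One unsigned short ("H") per pixel, with the byte order given by the message.
    fmt = (">" if msg.is_bigendian else "<") + "H" * msg.width
    rows = []
    for r in range(msg.height):
        # Each row starts at r * step, so any row padding is skipped.
        rows.append(list(struct.unpack_from(fmt, msg.data, r * msg.step)))
    return rows


def preprocess(rows: List[List[int]], max_value: int = 12000) -> List[List[int]]:
    """Hypothetical preprocessing step: clamp every value to max_value."""
    return [[min(v, max_value) for v in row] for row in rows]
```

In ROS 1, the messages could come from `rosbag.Bag(path).read_messages(topics=[topic])`, which yields `(topic, msg, t)` tuples; the real message exposes the same `height`, `width`, `encoding`, `is_bigendian`, `step` and `data` fields. The exact depth topic (for example `/kinect2/sd/image_depth_rect`) is an assumption and depends on the bridge configuration.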

A screenshot of the data mining application, publicly available on my GitHub page.

Using this application, I created a dataset of 11 gestures: 4 static and 7 dynamic. The static gestures are:

  • Palm
  • Index
  • Tiger
  • Punch

The dynamic gestures are:

  • Punchflip in
  • Punchflip out
  • Handflip in
  • Handflip out
  • Fingerwave in
  • Fingerwave out
  • Pinch

On this basis, I created 11 training sets, each containing a single action, 3 validation sets with overlapping static and dynamic actions, and 1 difficult test set. Below are examples of action occurrences in one of the validation videos. Each validation set was recorded after slightly changing the field of view of the sensor. All sets were recorded from a single user (me).
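The occurrences plotted for a validation video can be derived from per-frame ground truth. The sketch below assumes, hypothetically, that the ground truth is a list with one label per frame (`None` for frames without a gesture), and that overlapping static and dynamic gestures are stored as two separate tracks, one per gesture kind. `STATIC`, `DYNAMIC`, `gesture_type` and `occurrences` are illustrative names, not part of the author's application.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Gesture names taken from the lists above.
STATIC = {"Palm", "Index", "Tiger", "Punch"}
DYNAMIC = {"Punchflip in", "Punchflip out", "Handflip in", "Handflip out",
           "Fingerwave in", "Fingerwave out", "Pinch"}


def gesture_type(label: str) -> str:
    """Return "static" or "dynamic" for a known gesture name."""
    if label in STATIC:
        return "static"
    if label in DYNAMIC:
        return "dynamic"
    raise KeyError(label)


def occurrences(track: List[Optional[str]]) -> List[Tuple[str, int, int]]:
    """Collapse one per-frame label track into (gesture, first_frame, last_frame)
    runs; frames labelled None are skipped."""
    result = []
    frame = 0
    for label, group in groupby(track):
        length = sum(1 for _ in group)  # number of consecutive frames with this label
        if label is not None:
            result.append((label, frame, frame + length - 1))
        frame += length
    return result
```

Keeping one track per gesture kind lets a static gesture and a dynamic gesture be active on the same frame, which a single label per frame could not express.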

Gestures occurrences in one of the validation videos.

Any user can add more sets to the training data, either to introduce new gestures or to add more samples of existing ones. The only requirements are a Kinect and the interface needed to gather data from it (described in the README). The dataset used can be downloaded from here. The images will appear black in a regular image viewer, because their intensities lie in the range [0, 12000]; they have to be loaded from a script so that the information can be read. The Python function used to read all the data can be found here.
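To inspect a frame visually, the raw values can be stretched to 8-bit gray levels. The sketch below assumes the frame has already been loaded into a list of rows of integers (for example, if the files are 16-bit PNGs, OpenCV's `cv2.imread(path, cv2.IMREAD_ANYDEPTH)` keeps the original values, whereas the default flag converts to 8 bits). `to_display` is a hypothetical helper, not the linked reading function. With a 16-bit viewer, 12000 is only about 18% of the full scale of 65535, which is why the frames look nearly black.

```python
from typing import List

MAX_INTENSITY = 12000  # upper bound of the stored values, as stated above


def to_display(frame: List[List[int]], max_value: int = MAX_INTENSITY) -> List[List[int]]:
    """Linearly map raw values in [0, max_value] to gray levels in [0, 255].
    Values above max_value are clamped to 255."""
    return [[min(v, max_value) * 255 // max_value for v in row] for row in frame]
```

Integer division keeps the output in whole gray levels, so the midpoint 6000 maps to 127 rather than 127.5.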
