Today I am continuing with the analysis of the next stages of the algorithm, as described in Part 0.

Skeleton Computation

After having extracted the background mask of the hand, we faced the difficulty that we did not know exactly where the endpoint of the recognized hand is, i.e. the point at which the drawing operations will take place. Apart from that, the produced mask had to be reduced in size, so that the hand (the actual area of interest) would be cut from the rest of the arm. To provide an acceptable solution to this problem, I came up with a Fast Skeletonization Algorithm, which is characterized by simplicity and speed. The basic idea was to extract a single line that begins from the middle of the entry point of the arm, joins approximately all its visible basic joints and ends at the hand point that is most likely to be used for drawing. The root mechanism that provides this behavior is the following procedure, which, given a starting point, finds the longest non-intersecting line inside a contour: FLLIC. By applying this procedure iteratively, with careful handling after each iteration, we can deduce the desired main arm skeleton. The procedure can be used only as long as the following two conditions do not hold simultaneously:

  • The joints involved show a variation in the sign of the angle of the links' relative rotation, based on the Denavit-Hartenberg model.
  • Two consecutive links with opposite joint rotation are equally sized.

Due to the geometric nature of the hand and its degrees of freedom, the above situation is highly unusual, so the proposed procedure can be applied successfully. However, when noise overlaps the arm or the user holds an object during the drawing operation, the procedure might fail to compute the main arm skeleton.
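To make the idea behind FLLIC more concrete, here is a minimal brute-force sketch of "the longest non-intersecting line inside a contour, given a starting point". It is not the implementation from the repository: the function names, the binary-mask representation and the pixel-sampling test are all assumptions made for illustration, and the real code is likely organized differently for speed.

```python
from math import hypot


def boundary_pixels(mask):
    """Foreground pixels that touch the background or the image border."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbours = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
            if any(not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]
                   for nx, ny in neighbours):
                pts.append((x, y))
    return pts


def segment_inside(mask, p, q):
    """True if every pixel sampled along the segment p -> q is foreground.

    Assumes p and q are integer pixel coordinates (x, y) inside the mask.
    """
    (x0, y0), (x1, y1) = p, q
    n = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(n + 1):
        x = round(x0 + (x1 - x0) * i / n)
        y = round(y0 + (y1 - y0) * i / n)
        if not mask[y][x]:
            return False
    return True


def longest_line_in_contour(mask, contour, start):
    """Brute-force search: farthest contour point whose connecting segment
    from `start` never leaves the mask (hypothetical stand-in for FLLIC)."""
    best, best_len = None, -1.0
    for pt in contour:
        length = hypot(pt[0] - start[0], pt[1] - start[1])
        if length > best_len and segment_inside(mask, start, pt):
            best, best_len = pt, length
    return best, best_len


# Toy example: an L-shaped "arm" made of a horizontal and a vertical bar.
mask = [[1 if (y < 2 or x >= 8) else 0 for x in range(10)] for y in range(10)]
end_point, length = longest_line_in_contour(mask, boundary_pixels(mask), (0, 0))
```

In the toy mask the tip of the vertical bar is farther from the start than any point of the horizontal bar, but the straight line to it would leave the shape, so the search stops at the bend. This is the property that lets repeated applications follow the arm link by link.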

The way FLLIC is applied iteratively can be seen in the Python code on my GitHub page. Basically, we set some termination conditions, relative to the contour points left to be processed, the number of joints found and the width of the last extracted link. When the point matching the longest line is extracted, it is transferred to the center of the straight line segment defined by the found point and its matching point on the opposite side of the link. The matching point is assumed to be the one that is equidistant from the initial given point. The contour points belonging to the curves that join the found and the matching points with the given one are removed from the next iteration, while the transferred point becomes the new given one. During this process, if the found link (the one joining the given point with the transferred found one) is too short, it is merged into the previous link, so the final skeleton can actually intersect with the arm contour. Some constants in the whole process have been set heuristically; however, they comply with the generic hand model properties and therefore do not need fine-tuning for different users. Unfortunately, when performing this operation in other applications, such hyperparameter optimization might be needed.
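The bookkeeping of a single iteration, as described above, can be sketched as follows. This is only an illustration under assumptions: the found point and its matching point are taken as already computed, and `add_joint` and `min_link_len` are hypothetical names, not identifiers from the repository.

```python
from math import hypot


def midpoint(p, q):
    """Center of the straight segment joining p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def add_joint(joints, found, matching, min_link_len):
    """Transfer the found point to the link center and update the skeleton.

    joints       -- skeleton so far; joints[-1] is the current given point
    found        -- end of the longest line from the given point
    matching     -- assumed equidistant point on the opposite side of the link
    min_link_len -- hypothetical heuristic threshold for "too short" links
    """
    new_joint = midpoint(found, matching)
    given = joints[-1]
    link_len = hypot(new_joint[0] - given[0], new_joint[1] - given[1])
    if link_len < min_link_len and len(joints) >= 2:
        # Too short: merge with the previous link by moving its end point.
        # The merged straight link may now cut across the arm contour.
        joints[-1] = new_joint
    else:
        joints.append(new_joint)
    return new_joint  # becomes the given point of the next iteration


skeleton = [(0, 0), (10, 0)]
add_joint(skeleton, (12, 1), (12, -1), min_link_len=5.0)   # merged
add_joint(skeleton, (20, 10), (24, 6), min_link_len=5.0)   # appended
```

The first link is never merged in this sketch, since there is no previous link to absorb it; whether the original code treats that case the same way is an assumption.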

Extracted simple skeleton from the samples, using Fast Skeletonization Algorithm.

Hand Mask Extraction

The Hand Mask is extracted by examining the last link found. We find the rotated minimum-area bounding box that encloses all the link contour points, using the minAreaRect tool from the OpenCV library. By keeping the rectangle edge that is perpendicular to the found link and closer to the endpoint, and expanding it along the link direction by an amount relative to the edge's length, we obtain an approximation of the hand mask. By construction, this approximation depends on the edge's size, so different poses of the human hand produce a different mask size. In other words, although this is not the best hand approximation, it aids the next stage of gesture classification.
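The geometric step can be sketched in plain Python, starting from the four corners of the rotated rectangle (in OpenCV these would typically come from `cv2.boxPoints(cv2.minAreaRect(points))`). The function name `hand_box` and the `ratio` parameter are hypothetical, the value 1.5 is only a placeholder, and extending the edge back toward the arm is my reading of the description above rather than a confirmed detail.

```python
from math import hypot


def hand_box(box, link_start, link_end, ratio=1.5):
    """Approximate hand rectangle from the last link's bounding box.

    box        -- four consecutive corners (x, y) of a non-degenerate rectangle
    link_start -- first point of the last skeleton link
    link_end   -- skeleton endpoint (assumed drawing point)
    ratio      -- hypothetical factor: box depth = ratio * edge length
    """
    lx, ly = link_end[0] - link_start[0], link_end[1] - link_start[1]

    def abs_cos(i):
        # |cos| of the angle between edge i and the link direction.
        (ax, ay), (bx, by) = box[i], box[(i + 1) % 4]
        ex, ey = bx - ax, by - ay
        return abs(ex * lx + ey * ly) / (hypot(ex, ey) * hypot(lx, ly))

    def dist_to_end(i):
        # Distance from the midpoint of edge i to the skeleton endpoint.
        (ax, ay), (bx, by) = box[i], box[(i + 1) % 4]
        return hypot((ax + bx) / 2.0 - link_end[0],
                     (ay + by) / 2.0 - link_end[1])

    # Opposite edges are parallel, so only edges 0 and 1 need comparing.
    perpendicular = (0, 2) if abs_cos(0) <= abs_cos(1) else (1, 3)
    i = min(perpendicular, key=dist_to_end)

    a, b, c = box[i], box[(i + 1) % 4], box[(i + 2) % 4]
    width = hypot(b[0] - a[0], b[1] - a[1])
    depth = hypot(c[0] - b[0], c[1] - b[1])
    # Unit vector along the rectangle, pointing from the kept edge
    # toward the opposite edge (i.e. back along the arm).
    vx, vy = (c[0] - b[0]) / depth, (c[1] - b[1]) / depth
    sx, sy = vx * ratio * width, vy * ratio * width
    return [a, b, (b[0] + sx, b[1] + sy), (a[0] + sx, a[1] + sy)]


corners = hand_box([(0, -1), (10, -1), (10, 1), (0, 1)], (0, 0), (10, 0))
```

Because the depth of the box is tied to the width of the kept edge, a wider hand pose produces a larger box, which matches the dependence on pose described above.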

In the following video, we can examine the whole algorithmic process up to this stage. Each frame requires less than 17 ms to be processed, with the skeletonization being the most time-demanding step (~13 ms). These times were measured using a Python implementation (apart from the precompiled C++ OpenCV code) running single-threaded on a single CPU core, with no hardware or compiler optimization. Optimized implementations are expected to require less time.





