Initial tests with the Kinect

Written by Paul Bourke
December 2010

The following is a brief report on the evaluation of the Kinect as a user-computer input device. Examples and notes are given that may be helpful for others starting out on similar projects. The libraries used are those provided on, namely, libfreenect. The history of these is relatively recent (at the time of writing): the open sourcing of the code from Hector Mart occurred only 6 weeks ago.

Kinect, technology licensed from PrimeSense


One can request two "images" from the Kinect: one is a standard image from a normal camera, the other is the depth map derived by an infra-red structured light emitter and camera. The IR emitter is the leftmost circle (when viewed from the front), the central circle is the normal camera, and the rightmost circle is the IR camera. The resolution of the camera image is 640x480 at 30 fps (there is also a 1280x1024 mode that runs at 15 fps). The image data can be supplied in different formats, for example: standard RGB, raw Bayer, or YUV.

The depth image is also supplied as 640x480 at 30 fps. There is also a high resolution mode (1280x1024) at a lower frame rate, around 10 fps if the high resolution camera data is also being acquired. The maximum dynamic range is 11 bit (2048 states). There is also control over the tilt of the Kinect and the appearance of the single LED light, and one can receive data from the accelerometer. These will not be discussed here.

Single pointer example

The first stage of any camera tracking is often background subtraction. With the Kinect, the depth image makes this trivial and insensitive to changes in lighting. The algorithm used in the following is very simple: choose a depth (that of the foreground hand in this example), find the pixels whose depth values lie within some tolerance of that depth (red), and take a running average for a smooth tracked result (green ellipse).
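
The steps just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used for the movie: the depth frame is assumed to be a plain list of rows of raw depth values, the function name and parameters (track_step, target, tolerance, alpha) are hypothetical, and the "running average" is implemented here as an exponential moving average of the centroid, which is one possible choice among several.

```python
def track_step(depth, target, tolerance, prev=None, alpha=0.3):
    """One tracking step on a single depth frame.

    depth     : list of rows, each a list of raw depth values (assumed layout)
    target    : the chosen depth, e.g. that of the foreground hand
    tolerance : pixels with |depth - target| <= tolerance count as foreground
    prev      : smoothed (x, y) from the previous frame, or None
    alpha     : smoothing weight of the new measurement (assumed value)

    Returns (smoothed_centroid, pixel_count).
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            # Depth-based "background subtraction": keep only the depth band.
            if abs(d - target) <= tolerance:
                sum_x += x
                sum_y += y
                count += 1

    if count == 0:
        # Nothing in the band: keep the previous estimate.
        return prev, 0

    cx, cy = sum_x / count, sum_y / count
    if prev is None:
        return (cx, cy), count

    # Exponential running average of the centroid for a smooth result.
    sx = prev[0] + alpha * (cx - prev[0])
    sy = prev[1] + alpha * (cy - prev[1])
    return (sx, sy), count
```

In use, the value returned for one frame is passed back in as prev for the next. A smaller alpha gives a steadier pointer at the cost of more lag.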

Click for movie

Two hand driver example

The following is a simple angular interface using two hands: their relative positions are estimated and the angle from the horizontal is derived. An important feature of both these examples is how stable the derived statistics are and how low the latency is; the capture runs at 25 fps.
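
As a rough sketch of how such an angle might be derived, the following separates the foreground pixels into a left and a right group and measures the slope of the line joining the two centroids. The way the hands are separated here (splitting at the mean x position of the foreground pixels) is an assumption made for illustration, as are the function names; the method actually used for the movie is not described above.

```python
import math


def two_hand_centroids(depth, target, tolerance):
    """Return ((lx, ly), (rx, ry)) for the left and right hand, or None.

    Foreground pixels are those within `tolerance` of the `target` depth.
    They are split into two groups at their mean x position (an assumed,
    very simple way of separating the hands).
    """
    pts = [(x, y)
           for y, row in enumerate(depth)
           for x, d in enumerate(row)
           if abs(d - target) <= tolerance]
    if len(pts) < 2:
        return None

    split = sum(x for x, _ in pts) / len(pts)
    left = [p for p in pts if p[0] < split]
    right = [p for p in pts if p[0] >= split]
    if not left or not right:
        return None

    def centroid(group):
        n = len(group)
        return (sum(x for x, _ in group) / n, sum(y for _, y in group) / n)

    return centroid(left), centroid(right)


def steering_angle(left, right):
    """Angle in degrees of the line from the left to the right hand.

    Image y grows downward, so dy is negated: raising the right hand
    above the left gives a positive angle, level hands give zero.
    """
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    return math.degrees(math.atan2(-dy, dx))
```

The angle can be smoothed from frame to frame in the same way as the single pointer position if the raw value proves too jittery.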

Click for movie