Stereoscopic filming

Achieving an accurate sense of depth and scale

Written by Paul Bourke
Oct 2008

In the following I discuss, by way of an example, the process of filming stereoscopically and preparing the resulting material for viewing. A particular emphasis for this application was achieving a true sense of scale and depth in the filmed material. This requires a rigour not normally needed in many applications of stereoscopic filming. Please note that there are alternative ways of achieving the results outlined here. This serves primarily as documentation of what needs to be achieved and of the work-flow employed by the author on a particular stereoscopic filming project, using the Final Cut Pro editing software.

Key geometric considerations

The key to achieving a correct sense of scale and depth in any stereoscopic content is to match the viewing geometry with the camera geometry. For content that is world scale and observed by a human, this means matching the frustums of the recording cameras to the frustums (one for each eye) of the observer in the eventual stereoscopic projection environment. The key parameters are: the interocular distance of the human viewer, the geometry of the viewer with respect to the viewing screen (assuming a single flat display here), and the field of view the viewer sees through the display rectangle.
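As a rough illustration of what matching the frustums involves, the following Python sketch computes the horizontal field of view subtended by the screen and the off-axis frustum extents for each eye. The function names and the example screen dimensions (2.4m by 1.8m, viewed from 3m) are assumptions chosen for illustration; only the 6.5cm eye separation comes from this document.

```python
import math

def horizontal_fov_degrees(screen_width, view_distance):
    """Horizontal field of view subtended by the screen from the viewer."""
    return math.degrees(2 * math.atan(screen_width / (2 * view_distance)))

def eye_frustum(screen_width, screen_height, view_distance, eye_offset, near=1.0):
    """Off-axis frustum extents (left, right, bottom, top) at the near plane.

    eye_offset is the horizontal eye position relative to the screen centre:
    -separation/2 for the left eye, +separation/2 for the right eye.
    All lengths must share one unit.
    """
    scale = near / view_distance
    left = (-screen_width / 2 - eye_offset) * scale
    right = (screen_width / 2 - eye_offset) * scale
    bottom = (-screen_height / 2) * scale
    top = (screen_height / 2) * scale
    return left, right, bottom, top

# Hypothetical viewing setup (metres); only the eye separation is from the text.
SCREEN_W, SCREEN_H, DISTANCE, EYE_SEP = 2.4, 1.8, 3.0, 0.065

fov = horizontal_fov_degrees(SCREEN_W, DISTANCE)
left_eye = eye_frustum(SCREEN_W, SCREEN_H, DISTANCE, -EYE_SEP / 2)
right_eye = eye_frustum(SCREEN_W, SCREEN_H, DISTANCE, +EYE_SEP / 2)
print(round(fov, 2), left_eye, right_eye)
```

The two frustums are mirror images of each other and are asymmetric about each eye's forward axis. This is the same shape that parallel cameras produce once their images are shifted horizontally to a common zero parallax plane.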

Only by carefully matching the recording geometry with the viewing geometry can a correct sense of scale and depth be achieved. Similarly, any stereoscopic content shows its intended depth and scale only when viewed from a single position; all other positions involve some distortion. As one moves towards or away from the correct position a depth error occurs (depth is stretched or compressed), along with a scale error arising mainly from the change in field of view. As one moves left, right, up or down from the correct viewing position the error involves a shearing of space. To illustrate this, consider stereo footage captured for an observer at position 1. If the observer moves to position 2, the scene will appear compressed in depth.
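The depth error can be estimated with similar triangles. The Python sketch below assumes a viewer centred in front of a flat screen and a point shown with a fixed horizontal screen parallax; the function name and the example distances are illustrative assumptions.

```python
def perceived_distance(parallax, view_distance, eye_sep=0.065):
    """Distance from the viewer at which a point appears, by similar triangles.

    parallax is the horizontal separation on the screen between the right eye
    and left eye images of the point (positive places it behind the screen).
    All lengths are in metres.
    """
    if parallax >= eye_sep:
        raise ValueError("parallax at or beyond eye separation cannot be fused")
    return view_distance * eye_sep / (eye_sep - parallax)

# The same footage (same on-screen parallax) viewed from two distances.
at_intended = perceived_distance(0.0325, view_distance=3.0)
at_closer = perceived_distance(0.0325, view_distance=1.5)
print(at_intended, at_closer)
```

Under these assumptions every perceived distance scales in proportion to the viewing distance, so halving the distance to the screen halves the apparent depth of the whole scene.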

Note that in realtime generation of stereoscopic pairs the above problem can be solved with head tracking, ensuring the correct frustums are always created. With filmed material this luxury does not exist. The distortion effects mentioned above are easy to verify in a non-head tracked stereoscopic system by moving around and observing the shearing effects that result.

Camera rig

As discussed above, ideally the cameras would be separated by human eye separation, namely 6.5cm (on average for an adult). The cameras are aligned parallel; this is the correct approach, rather than toe-in cameras. Desirable characteristics of the camera rig include good levelling and the ability to lock the cameras to a base that moves along a track, in such a way that the cameras can be moved and replaced without destroying alignment (alignment is generally best performed in a controlled environment rather than in the field). At the least, the cameras need to slide along the railing in order to access the display of the right camera for zoom setting (see later).

The video cameras used here are HD resolution, namely 1080p. Even though the playback system has a 4x3 aspect, filming at 16x9 has an important advantage: it provides plenty of horizontal room for sliding the image in order to align zero parallax correctly (see later). The key requirement for the bar that holds the cameras is a good levelling bubble. The tripod needs to be stable enough for this levelling to persist across the filming session.
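To put a number on the horizontal room available, the following Python sketch computes the width of a full-height 4x3 crop from a 16x9 frame and the pixels left over. It assumes a 1920x1080 frame and no scaling; the function name is hypothetical, and any scaling applied in post production changes the figure.

```python
def crop_slack(frame_w, frame_h, aspect_w=4, aspect_h=3):
    """Width of a full-height crop at the target aspect ratio, and the spare
    horizontal pixels left over for sliding that crop sideways."""
    crop_w = frame_h * aspect_w // aspect_h
    if crop_w > frame_w:
        raise ValueError("frame is narrower than the target aspect ratio")
    return crop_w, frame_w - crop_w

crop_w, slack = crop_slack(1920, 1080)
print(crop_w, slack)  # 1440 480
```

A centred crop can therefore slide up to 240 pixels in either direction before running off the edge of the source frame.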

The cameras need to be aligned parallel to each other. One way to do this is to film a test grid pattern on a wall. If the horizontal and vertical lines are parallel to the image frame in both cameras and parallel to each other, then the cameras are both perpendicular to the wall and parallel to each other.

A last note on cameras: colour and brightness matching between cameras is important. In addition, cameras can have their sensors positioned slightly differently with respect to the lens. This can be observed by noting the vertical offset of the horizontal lines in the above image. The best matching can be achieved by selecting the best pair from a collection of 3 or 4 cameras, assuming you have a friendly camera shop. Before shooting, a manual white balance should be performed with both cameras. While CCD offsets and small zoom differences can be compensated for in post production (see below), colour differences are significantly more time consuming and problematic to fix.

Note also the barrel distortion in the above vertical and horizontal alignment lines. This is a natural (normal) attribute of a lens; if necessary it can be corrected for in post production. Such correction is quite common if the content is going to be composited with computer generated material, but it won't be considered here.

Configuring Final Cut Pro

While one may initially create a video capture sequence that matches the camera specifications, the editing for the final display should be performed at least at the aspect ratio of the final stereoscopic display system, in this case 4x3. The resolution of the final stereo footage may be chosen higher in order to support higher resolution displays; in this case it is set to the native resolution of the projectors being employed, namely XGA. This footage will forever remain in the progressive digital domain, and while more lossy codecs may be used in the future, for the initial cut an essentially lossless compression is chosen (Photo-JPEG). For this project a key-frame (time independent) codec is also required because the footage needs to be precisely positioned and paused. The key sequence settings for Final Cut Pro are shown below.

Alignment in time

The first process is to align the two streams in time. This is necessary because gen-locking is usually not supported by small commodity video cameras. With the 25p capture of the cameras used here, the two streams can be aligned to within half a frame, namely 1/50 second; this has proven adequate to date. What one looks for is a sharp event in time, either something that occurs naturally in the content or the use of a clapper board at the start of each filming run. In the following case the fast action of swatting a fly was used as the alignment timing event. Because of this time alignment, some post processing can be saved by filming long continuous sessions rather than many shorter clips, each of which then requires a separate time alignment process.
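The bookkeeping for this step is simple enough to sketch in Python. The frame indices below are hypothetical, standing in for the frame at which the timing event is first visible in each stream, and the function names are illustrative.

```python
def trim_frames(event_frame_left, event_frame_right):
    """Frames to drop from the start of the (left, right) streams so that the
    shared timing event lands on the same frame index in both."""
    offset = event_frame_left - event_frame_right
    return max(offset, 0), max(-offset, 0)

def worst_case_residual_seconds(fps):
    """Without gen-lock, whole-frame shifts can leave up to half a frame of error."""
    return 0.5 / fps

print(trim_frames(132, 127))            # (5, 0)
print(worst_case_residual_seconds(25))  # 0.02
```

At 25 frames per second the worst case residual is 0.02 seconds, which is the 1/50 second quoted above.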

Alignment of images in the plane

As discussed above, in order to achieve a correct sense of scale and depth the camera needs to be placed in the same relationship to the screen as the viewer. This not only locks in the distance to zero parallax but also the field of view of the camera, which should match the frustum from the eventual viewer's eyes to the corners of the stereo screen frame. To facilitate this, a frame is built with crossbars that match the height of the lower and upper edges of the eventual viewing screen. This frame is videoed by itself for a short period before each stereo filming session. The next stage of processing is to scale and translate the two streams such that the horizontal crossbars fill the field of view, and then to translate the streams horizontally so that the bar is at zero parallax. After this process, all pairs of objects should exhibit only horizontal parallax, that is, no vertical parallax.
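The size of the horizontal translation can be estimated in advance as a sanity check on the visual alignment. The Python sketch below is not the author's procedure: it assumes ideal parallel pinhole cameras and an image already scaled so that the full screen width maps onto the full image width. The function name and the 2.4m screen width are hypothetical.

```python
def zero_parallax_shift_px(image_width_px, screen_width, camera_sep=0.065):
    """Total relative horizontal shift, in pixels, needed between the left and
    right images to bring the screen plane to zero parallax.

    For parallel cameras a point at distance d has a disparity of f * sep / d
    pixels, where f is the focal length in pixels. When a plane of width
    screen_width at distance d fills the image, f = image_width_px * d /
    screen_width, so the distance cancels out of the result.
    """
    return image_width_px * camera_sep / screen_width

total = zero_parallax_shift_px(1024, screen_width=2.4)  # XGA width per eye
per_eye = total / 2  # each stream is shifted by half, in opposite directions
print(round(total, 2), round(per_eye, 2))
```

In practice lens distortion, sensor offsets and small zoom differences mean the final adjustment is still made by eye against the crossbars.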

Export of streams and combining them in QuickTime Pro

The stereoscopic playback system used in this exercise is fairly standard for polaroid based stereo: it uses two projectors, and the footage is played back as standard movies of either double the width or double the height. In the example here double width movie frames will be created, with the left eye image on the left half of the frame and the right eye image on the right half.

This has the advantage that the footage can be played back using standard QuickTime Pro on a double width display created with something like the Matrox dual-head-2-go, or using software such as warpplayer to play across dual displays formed with a dual head graphics card (note that warpplayer also has improved performance over QuickTime Pro and can support a software alignment). The choice here is to export the left and right eye streams individually and combine them into the correct format for playback as a separate exercise.

An alternative approach is to position the two streams side by side in FCP in a new sequence with twice the width of each eye stream. However, this requires setting up exact horizontal crops on each stream. It is much simpler to save each stream separately and combine them in QuickTime Pro, using the "offset" option after copying one stream and applying "Add to movie" (Edit menu) to insert it into the other. Keeping the two streams separate and saving them as the master copies generally makes it easier to create formats for alternative projection solutions in the future. For example, in some cases higher performance playback can be achieved with a top/bottom arrangement of the two streams.
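The frame size and offset needed for either arrangement follow directly from the per-eye resolution. The Python sketch below is illustrative only: the function name is hypothetical, and placing the left eye on top in the top/bottom case is an assumption, since the convention depends on the playback software.

```python
def packed_layout(eye_w, eye_h, arrangement="side-by-side"):
    """Return ((frame_w, frame_h), (x_offset, y_offset)) for a packed stereo
    movie. The offset is where the second eye's stream is placed; the first
    eye's stream sits at (0, 0)."""
    if arrangement == "side-by-side":
        return (2 * eye_w, eye_h), (eye_w, 0)
    if arrangement == "top-bottom":
        # Assumption: first (left eye) stream on top, second stream below it.
        return (eye_w, 2 * eye_h), (0, eye_h)
    raise ValueError("unknown arrangement: " + arrangement)

print(packed_layout(1024, 768))                # ((2048, 768), (1024, 0))
print(packed_layout(1024, 768, "top-bottom"))  # ((1024, 1536), (0, 768))
```

For the XGA streams used here the double width movie is 2048x768, with the second stream offset horizontally by 1024 pixels.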


The system employed here is a linear polaroid based projection system. In reality, from a content creation perspective it matters little whether it is linear polaroid, circular polaroid, shutter glasses, or Infitec. If all has worked correctly, the view through the stereo projection window will appear identical to the view through a similar rectangle in the real world filming environment. The distance to moderately distant objects is one depth cue that is easy to judge. Another key depth cue for testing the success is the ground plane; it should appear consistent with the real floor in front of the projection screen.

Small scale stereoscopic photography

Written by Paul Bourke
September 2009

The following documents an approach to creating stereoscopic photographs and video of small scale objects, for example, live insects. The trick is achieving the small camera separation which is proportional to the size of the object being photographed. In order to take stereoscopic photographs of an object on the scale of a cm, one needs a camera separation on the order of 1/2mm, or less. While this can be achieved for stationary objects with a single camera offset at two different times, this approach is not acceptable for moving objects.
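Since the separation is proportional to object size, the one example given above (about 1/2mm for a 1cm object) fixes a rough ratio of 1 to 20. The Python sketch below simply applies that ratio to other sizes. Both the function name and the use of this single ratio as a general rule are assumptions, and the text treats the result as an upper bound ("or less").

```python
# Ratio taken from the example in the text: 0.5 mm separation for a 10 mm object.
SEPARATION_RATIO = 0.5 / 10.0

def macro_separation_mm(object_size_mm, ratio=SEPARATION_RATIO):
    """Approximate upper bound on camera separation for a small object."""
    return object_size_mm * ratio

for size_mm in (5.0, 10.0, 30.0):
    print(size_mm, macro_separation_mm(size_mm))
```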

The solution here is to use a beam splitter to essentially fold the light path for one camera. The cameras are now at right angles to each other and the view frustums can be moved arbitrarily close to each other. A technique is also required to capture the images at the same time; in this case the LANC interface supported by older Sony cameras is used. A stereo LANC controller provides synchronised images within 3 or 4 ms. Unfortunately, as of 2009 Sony seems to have discontinued this support in their current range of digital cameras; the solution would seem to reside with Canon cameras and the CHDK (Canon Hack Development Kit).
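To see why a few milliseconds of synchronisation error matters at this scale, the Python sketch below estimates how far a subject moves between the two exposures. The subject speed of 50 mm/s is a made-up figure for illustration, and the function name is hypothetical; the 4 ms error is the upper figure quoted above.

```python
def sync_displacement_mm(speed_mm_per_s, sync_error_ms):
    """Distance a subject travels between the two exposures."""
    return speed_mm_per_s * sync_error_ms / 1000.0

# Hypothetical insect speed; 4 ms is the upper sync figure from the text.
moved = sync_displacement_mm(speed_mm_per_s=50.0, sync_error_ms=4.0)
print(moved)
```

Under these assumed numbers the subject moves about 0.2mm between exposures, which is a substantial fraction of a camera separation on the order of 1/2mm.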

The images above show the very basic prototype: two cameras, a beam splitter at 45 degrees, a LANC controller, and a cover to ensure the second camera only receives light reflected from the beam splitter.


  • The beam splitter is 50% symmetric transmission/reflection; other ratios exist and would clearly result in unmatched intensities between the stereo pairs.

  • Cameras with good macro lens support are obviously desirable.