pano_align:
a tool and workflow for editing omnidirectional stereoscopic panoramas

Report for the Atlas of Maritime Buddhism

Written by Paul Bourke
May 2025


Stereoscopic 3D panoramas (strictly speaking, omnidirectional stereoscopic cylindrical panoramas, see the appendix) for this project have been captured using two different camera rigs. The first is the Roundshot by Seitz, a film camera, and the second is a dual digital camera rig based upon the Lumix GH5 camera. In both cases the system works as a line scan camera. In the case of the Roundshot there are two rolls of 70mm film; the shutter opens and the two camera heads rotate 360 degrees while the film is exposed through a narrow slit. In the digital case the camera rotates 360 degrees while in movie record mode; in post processing a narrow vertical slit is extracted from each frame of video and the slits are abutted together. In both cases two images are generated, one intended for the left eye and one for the right eye.
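
To make the digital slit extraction concrete, the following minimal C++ sketch abuts a narrow centre slit from each decoded video frame; the Frame type and slit width handling are illustrative assumptions, not the actual post processing code.

   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Illustrative frame type: tightly packed 8 bit RGB, width*height*3 bytes.
   struct Frame {
      int width = 0, height = 0;
      std::vector<uint8_t> rgb;
   };

   // Build a panorama by abutting a narrow vertical slit taken from the centre
   // of each video frame as the rig rotates through 360 degrees.
   Frame composeSlitScan(const std::vector<Frame> &frames, int slitWidth)
   {
      Frame pano;
      if (frames.empty())
         return pano;
      pano.height = frames[0].height;
      pano.width = slitWidth * (int)frames.size();
      pano.rgb.resize((size_t)pano.width * pano.height * 3);
      for (size_t f = 0; f < frames.size(); f++) {
         const Frame &src = frames[f];
         int x0 = src.width / 2 - slitWidth / 2;   // slit centred in the frame
         for (int y = 0; y < pano.height; y++)
            for (int x = 0; x < slitWidth; x++)
               for (int c = 0; c < 3; c++)
                  pano.rgb[((size_t)y * pano.width + f * slitWidth + x) * 3 + c] =
                     src.rgb[((size_t)y * src.width + x0 + x) * 3 + c];
      }
      return pano;
   }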

Seitz Roundshot camera (left), GH5 digital rig (right)

The majority of panoramas for this project were captured using the Roundshot. The two rolls of film need to be developed and then scanned, either with a drum scanner or a flatbed scanner. Due to the practical realities of these processes, the following effects need to be corrected.

  1. The two panoramas can be slightly different lengths, generally due to stretching of the film during recording or drum scanning.
  2. The two panoramas will generally be rotated with respect to each other, most commonly due to the manual placement of the film rolls on the scanner.
  3. The panoramas need to be horizontally shifted (rotated about the vertical axis of the panorama) with respect to each other to create the desired zero parallax distance.
  4. Both cameras record more than 360 degrees; the exact 0 to 360 degree segment needs to be identified and cropped, with blending across the seam so that there is no discontinuity between the left and right edges (a sketch of this blend is shown below). Both cameras were configured so that they recorded between 400 and 420 degrees.
  5. The images need to be cropped vertically to match the desired vertical field of view of the presentation system, in this case a 360 degree cylindrical display.

Tasks 3 onwards also need to be performed on the panoramas captured by the digital rig.
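
As an illustration of the edge blend in task 4, the following sketch linearly cross-fades the columns at the end of the 0 to 360 segment with the matching columns from the overlap region, over a blend zone of width e (100 pixels by default). The buffer layout and function name are assumptions for illustration.

   // Cross-fade over the seam: mix each of the e columns at the end of the
   // 0 to 360 segment with the matching column from the overlap region so
   // there is no step across the left/right join. Pixels are float RGB rows
   // of length "stride" for brevity.
   void blendSeam(float *endZone, const float *overlapZone,
                  int e, int height, int stride)
   {
      for (int x = 0; x < e; x++) {
         float t = (x + 0.5f) / e;   // 0 at start of blend zone, 1 at the seam
         for (int y = 0; y < height; y++)
            for (int c = 0; c < 3; c++) {
               int i = (y * stride + x) * 3 + c;
               endZone[i] = (1 - t) * endZone[i] + t * overlapZone[i];
            }
      }
   }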

There are enough interrelated parameters that performing the corrections manually in Photoshop would be both difficult and time consuming. The input images to the alignment process and the output images are illustrated in figures 1 and 2 respectively.


Figure 1. Illustration of the transformations (exaggerated) that need to be corrected for.


Figure 2. The results after the panorama alignment process.

The panorama alignment is performed by a custom C/C++ program written for Linux and MacOS, although porting to MSWindows would be straightforward. It is a command line program (no graphical user interface) called "pano_align"; the following is its command line usage string.

Usage: pano_align [options] parameterfile
Options:
   -a n    antialias level, default: 2
   -e n    width of edge blend zone, default: 100
   -w n    width of final image, default: autodetermined
   -h n    height of the final image, default: autodetermined
   -z      adjust zero parallax, default: on
   -d      debug mode, default: off
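
For example, a typical invocation requesting 3x3 antialiasing and a wider 150 pixel blend zone might be the following (the parameter file name here is illustrative):

   pano_align -a 3 -e 150 burma_S1-1.txt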

The key input is the parameter file; an example is shown below.

   77_dnATL_Burma_S1-1L7.tif
   7377 3475
   385 2705 
   17310 2684
   77_dnATL_Burma_S1-1R7.tif
   7263 3553
   305 2752
   17239 2786
   -14 -8 -3

The first and fifth lines contain the filenames for the left and right eye panorama respectively. These would typically be 16 bit tiff files directly from the scanner (or digital camera rig), or after a clean up process that might remove dust, hair and other defects on the film strip. The colour correction/grading can be performed at any stage, but it is typically better to perform it after the alignment so as to manage any colour profiles of the final intended display system. "pano_align" is purely a pixel shuffling process; no colour management is performed and 16 bit RGB pixel values are unchanged between the input and the output.

The three lines after each file name correspond to three coordinate positions, measured in pixels, on the respective panorama image. The first line is the location of a visible feature on the left of the image, and the third line is the location of the same feature on the right of the image (in the overlap region). The second line is where the zero parallax distance (also sometimes called the convergence distance) will be located. The 3 features so identified need to be the same in both the left and right eye panoramas. These features are identified and recorded in the parameter file using Adobe Photoshop, although almost any image painting/editing software could be used. An example using Photoshop is shown in figure 3; the guides are placed so their intersection is on the chosen feature. The rectangular selection tool is used to snap onto those intersections and the values in pixels are read from the "Info" panel.
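
Given this fixed nine line layout, the parameter file is straightforward to read programmatically. The following minimal C++ sketch illustrates one way of doing so; the struct and function names are invented for illustration and do not reflect pano_align's internals.

   #include <fstream>
   #include <string>

   // One eye's entry: image filename plus the three measured positions, in
   // order: feature on the left, zero parallax feature, feature on the right.
   struct EyeEntry {
      std::string filename;
      int x[3], y[3];
   };

   // Minimal reader for the 9 line parameter file illustrated above.
   bool readParameterFile(const char *name, EyeEntry &left, EyeEntry &right,
                          int vshift[3])
   {
      std::ifstream in(name);
      EyeEntry *eyes[2] = { &left, &right };
      for (int e = 0; e < 2; e++) {
         if (!(in >> eyes[e]->filename))
            return false;
         for (int i = 0; i < 3; i++)
            if (!(in >> eyes[e]->x[i] >> eyes[e]->y[i]))
               return false;
      }
      // Last line: vertical parallax corrections at 25%, 50% and 75% of width.
      return (bool)(in >> vshift[0] >> vshift[1] >> vshift[2]);
   }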

The zero parallax position is the depth in the scene at which a feature is at the same horizontal position in both panorama images. There are a few strategies for choosing the zero parallax distance. In the case of a head mounted display the zero parallax distance is at infinity, or at least at the furthest object from the camera. For screen based stereoscopy the zero parallax position should be at an object that is the same distance away in the photographed scene as the viewing screen, see figure 4. For example, for an 8m diameter display cylinder the zero parallax position would be on an object in the scene that is 4m away. In either case the zero parallax position can be adjusted in post, or live in the viewing software, by rotating one panorama about its vertical axis with respect to the other. This is a unique feature of an omnidirectional stereoscopic panorama.
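
Because the panoramas span a full 360 degrees, this adjustment is simply a circular shift of one image relative to the other. A minimal sketch, assuming a packed 8 bit RGB buffer (the function name is illustrative):

   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Rotate a cylindrical panorama about its vertical axis by "shift" pixels.
   // Because the image spans exactly 360 degrees the shift wraps around, which
   // is what makes the zero parallax distance adjustable after capture.
   void rotatePanorama(std::vector<uint8_t> &rgb, int width, int height, int shift)
   {
      std::vector<uint8_t> out(rgb.size());
      shift = ((shift % width) + width) % width;   // normalise to 0..width-1
      for (int y = 0; y < height; y++)
         for (int x = 0; x < width; x++) {
            int xs = (x + shift) % width;          // source column, with wrap
            for (int c = 0; c < 3; c++)
               out[((size_t)y * width + x) * 3 + c] =
                  rgb[((size_t)y * width + xs) * 3 + c];
         }
      rgb.swap(out);
   }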



Figure 3. Guides used to measure the 3 matching positions in each panorama.


Figure 4. Parallax relationships for screen based stereoscopic viewing.

The last three numbers in the parameter file are intended to correct for a camera that wasn’t perfectly level. Such a case results in vertical parallax that can vary in degree across the width of the panorama. While it is normally a modest effect (a few tens of pixels), it does in general induce an undesirable degree of eye strain. The three numbers correspond to the degree of warping to apply to correct the vertical parallax at positions 25%, 50% and 75% of the width. A correction at the left (and right) edge is not necessary since the alignment procedure ensures that location will have no vertical parallax.


Figure 5. Vertical parallax correction locations.

Note that this is only a first order linear correction for what is normally a slowly varying vertical parallax, but it is sufficient to reduce any vertical parallax to less than a few pixels. The convenient feature used need not lie exactly at the positions shown. The vertical offset at the 3 positions is typically measured by overlaying the processed panoramas with the topmost at 50% transparency, as shown in figure 5. The vertical parallax shift is measured between a convenient feature in each; a positive value warps the left eye image upwards at that point, a negative value warps it downwards.
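
One plausible implementation of this correction, consistent with the description above, is a piecewise linear interpolation of the vertical offset between five anchor positions: zero at both edges (guaranteed by the alignment step) and the three measured values in between. The sketch below is illustrative rather than pano_align's actual code.

   // Piecewise linear vertical offset across the width: zero at both edges
   // and the measured corrections v25, v50, v75 at 25%, 50% and 75% of the
   // width. u is the normalised horizontal position, 0..1.
   float verticalOffset(float u, float v25, float v50, float v75)
   {
      const float ku[5] = { 0.0f, 0.25f, 0.5f, 0.75f, 1.0f };
      const float kv[5] = { 0.0f, v25, v50, v75, 0.0f };
      for (int i = 0; i < 4; i++)
         if (u <= ku[i + 1]) {
            float t = (u - ku[i]) / (ku[i + 1] - ku[i]);
            return (1 - t) * kv[i] + t * kv[i + 1];
         }
      return 0.0f;
   }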

Miscellaneous notes on “pano_align”

  • “pano_align” performs the image transformations in reverse. That is, it considers each pixel in the destination image and determines the best matching pixel from the source image. As such it is readily parallelisable, since each destination pixel is considered independently of all others; using the GPU or multiple threads will increase performance linearly with the number of available cores. A sketch of this inverse mapping, together with the grid supersampling described in the next note, appears at the end of these notes.
  • As a digital sampling process “pano_align” is subject to aliasing effects. This is mitigated by employing supersampling antialiasing; a grid based supersampling is used rather than a stochastic one. For most transformations a 2x2 supersampling (see the -a command line option) is adequate. If the source images are of high fidelity, or there is a large difference in resolution between the source and destination, then a 3x3 supersampling may be necessary.
  • The transformation time depends solely on the number of pixels in the destination image. For example, creating a destination that is two times larger in each dimension will incur a 4 times processing cost. Antialiasing essentially increases the number of destination pixels computed, so a 2x2 antialiasing will take 4 times longer than 1x1 (no antialiasing), and 3x3 will take 9 times longer.
  • The default of 100 pixels for the overlap blending (-e command line option) is generally considered sufficient.
  • The -h command line option will crop the panorama to a specified height. It does this symmetrically which isn’t always the desirable cropping. As such the cropping is normally not performed by “pano_align” and instead performed at a later stage.
  • For a cylindrical display one may wish to crop the panorama to exactly match the native resolution of the display. Given a panorama width W in pixels and a vertical field of view θ_v in degrees, the height H in pixels is given by

        H = W θ_v / 360

    The vertical field of view is calculated from the dimensions of the physical cylindrical screen, namely

        θ_v = 2 atan(h / (2r))

    where h is the physical height of the cylinder and r the radius.
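
As a worked example of these two formulas, the following computes the crop height for a hypothetical 12288 pixel wide panorama shown on the 8m diameter (r = 4m) cylinder mentioned earlier, assuming a 3m physical screen height.

   #include <cmath>
   #include <cstdio>

   int main(void)
   {
      // Illustrative numbers: a 12288 pixel wide panorama on a cylinder of
      // 8m diameter (r = 4) as in the earlier example, with a hypothetical
      // physical height of 3m.
      double W = 12288, h = 3, r = 4;
      double thetav = 2 * atan(h / (2 * r)) * 180 / M_PI; // vertical FOV (deg)
      double H = W * thetav / 360;                        // crop height (pixels)
      printf("theta_v = %.1f degrees, H = %.0f pixels\n", thetav, H);
      // Prints approximately: theta_v = 41.1 degrees, H = 1403 pixels
      return 0;
   }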
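
As referenced in the first two notes above, the following is a minimal sketch of the destination to source mapping combined with n x n grid supersampling; the ImageF type, the transform callback and the use of OpenMP are illustrative assumptions rather than pano_align's actual implementation.

   #include <algorithm>
   #include <cstddef>
   #include <vector>

   // Simple float RGB image for the sketch.
   struct ImageF {
      int width = 0, height = 0;
      std::vector<float> rgb;   // packed RGB
   };

   // Destination to source (inverse) mapping with n x n grid supersampling.
   // "transform" stands in for the combined stretch/rotation/shift correction:
   // it maps a destination coordinate to the corresponding source coordinate.
   void remap(const ImageF &src, ImageF &dst, int n,
              void (*transform)(double dx, double dy, double &sx, double &sy))
   {
      // Each destination pixel is independent, so the outer loop parallelises
      // trivially, here with OpenMP.
      #pragma omp parallel for
      for (int y = 0; y < dst.height; y++)
         for (int x = 0; x < dst.width; x++) {
            float sum[3] = { 0, 0, 0 };
            for (int j = 0; j < n; j++)           // n x n subsamples per pixel
               for (int i = 0; i < n; i++) {
                  double sx, sy;
                  transform(x + (i + 0.5) / n, y + (j + 0.5) / n, sx, sy);
                  int xs = std::min(std::max((int)sx, 0), src.width - 1);
                  int ys = std::min(std::max((int)sy, 0), src.height - 1);
                  for (int c = 0; c < 3; c++)
                     sum[c] += src.rgb[((size_t)ys * src.width + xs) * 3 + c];
               }
            for (int c = 0; c < 3; c++)
               dst.rgb[((size_t)y * dst.width + x) * 3 + c] = sum[c] / (n * n);
         }
   }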

Appendix: ODSP (OmniDirectional Stereoscopic Panoramas)

An omnidirectional stereoscopic panorama (ODSP) is a pair of locally correct stereoscopic panoramas spanning 360 degrees in longitude. They are "locally correct" because if a limited horizontal field of view of the ODSP is presented to a viewer, there are minimal perceived stereoscopic artefacts irrespective of the part of the panorama being viewed. This is in contrast to more traditional stereoscopic image pairs that require knowledge of the viewer's position to be correct.

Perfect ODSPs of synthetic worlds can be created in computer rendering software. ODSPs can be captured photographically by employing a pair of cameras, offset from a common center, that rotate about that center. In the case of the Seitz camera the film is exposed continuously as the rig rotates, which results in a perfect ODSP. In the case of a digital rig, narrow slits of some finite width are extracted from each frame of the video as the camera rig rotates; as the slits become narrower the result approaches the perfect ODSP.


Figure 6. Rotating camera rig to capture a photographic ODSP.

In practical terms this means that an ODSP can be presented in, say, a 360 degree cylinder containing multiple observers all potentially looking in different directions. Similarly an ODSP can be experienced within a virtual reality (VR) headset with no view dependent computation other than selecting the correct portion of the panorama image pair as the viewer turns their head. In contrast, for most synthetic VR environments the exact view presented to each eye needs to be computed for every view direction.

The theory behind the ODSP was introduced in the 1990s by Ishiguro et al, and various camera and software designs were subsequently published by Peleg. Employing an ODSP provides for the presentation of stereoscopic photographic imagery while minimising departures from the exact image pairs that should be presented to each eye. There are two sources of error. The first arises when the viewer is not located in the same position in relation to the viewing apparatus as where the ODSP was captured; for example, the viewer is not located in the center of a cylindrical display environment, or, in the context of a VR headset, the viewer is not located in the center of the virtual cylinder on which the ODSP is texture mapped. The second error is the divergence of the presented image pairs from the ideal stereoscopic image pairs away from the vertical center of the view. That is, the stereoscopic perception is perfectly correct in the center of the view direction and becomes increasingly distorted towards the left and right edges of the field of view. Fortunately the effect of this error is rarely an issue. One reason is that the glasses employed in stereoscopic systems typically limit the horizontal field of view to about 60 degrees. While this may seem like an impediment to immersion through peripheral vision, our depth perception is naturally limited by occlusion from our nose, and thin frame stereoscopic glasses can still provide peripheral vision in the far field outside the frame of the glasses. Another reason for the minimal impact of the stereoscopic error with angle is that humans naturally fixate and align their heads with their view direction.