Report on 360 video stitching of Obsidian Pro video for cylindrical display

Paul Bourke
July 2025


The following describes the processes and algorithms developed to convert 360 video from the Kandao Obsidian Pro camera into a format suited to the eventual stereoscopic cylindrical LED display. It involves the use of commercial software, namely MistikaVR, and locally developed tools to extract cylindrical panoramas from the stitched equirectangular panoramas.


360 video recording

The Kandao Obsidian Pro notionally captures “12K” 360 video at 30fps. It does this by running 8 separate video cameras, each with a sensor resolution of 6016x4022 pixels and a fisheye lens extending 200 degrees vertically. It has a 60fps recording mode, but this simply halves the vertical resolution of each video camera by a crude binning process followed by a digital upscaling. This was deemed unacceptable for the current application because the destination cylindrical display can take advantage of the full 12K resolution.


Figure 1: 8 fisheye images from Kandao Obsidian Pro camera

The Kandao Obsidian Pro was chosen since, at the time, it was the highest resolution stereoscopic-capable 360 camera on the market. Besides the low 30fps frame rate it has another unfortunate characteristic, namely a large interocular, also called the interpupillary distance (IPD), of 12.5cm. For true scale and depth relationships in human scale recordings one would prefer an interocular closer to 6.5cm. This large interocular places limits on the depth budget and on the desirable depth relationships in the cylindrical display. Specifically, a zero parallax distance of 4m (the radius of the cylinder) must be reconciled with the need to avoid more than 6.5cm of parallax for distant objects, since greater parallax requires that the viewer's eyes diverge, not a natural state for the human visual system. The wide interocular also limits how closely objects can approach the camera and still result in a successful stitch.
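As a rough illustration of the squeeze this creates (using the display figures quoted later in this report: 12816 pixels around the cylinder at a 2mm pixel pitch):

   circumference ≈ 12816 × 2mm ≈ 25.6m, so radius ≈ 25.6 / 2π ≈ 4.1m
   angular disparity of a point at 4m ≈ 0.125 / 4 ≈ 0.031 radians
   screen parallax at infinity after converging at 4m ≈ 0.031 × 4.1m ≈ 12.7cm

That far-point parallax of roughly 12.7cm is about double the 6.5cm limit, which is why the depth range has to be managed carefully with this camera.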


Stitching

The stitching is performed in the MistikaVR software, the recognised industry leader. The 8 video streams, one from each camera on the Obsidian Pro, are stitched into a left and right eye stereo pair, each eye being an equirectangular projection, an example of which is shown in figure 3. Strictly speaking the output from MistikaVR is an approximation to an Omni-Directional Stereoscopic Panorama (ODSP). Such a stereoscopic image is unique in that it can deliver depth perception in the cylindrical display for multiple viewers, each potentially looking in different directions.

The resolution of each panorama pair is chosen to be the native pixel width of the cylindrical display, namely 12816 pixels. Since MistikaVR exports a full equirectangular projection the height is 6408 pixels.

Kandao provides a camera specific parameter file (XML format) describing the physical and optical parameters of the camera; this is a key advantage of this particular 360 camera.


Figure 2: Key MistikaVR settings

In order to facilitate future processing and to avoid compounding lossy compression, the stitched frames are exported from MistikaVR as TIFF images.


Figure 3: Left (top) and right (bottom) eye equirectangular projections.

The most important parameters for the stitch are shown in figure 2. The optical flow parameters are highly dependent on the scene content and the movement within it; they were chosen after analysing the results from a range of settings and selecting those that gave the best outcome. A key feature of MistikaVR is the ability to adjust the location of the blend zones, which can improve stitching quality by placing the blend seams in unimportant regions, for example, regions without dancers. Unfortunately, in this application the scope for adjusting the blend zones is limited because the dancers move around the camera.

A key stitching setting is the choice of convergence (zero parallax). The choice of convergence is dictated both by the intended zero parallax distance in the cylinder (4m) and by the improved stitching that occurs around the convergence distance. The latter was given priority, since the final zero parallax can be trivially adjusted in post by panning one panorama with respect to the other, with wrapping across the left and right edge. It can even be performed live during playback.
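As a minimal sketch of how that post adjustment could be performed on a single frame (illustrative only, not the actual pipeline tooling; the shift value in the usage line is a hypothetical example), panning with wrap-around is simply a roll of the pixel columns:

   import numpy as np
   from PIL import Image

   def adjust_zero_parallax(image_path, shift_px, out_path):
       """Pan a panorama horizontally, wrapping columns across the
       left/right edge, to move the zero parallax distance."""
       img = np.asarray(Image.open(image_path))
       shifted = np.roll(img, shift_px, axis=1)  # axis 1 = horizontal (longitude)
       Image.fromarray(shifted).save(out_path)

   # Hypothetical usage: shift the right eye 10 pixels to the left.
   adjust_zero_parallax("cylright53/00001.tif", -10, "cylright53_shifted/00001.tif")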

The anaglyph display in MistikaVR is the most convenient way of determining the zero parallax distance, which is identified by noting where the red and blue representations for each eye align. In the recordings performed on the Helipad the yellow circle is known to be at 4m, the same as the cylindrical display radius. A further safety check is to ensure that distant objects are not separated by more than the eye separation. Given that the cylindrical display has a 2mm pixel pitch, distant objects should not be separated by more than 33 pixels (65mm interocular / 2mm pixel pitch).


Figure 4: Anaglyph display used to judge the zero parallax distance.


Cylindrical panorama #1

Unlike a head mounted display, the cylindrical display requires not equirectangular projections but rather cylindrical panorama projections. Like an equirectangular projection, the horizontal axis of a cylindrical panorama is equal steps of longitude. However, the vertical axis resembles a perspective projection.

These cylindrical panoramas are created at the native resolution of the cylinder, a width of W = 12816 pixels and a height of H = 2048 pixels (assuming square pixels). The vertical field of view matches that of the physical cylindrical display, specifically fov = 53.325 degrees.
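This field of view is consistent with the physical geometry implied by the 2mm pixel pitch quoted earlier, a useful sanity check:

   display height ≈ 2048 × 2mm ≈ 4.1m
   display radius ≈ 12816 × 2mm / 2π ≈ 4.08m
   vertical FOV from the centre ≈ 2 atan(2.048 / 4.08) ≈ 53.3 degrees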

The mapping from pixel index (i,j) to longitude (ranging from -π to π) and latitude (ranging from -fov/2 to fov/2) is given by the following, where the atan() reflects the perspective nature of the vertical axis.

   longitude = 2 π i / W - π
   latitude = atan((2 j / H - 1) tan(fov / 2))

The unit vector into the scene P(x,y,z) is given by

   x = cos(latitude) sin(longitude)
   y = cos(latitude) cos(longitude)
   z = sin(latitude)

The reason for creating this vector, rather than directly mapping into equirectangular, is that rotations (pan, roll, tilt) can be applied. For example rotating about the vertical axis (z) to pan the panorama to adjust the “front” position.

Given a transformed P(x,y,z) one can calculate the longitude and latitude as

   longitude = atan2(x, y)
   latitude = asin(z)

The mapping from longitude (ranging from -π to π) and latitude (ranging from -π/2 to π/2) to pixel index (i,j) in the equirectangular projection, of width We and height He, is simply as follows.

   i = We (longitude + π) / (2 π)
   j = He (latitude + π/2) / π

The following coordinate system conventions are employed: “z” is up, “x” is to the right and “y” is forward; this is a right hand coordinate system.
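The following is a minimal Python sketch of this destination-to-source mapping (an illustration only, not the sphere2pano source); the pan rotation and simple supersampling antialiasing are included to mirror the features discussed here and below:

   import numpy as np

   def cylinder_from_equirect(equirect, out_w, out_h, vfov_deg, pan_deg=0.0, aa=2):
       """Compute each cylindrical panorama pixel by sampling the source
       equirectangular image, with aa x aa supersampling antialiasing."""
       src_h, src_w = equirect.shape[:2]
       half = np.tan(np.radians(vfov_deg) / 2.0)
       out = np.zeros((out_h, out_w, equirect.shape[2]), dtype=np.float64)
       for sj in range(aa):
           for si in range(aa):
               # Subpixel sample positions in the destination image.
               i = (np.arange(out_w) + (si + 0.5) / aa)[None, :]
               j = (np.arange(out_h) + (sj + 0.5) / aa)[:, None]
               lon = 2 * np.pi * i / out_w - np.pi           # -pi .. pi
               lat = np.arctan((2 * j / out_h - 1) * half)   # -fov/2 .. fov/2
               # Unit vector into the scene: z up, x right, y forward.
               x = np.cos(lat) * np.sin(lon)
               y = np.cos(lat) * np.cos(lon)
               z = np.sin(lat) * np.ones_like(lon)
               # Rotate about the vertical (z) axis to pan the panorama.
               a = np.radians(pan_deg)
               x, y = x * np.cos(a) - y * np.sin(a), x * np.sin(a) + y * np.cos(a)
               # Back to longitude/latitude, then to an equirectangular pixel.
               u = (src_w * (np.arctan2(x, y) + np.pi) / (2 * np.pi)).astype(int) % src_w
               v = np.clip((src_h * (np.arcsin(z) + np.pi / 2) / np.pi).astype(int), 0, src_h - 1)
               out += equirect[v, u]
       return (out / (aa * aa)).astype(equirect.dtype)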

Software was developed to extract the cylindrical panorama from an equirectangular projection as per the mappings above; it is called sphere2pano. It performs supersampling antialiasing to avoid the aliasing effects that can arise from any discrete sampling process such as this. The command line for this utility is as follows.

sphere2pano [options] sphericalimagename
Options:
   -v n     vertical FOV of panorama, 0...180 (default: 60)
   -la n n  latitude range in degrees, negative to positive angle, default: -30 to 30
   -lo n n  longitude range in degrees, default: -180 to 180
   -a n     set antialias level, 1, 2 or 3 typical (default: 2)
   -sc n    clip +/- this longitude, default: disabled
   -180     input is only 180 degrees of longitude, default: off
   -w n     width of the output panorama image (default: 2048)
   -o s     output file name (default: determined internally)
   -x n     tilt angle (degrees), default: 0
   -y n     roll angle (degrees), default: 0
   -z n     pan angle (degrees), default: 0
   -l       flip in longitude (default: off)
   -f       create remap filters for ffmpeg (default: off)
   -d       enable debugging mode

This mapping process is performed in reverse, that is, each pixel in the destination image (cylindrical panorama) is considered and the corresponding pixel and RGB value is found in the source image (equirectangular image). Since each pixel is calculated independently of the others, this process is trivially parallel and the performance scales with the number of processes/cores. However, since each image can also be converted independently, a much simpler parallel processing scheme is employed: 20 sphere2pano processes are run in parallel, 10 for the left eye and 10 for the right eye. This results in almost perfect load balancing on a 20 core M4 processor and only requires simple scripts to distribute the processes. For every frame the command line takes the following form.

   sphere2pano -w 12816 -v 53.325 -o cylleft53/%05d.tif   # input: left eye equirectangular frames exported from MistikaVR
   sphere2pano -w 12816 -v 80 -o cylleft80/%05d.tif       # input: left eye equirectangular frames exported from MistikaVR
   sphere2pano -w 12816 -v 53.325 -o cylright53/%05d.tif  # input: right eye equirectangular frames exported from MistikaVR
   sphere2pano -w 12816 -v 80 -o cylright80/%05d.tif      # input: right eye equirectangular frames exported from MistikaVR
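The distribution scripts themselves are not reproduced in this report, but a sketch of the scheme might look like the following (hypothetical file layout and frame count; 10 workers per eye as described above):

   import subprocess
   from concurrent.futures import ProcessPoolExecutor

   FRAMES = 3000            # hypothetical frame count
   WORKERS_PER_EYE = 10

   def convert(eye, start, end):
       """Run sphere2pano over a contiguous range of frames for one eye."""
       for f in range(start, end):
           src = f"equirect_{eye}/{f:05d}.tif"    # hypothetical input layout
           dst = f"cyl{eye}53/{f:05d}.tif"
           subprocess.run(["sphere2pano", "-w", "12816", "-v", "53.325",
                           "-o", dst, src], check=True)

   if __name__ == "__main__":
       chunk = FRAMES // WORKERS_PER_EYE
       with ProcessPoolExecutor(max_workers=2 * WORKERS_PER_EYE) as pool:
           for eye in ("left", "right"):
               for w in range(WORKERS_PER_EYE):
                   end = FRAMES if w == WORKERS_PER_EYE - 1 else (w + 1) * chunk
                   pool.submit(convert, eye, w * chunk, end)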

The pair of cylindrical panoramas with 53.325 degrees FOV are appended together vertically to create a single image ready for immediate preview on the display. The left eye panorama is on the top and the right eye panorama is on the bottom. The overall image size is therefore 12816 pixels by 4096 pixels.


Figure 5: Top/bottom pair as cylindrical panoramas, each with a 53.325 degree vertical field of view.

These frames are built into movies using ffmpeg commands as follows.

   ffmpeg -y -r 30 -i cylleft53/%05d.tif -c:v libx264 left53.mp4
   ffmpeg -y -r 30 -i cylright53/%05d.tif -c:v libx264 right53.mp4
   ffmpeg -y -i left53.mp4 -i right53.mp4 -filter_complex vstack 152557_topbottom_53deg.mp4

For simplicity the above assumes the default encoding by ffmpeg; of course particular encoding options could be added, typically a crf value of 20 (-crf 20).


Cylindrical panorama #2

A second pair of cylindrical panoramas is created for further post production. These are formed with a vertical field of view of 80 degrees in order to provide scope for vertical adjustments. The resolution of these is 12816 pixels by 3422 pixels.


Figure 6: Left (top) and right (bottom) eye cylindrical projections at 80 degrees vertical field of view.

As the final deliverable for post production (colour grading and upscaling to 60fps), the pipeline is shifted to the ProRes codec, and the result is presented as such after the equirectangular to cylindrical processing. The ProRes settings used within ffmpeg are as follows, noting that profile 3 refers to ProRes HQ.

   ffmpeg -y -r 30 -i cylleft80/%05d.tif -c:v prores_ks -profile:v 3 -pix_fmt yuv422p10le left80.mov
   ffmpeg -y -r 30 -i cylright80/%05d.tif -c:v prores_ks -profile:v 3 -pix_fmt yuv422p10le right80.mov


Appendix 1: Remap filters

The mapping from equirectangular projections to cylindrical panorama projections using sphere2pano is time consuming and uses large storage resources because it requires converting the movies to individual frames. A less demanding approach is to use remap filters in ffmpeg; these filters are generated by sphere2pano using the -f command line option.

Remap filters are a pair of files linking each pixel in the destination image with the corresponding pixel in the source image. There are two files because one holds the mapping for the horizontal axis and the other the mapping for the vertical axis. The files are stored as PGM image files.
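As an illustration of how such filter files could be generated for the equirectangular to cylindrical mapping (a sketch assuming 16-bit binary PGM maps, which ffmpeg's remap filter accepts; this is not the sphere2pano implementation):

   import numpy as np

   def write_pgm16(path, data):
       """Write a 2D uint16 array as a binary (P5) 16-bit PGM file."""
       h, w = data.shape
       with open(path, "wb") as f:
           f.write(f"P5\n{w} {h}\n65535\n".encode())
           f.write(data.astype(">u2").tobytes())  # 16-bit PGM is big endian

   def make_remap(out_w, out_h, src_w, src_h, vfov_deg):
       """Equirectangular -> cylindrical remap maps, per the mappings above."""
       half = np.tan(np.radians(vfov_deg) / 2.0)
       i = np.arange(out_w)[None, :]
       j = np.arange(out_h)[:, None]
       lon = 2 * np.pi * (i + 0.5) / out_w - np.pi
       lat = np.arctan((2 * (j + 0.5) / out_h - 1) * half)
       u = src_w * (lon + np.pi) / (2 * np.pi)    # source column per destination pixel
       v = src_h * (lat + np.pi / 2) / np.pi      # source row per destination pixel
       write_pgm16("x.pgm", np.broadcast_to(u, (out_h, out_w)).astype(np.uint16))
       write_pgm16("y.pgm", np.broadcast_to(v, (out_h, out_w)).astype(np.uint16))

   make_remap(12816, 2048, 12816, 6408, 53.325)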

Using remap filters x.pgm and y.pgm say, the ffmpeg command line might be

   ffmpeg -i sourcemovie -i x.pgm -i y.pgm -lavfi remap destinationmovie

Remap files encapsulate all aspects of the mapping; in particular, the destination image resolution and the source image resolution are fixed for a particular pair of remap filters. If any mapping parameter is to be changed, new remap filter files need to be generated.

Remap filters are a fast way of converting the equirectangular movies directly to cylindrical panorama movies, but they do not perform any antialiasing, so the image quality is compromised. A solution is to create remap filters that target a double size movie which is subsequently scaled down, but given the already large size of the movies this is problematic.


Appendix 2: Camera orientation

Given that perfect stitching across the overlaps between the cameras is not always possible, one might orient the camera such that important subjects do not fall on the seams. As it happens, the preferred camera orientation differs depending on whether one is creating a monoscopic or a stereoscopic result. For monoscopic outputs the zones that don’t require stitching are immediately opposite each camera lens. For stereoscopic outputs the zones that don’t require stitching are located between the camera lenses. This is illustrated in figure 7, where the red outlines mark the zones that don’t involve blending, the monoscopic case first followed by the stereoscopic case. The difference arises because the stereoscopic case uses the left and right portions of each fisheye image to create the stereo pairs, whereas monoscopic stitching uses the centre portion of each lens.


Figure 7: Zones not requiring stitching, monoscopic and stereoscopic cases.