YouTube 360 video format
Written by Paul Bourke
This document explains the internal format YouTube uses to store 360 video content. As with many documents in the technology space, it may become out of date, since YouTube may choose to change the way it stores 360 videos.
When 360 video, monoscopic or stereoscopic, is uploaded to YouTube it is generally in the equirectangular format. This is the default format created by the vast majority of software provided by the camera manufacturers, and others. YouTube does not retain this format but instead remaps the footage. If you subsequently download the footage, it appears in that new remapped format. For example, here is a single frame from a downloaded 4K YouTube video.
Sample from "Elephants on the Brink", YouTube Discovery channel.
While one might be tempted to think this is two partial panoramas, it is in fact the 6 faces of the conventional cube map. The layout is slightly cunning in that it forms two strips, upper and lower half of the image. The upper strip contains faces left-front-right and the bottom strip contains faces bottom-back-top, noting that the face names can vary depending on conventions. This is essentially splitting the cube into two halves and laying each flat.
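The two-strip layout described above can be sketched as a small shell function that prints the crop rectangle of each face. This is only an illustration: the function name `face_geometries` is made up here, the face names follow the convention used in this document, and it assumes the six faces are equal-sized tiles arranged three per row. The 3840x2048 frame size is the one implied by the crop geometries in the ImageMagick commands later in this document.

```shell
#!/bin/sh
# Print "name WxH+X+Y" for each face of a 3x2 cube-map frame.
# Assumption: equal-sized tiles, three per row, two rows;
# top row is left-front-right, bottom row is bottom-back-top.
face_geometries() {
  width=$1
  height=$2
  fw=$(( width / 3 ))    # width of one face tile
  fh=$(( height / 2 ))   # height of one face tile
  i=0
  for name in left front right bottom back top; do
    col=$(( i % 3 ))
    row=$(( i / 3 ))
    echo "$name ${fw}x${fh}+$(( col * fw ))+$(( row * fh ))"
    i=$(( i + 1 ))
  done
}

face_geometries 3840 2048
```

Each printed geometry is in the `WxH+X+Y` form that ImageMagick's `-crop` option accepts. Note that the tiles are not square (1280x1024 in this case), which is why a later rescaling step is needed.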
The discussion here is for the YouTube "4K" format; the other aspect ratios are just variations on this theme. Similarly, knowing how this works should make it straightforward to work out what is happening in the stereoscopic case.
In any pipeline to reconstruct the equirectangular one is likely to extract each face, rotate according to local conventions (especially the orientation of the top and bottom faces), scale to create square images, and then run through a cube to equirectangular converter. The faces extracted, rotated and scaled are shown below. The reader should be able to determine which face came from where.
Converting downloaded YouTube movies back to equirectangular can be readily scripted. The process might be to use ffmpeg to extract the frames, use ImageMagick "convert" to extract the 6 cube faces, apply something like cube2sphere to turn the cube map faces into equirectangular, and then finally build the movie again using ffmpeg, reassigning the audio track. Of course the result will not be as good as the original due to these multiple image manipulation steps, multiple encodings and the extreme compression YouTube performs.
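The ffmpeg steps at either end of such a script might look like the following sketch. Everything named here is an assumption for illustration: the function names, the `frames` and `equi` directory names, the PNG frame format, the H.264 encoder settings, and the frame rate argument (which should match the source movie). The per-frame step `frame_to_equirect` is a hypothetical placeholder, to be filled in with the face extraction and whichever cube-to-equirectangular tool is used; no options for such a tool are assumed.

```shell
#!/bin/sh
# With DRY_RUN=1 commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical placeholder: replace the body with face extraction plus
# a cube-to-equirectangular conversion. $1 = input frame, $2 = output image.
frame_to_equirect() {
  echo "frame_to_equirect is not implemented: $1 -> $2" >&2
  return 1
}

# Usage: youtube_to_equirect input.mp4 output.mp4 [fps]
youtube_to_equirect() {
  in=$1
  out=$2
  fps=${3:-30}   # assumed default; use the real frame rate of the source

  run mkdir -p frames equi
  # 1. Extract every frame as a numbered image.
  run ffmpeg -i "$in" frames/%06d.png

  # 2. Convert each cube-map frame to an equirectangular frame.
  for f in frames/*.png; do
    [ -e "$f" ] || continue
    frame_to_equirect "$f" "equi/$(basename "$f")" || return 1
  done

  # 3. Rebuild the movie, copying the audio (if any) from the original.
  run ffmpeg -framerate "$fps" -i equi/%06d.png -i "$in" \
    -map 0:v -map '1:a?' -c:v libx264 -pix_fmt yuv420p \
    -c:a copy -shortest "$out"
}
```

Running `DRY_RUN=1 youtube_to_equirect in.mp4 out.mp4 25` prints the commands without touching any files, which is a convenient way to check them before committing to a long conversion. Writing the intermediate frames in a lossless format avoids adding a further generation of compression loss on top of those already mentioned.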
The final reconstructed equirectangular is shown below, noting that the equiangle version of the cube map projection is used.
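For readers unfamiliar with the equi-angular variant, the relationship to a conventional cube map can be written as below. This assumes YouTube's mapping follows the commonly published equi-angular cube map definition; the symbols x and u are introduced here for illustration only. In a conventional cube map a face coordinate x in [-1, 1] is the tangent of the viewing angle from the face centre, whereas the equi-angular version samples that angle uniformly.

```latex
% x: conventional (tangent-spaced) face coordinate, in [-1, 1]
% u: equi-angular face coordinate, in [-1, 1]
% The mapping is applied independently to each of the two face axes.
u = \frac{4}{\pi}\arctan(x), \qquad x = \tan\!\left(\frac{\pi}{4}\,u\right)
```

In practice this means a converter that expects conventional cube faces would need the faces resampled through the second relation first, unless it supports the equi-angular form directly.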
For example, the ImageMagick "convert" command lines for macOS or Linux to extract the 6 cube map faces from the YouTube frames might be as follows:
convert -crop 1280x1024+0+0 $1 -flip -resize 1280x1280\! frame_l.tga
convert -crop 1280x1024+1280+0 $1 -flip -resize 1280x1280\! frame_f.tga
convert -crop 1280x1024+2560+0 $1 -flip -resize 1280x1280\! frame_r.tga
convert -crop 1280x1024+0+1024 $1 -flip -rotate -90 -resize 1280x1280\! frame_d.tga
convert -crop 1280x1024+1280+1024 $1 -flip -rotate 90 -resize 1280x1280\! frame_b.tga
convert -crop 1280x1024+2560+1024 $1 -flip -rotate -90 -resize 1280x1280\! frame_t.tga

Notes