The following outlines a method of distributing an OpenGL model across a number of computers each with their own OpenGL rendering card. The OpenGL model might be distributed using MPI from a controlling machine (that need not have OpenGL capabilities). Each of the slaves render a subset of the geometry and send their draw buffers and depth buffers back to the controlling machine. These images are combined according to the values in their depth buffers.
While tremendous performance improvements (especially price performance) are being made in OpenGL cards, there will always be geometric models that bring the best card to it's knees. For example, a high performance card might be able to render 15 million triangular facets per second. For interactive rates of 20 frames per second this means that one can display geometry with 750,000 polygons, if one wishes to render in stereo that drops to models with 375,000 triangular polygons. While this might seem a large number of polygons for those in the virtual reality or games market, it is a relatively low polygon count for many scientific visualisation applications. As an example, the terrain model shown below of Mars contains over 11 million triangular polygons.
A solution that follows the trends in cluster computing is to distribute the OpenGL rendering load among a number of machines. Fortunately, OpenGL maintains a depth buffer and this buffer is accessible to the programmer using glReadPixels(). The idea then is to split up the geometry making up the scene and distribute each piece to one OpenGL card, generally each card will be in a separate computer. Each card will then only need to render a subset of the geometry. For example, for a terrain model it is quite easy to split the polygons up evenly (the splitting of the geometry so that each OpenGL card is evenly loaded is not always trivial) so if there are N machines, each one only handles 1/N of the total number of polygons.
Each OpenGL card then renders it's portion of the geometry and sends the image and depth buffer to be merged into the final image. The logic for this is straightforward, set a pixel in the final image to the pixel in the subimage which has the smallest depth value, ie: is closest to the camera.
Part 1 | |
Part 2 | |
Part 3 | |
Composited image |
The fundamental problem with this technique is bandwidth. Consider rendering a 800 x 600 RGB model at 20 frames per second. There are 3 bytes for each image pixel and 4 bytes for each depth buffer pixel. So to transmit an image/depthbuffer pair from one machine to another requires (3 + 4) * 800 * 600 bytes or just over 3MB. For interactive performance at 20 frames per second this requires a bandwidth of 60MB per second which is clearly more than the capabilities of all but the very highest performance networks. To make matters worse even higher bandwidth is required the more OpenGL cards participate in the rendering although bandwidth bottlenecks can be reduced by arranging the OpenGL cards/machines in a tree structure and combining the image/depthbuffer pairs as the image pieces move up the tree.