Removing tourists from photographs

Using averaging and the geometric median

Written by Paul Bourke
November 2024, updated February 2025

It isn't unusual for a site one would like to photograph to be constantly full of people (e.g., tourists) milling around, so the likelihood of getting a photograph with no one in the shot is slim.

One option is to record a video and then average the frames together. The camera must remain absolutely still while recording, for example, mounted on a tripod. Averaging has no effect on the static structures. A moving object, however, covers any given pixel in only a few frames, so it fades as it is averaged with the static background. The following is a 1 second (30 frame) average around the time of the image above.
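Below is a minimal sketch of the frame average in Python. It assumes the frames have already been decoded from the video into equally sized lists of (r, g, b) tuples. Decoding is left out, and the names `Frame` and `average_frames` are hypothetical. Only a running sum is kept, so the frames can be processed one at a time rather than all held in memory.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[float, float, float]
Frame = List[Pixel]  # hypothetical representation: a flat list of RGB pixels


def average_frames(frames: Iterable[Frame]) -> Frame:
    """Return the per-pixel mean of a stream of equally sized frames."""
    sums: List[List[float]] = []
    count = 0
    for frame in frames:
        if count == 0:
            # allocate one RGB accumulator per pixel on the first frame
            sums = [[0.0, 0.0, 0.0] for _ in frame]
        elif len(frame) != len(sums):
            raise ValueError("all frames must have the same number of pixels")
        for acc, (r, g, b) in zip(sums, frame):
            acc[0] += r
            acc[1] += g
            acc[2] += b
        count += 1
    if count == 0:
        raise ValueError("no frames supplied")
    return [(s[0] / count, s[1] / count, s[2] / count) for s in sums]


# One pixel: grey background in three frames, a red-shirted passer-by in one.
frames = [[(100, 100, 100)]] * 3 + [[(200, 0, 0)]]
print(average_frames(frames))  # [(125.0, 75.0, 75.0)]
```

The passer-by is diluted rather than removed. Being present in one frame in four leaves a quarter of the colour difference from the background, which is why longer averages work better.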

The longer the averaging period, the more effectively moving people are removed. Finally, a 4-minute average over all the frames in the video is presented below. The technique relies on movement, so the people in the front pews remain visible since they didn't move. The slight blur in the center towards the front is the result of someone standing for some time taking photographs of the building.
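A simple model makes this concrete. Suppose a pixel shows a background colour B in N − m of the N averaged frames and a person of colour P in the remaining m frames. B, P, N and m are notation introduced here for illustration. The average is then

```latex
\bar{C} = \frac{(N - m)\,B + m\,P}{N} = B + \frac{m}{N}\,(P - B)
```

So a fraction m/N of the difference between person and background survives. At 30 frames per second, a 4-minute average covers 4 × 60 × 30 = 7200 frames. A person standing still for a hypothetical 10 seconds (300 frames) therefore leaves 300/7200 ≈ 4% of that difference, a faint ghost. Someone who never moves has m = N and remains fully visible.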

One can also take a rolling average. The following uses a 1 second sliding window and is titled "chem trails".
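Below is a minimal Python sketch of a sliding-window (rolling) average. It uses the same hypothetical frame representation as a flat list of (r, g, b) tuples. For a 1 second window at 30 frames per second, `window` would be 30. Rather than re-summing the whole window for each output frame, the incoming frame is added to a running sum and the outgoing frame is subtracted. This keeps the cost per frame independent of the window length.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Pixel = Tuple[float, float, float]
Frame = List[Pixel]  # hypothetical representation: a flat list of RGB pixels


def _accumulate(sums: List[List[float]], frame: Frame, sign: float) -> None:
    """Add (sign=1.0) or subtract (sign=-1.0) a frame from the running sums."""
    for acc, (r, g, b) in zip(sums, frame):
        acc[0] += sign * r
        acc[1] += sign * g
        acc[2] += sign * b


def rolling_average(frames: Iterable[Frame], window: int) -> Iterator[Frame]:
    """Yield the per-pixel mean of every run of `window` consecutive frames."""
    if window < 1:
        raise ValueError("window must be at least 1")
    buffer: Deque[Frame] = deque()
    sums: List[List[float]] = []
    for frame in frames:
        if not sums:
            # allocate one RGB accumulator per pixel on the first frame
            sums = [[0.0, 0.0, 0.0] for _ in frame]
        _accumulate(sums, frame, 1.0)
        buffer.append(frame)
        if len(buffer) > window:
            # drop the oldest frame so the sum covers exactly `window` frames
            _accumulate(sums, buffer.popleft(), -1.0)
        if len(buffer) == window:
            yield [(s[0] / window, s[1] / window, s[2] / window) for s in sums]
```

For example, `list(rolling_average(frames, window=30))` would produce one averaged frame for every position of a 1 second window. Over very long videos, integer sums would avoid any floating-point drift from the repeated subtraction.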

Geometric Median

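Averaging tends to fail if the moving objects are present too much of the time. Each pixel is then covered by people in a large fraction of the frames, and their colours contribute heavily to the average. An alternative, proposed to the author by Matthew Thomas, is to use the geometric median.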
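To see why the median helps, consider a single pixel and treat its RGB colour in each of the n frames as a point C_i in 3D space. This is the same notation used for Weiszfeld's algorithm further on. The average is the point minimising the sum of squared distances to the C_i, whereas the geometric median minimises the sum of plain distances:

```latex
\bar{C} = \arg\min_{M} \sum_{i=1}^{n} \lVert C_i - M \rVert^{2} = \frac{1}{n} \sum_{i=1}^{n} C_i,
\qquad
M^{*} = \arg\min_{M} \sum_{i=1}^{n} \lVert C_i - M \rVert
```

Because the distances are not squared, a few frames with very different colours (a person walking through) pull the median far less than they pull the mean. In the idealised case where the background colour is exactly the same in more than half of the frames, the geometric median equals that background colour, whatever colours occupy the remaining frames. If people cover a pixel in more than half of the frames, the median can fail as well.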

In the following test, there are people who linger in one spot for many minutes. The results shown here are based upon a video recording lasting 20 minutes. The site is the India Gate in Mumbai. A typical frame from the video is shown below.

Averaging gives the following.

Whereas the geometric median results in the following.

The geometric median is problematic to calculate for a large number of images. It has no closed-form solution in general and must be computed for each pixel across the whole image set, yet typically the images cannot all be held in memory at once. To address this, Weiszfeld's iterative algorithm has been used. It generates a series that converges to the geometric median, with the RGB values of a pixel treated as points in 3D space. Each step of the series needs only sums over the images, so on every iteration the images can be read one at a time. The series is given by the following, where M_k(r,g,b) is the series of RGB values and C_i(r,g,b) is the colour of a pixel in the i'th image. This series is calculated for each pixel across the n images.
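Written out in this notation, the standard form of Weiszfeld's update is M_{k+1} = ( Σ_i C_i / ‖C_i − M_k‖ ) / ( Σ_i 1 / ‖C_i − M_k‖ ), with the sums taken over i = 1..n. Below is a minimal Python sketch of it, under the following assumptions:

* Frames are lists of (r, g, b) tuples.
* A hypothetical callable `open_frames` returns a fresh iterator over the frames on every call, so each iteration makes one streaming pass over the images.
* The iteration starts from the per-pixel mean (M_0) and runs a fixed number of iterations.
* The distance is floored at a small `eps` to avoid dividing by zero when M_k lands on one of the C_i.

These are choices made for this sketch and are not necessarily the author's.

```python
import math
from typing import Callable, Iterable, List, Tuple

Pixel = Tuple[float, float, float]
Frame = List[Pixel]  # hypothetical representation: a flat list of RGB pixels


def geometric_median_frames(
    open_frames: Callable[[], Iterable[Frame]],
    iterations: int = 20,
    eps: float = 1e-6,
) -> Frame:
    """Per-pixel geometric median of RGB frames using Weiszfeld's iteration.

    `open_frames` must return a fresh iterator over the same frames on every
    call; each iteration streams through the frames once.
    """
    # M_0: the per-pixel mean, computed in a first pass.
    est: List[List[float]] = []
    n = 0
    for frame in open_frames():
        if n == 0:
            est = [[0.0, 0.0, 0.0] for _ in frame]
        for m, c in zip(est, frame):
            for ch in range(3):
                m[ch] += c[ch]
        n += 1
    if n == 0:
        raise ValueError("no frames supplied")
    est = [[v / n for v in m] for m in est]

    for _ in range(iterations):
        num = [[0.0, 0.0, 0.0] for _ in est]  # sum_i C_i / ||C_i - M_k||
        den = [0.0] * len(est)                 # sum_i 1 / ||C_i - M_k||
        for frame in open_frames():
            for p, (m, c) in enumerate(zip(est, frame)):
                # weight 1/||C_i - M_k||, with the distance floored at eps
                w = 1.0 / max(math.dist(m, c), eps)
                for ch in range(3):
                    num[p][ch] += w * c[ch]
                den[p] += w
        # M_{k+1} = numerator / denominator, pixel by pixel
        est = [[num[p][ch] / den[p] for ch in range(3)] for p in range(len(est))]
    return [(m[0], m[1], m[2]) for m in est]


# One pixel: grey background in three of five frames, a red shirt in two.
frames = [[(100.0, 100.0, 100.0)]] * 3 + [[(200.0, 0.0, 0.0)]] * 2
print(geometric_median_frames(lambda: iter(frames), iterations=60))
```

In this example the mean is (140, 60, 60), while the iteration converges towards the grey background (100, 100, 100). Each iteration costs a full pass over all n images. In practice, a modest number of iterations, or stopping once M_k changes by less than a chosen tolerance, trades accuracy for time. The defaults of 20 iterations and eps = 1e-6 here are illustrative.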