Programming Assignment #1
IMAGES OF THE RUSSIAN EMPIRE:
Colorizing the Prokudin-Gorskii photo collection
Due Date: by 11:59pm, Th, Sept 16
Prokudin-Gorskii (1863-1944) was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won the Tsar’s special permission to travel across the vast Russian Empire and take color photographs of everything he saw. And he really photographed everything: people, buildings, landscapes, railroads, bridges… thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. Never mind that there was no way to print color photographs until much later – he envisioned special projectors to be installed in “multimedia” classrooms all across
The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images such as this one and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. In order to do this, you will need to extract the three color channel images, place them on top of each other, and finally align them so that they form a single RGB color image. We will assume that a simple x,y translation model is sufficient for proper alignment. However, the full-size glass plate images are very large, so your alignment procedure will need to be relatively fast and efficient. The resulting color images will inevitably exhibit many artifacts due to color fading, blemishes on the glass plates, noise, etc. As extra credit, you can explore ways of remedying some of these problems automatically.
Digitized glass plate images (both hi-res and low-res versions) will be placed in the following directory (note that the filter order from top to bottom is BGR, not RGB!):
Your program will take a glass plate image as input and produce a single color image as output. The program should divide the image into three equal parts and align the second and the third parts (G and R) to the first (B). For each image, you will need to print the (x,y) displacement vector that was used to align the parts.
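The splitting step can be sketched as follows. This is only an illustration in plain Python (not the Matlab skeleton), assuming the plate has already been loaded as a list of pixel rows; the names split_plate and stack_rgb are hypothetical, and dropping leftover rows when the height is not divisible by three is an assumption, not a requirement of the assignment.

```python
def split_plate(plate):
    """Split a scanned plate (a list of pixel rows) into B, G, R images.

    The three exposures are stacked vertically in B, G, R order.
    Assumption: leftover rows (height not divisible by 3) are dropped.
    """
    height = len(plate) // 3
    blue = plate[0:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red


def stack_rgb(red, green, blue):
    """Combine three equally sized channels into rows of (r, g, b) tuples."""
    return [list(zip(r_row, g_row, b_row))
            for r_row, g_row, b_row in zip(red, green, blue)]


# Tiny synthetic "plate": 7 rows of 2 pixels, so one leftover row is dropped.
plate = [[row * 10 + col for col in range(2)] for row in range(7)]
blue, green, red = split_plate(plate)
color = stack_rgb(red, green, blue)  # unaligned; alignment comes next
```

Stacking the channels without alignment, as done here, is a useful first check that the B, G, R order has been read correctly.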
The easiest way to align the parts is to exhaustively search over a window of possible displacements (say [-15,15] pixels), score each one using some image matching metric, and take the displacement with the best score. There are a number of possible metrics that one could use to score how well the images match. The simplest one is the Sum of Squared Differences (SSD) distance, i.e. the squared L2 norm of the difference image, which in Matlab is simply sum(sum((image1-image2).^2)). Another is normalized correlation (discussed in class). Note that in this particular case, the images to be matched do not actually have the same brightness values (they are different color channels), so a cleverer metric might work better.
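A minimal single-scale sketch of this search, in plain Python rather than Matlab, might look like the code below. Everything here is an assumption made for illustration: the function names are hypothetical, images are lists of pixel rows, a displacement (dx, dy) means that pixel (x, y) of the moving image is compared with pixel (x + dx, y + dy) of the reference, and the score is averaged over the overlapping region so that large shifts are not favored merely because fewer pixels overlap. The correlation shown is the zero-mean variant, which may differ from the one presented in class.

```python
import math


def overlapping_pixels(ref, img, dx, dy):
    """Pair ref[y + dy][x + dx] with img[y][x] wherever both pixels exist."""
    h, w = len(ref), len(ref[0])
    return [(ref[y + dy][x + dx], img[y][x])
            for y in range(max(0, -dy), min(h, h - dy))
            for x in range(max(0, -dx), min(w, w - dx))]


def ssd_cost(pairs):
    """Squared differences averaged over the overlap (lower is better)."""
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def ncc_cost(pairs):
    """Negated zero-mean normalized correlation (lower is better)."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    dot = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    norm_a = math.sqrt(sum((a - mean_a) ** 2 for a, _ in pairs))
    norm_b = math.sqrt(sum((b - mean_b) ** 2 for _, b in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # flat region: correlation is undefined, treat as neutral
    return -dot / (norm_a * norm_b)


def align(ref, img, window=15, cost=ssd_cost):
    """Try every (dx, dy) in [-window, window]^2 and keep the cheapest.

    Assumes the window is smaller than the image, so the overlap is
    never empty.
    """
    best = None
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = cost(overlapping_pixels(ref, img, dx, dy))
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

With the channels from the previous step, align(blue, green) and align(blue, red) would give the two displacement vectors to print. Because the correlation variant subtracts each channel's mean and divides by its norm, it is less sensitive than SSD to the overall brightness differences between channels.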
Exhaustive search will become prohibitively expensive if the pixel displacement is too large (which will be the case for high-resolution glass plate scans). In this case, you will need to implement a faster search procedure such as an image pyramid. An image pyramid represents the image at multiple scales (usually scaled by a factor of 2) and the processing is done sequentially starting from the coarsest scale and going down. It is very easy to implement by adding recursive calls to your original single-scale implementation.
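One way the recursion could be organized is sketched below in plain Python (not the Matlab skeleton). The names, the 2x2 averaging used for downsampling, the base-case size, and the search radii are all illustrative assumptions to be tuned, not values prescribed by the assignment. As a convention, a displacement (dx, dy) means that pixel (x, y) of the moving image is compared with pixel (x + dx, y + dy) of the reference.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def cost(ref, img, dx, dy):
    """Mean squared difference of ref[y + dy][x + dx] and img[y][x]."""
    h, w = len(ref), len(ref[0])
    ys = range(max(0, -dy), min(h, h - dy))
    xs = range(max(0, -dx), min(w, w - dx))
    if len(ys) == 0 or len(xs) == 0:
        return float("inf")  # no overlap at all: never the best choice
    total = sum((ref[y + dy][x + dx] - img[y][x]) ** 2
                for y in ys for x in xs)
    return total / (len(ys) * len(xs))


def search(ref, img, center, radius):
    """Exhaustively try displacements within `radius` of `center`."""
    cx, cy = center
    candidates = [(cx + ox, cy + oy)
                  for oy in range(-radius, radius + 1)
                  for ox in range(-radius, radius + 1)]
    return min(candidates, key=lambda d: cost(ref, img, d[0], d[1]))


def pyramid_align(ref, img, min_size=16, coarse_radius=4, refine_radius=1):
    """Coarse-to-fine alignment: solve at half size, then refine."""
    if min(len(ref), len(ref[0])) <= min_size:
        # Coarsest level: the image is small, so a full search is cheap.
        return search(ref, img, (0, 0), coarse_radius)
    dx, dy = pyramid_align(downsample(ref), downsample(img),
                           min_size, coarse_radius, refine_radius)
    # A shift of d pixels at half resolution is about 2*d pixels here,
    # so only a small neighborhood around (2*dx, 2*dy) needs checking.
    return search(ref, img, (2 * dx, 2 * dy), refine_radius)
```

Each level evaluates only a handful of candidates, so the total work is dominated by the finest level instead of growing with the square of the displacement range. On real scans a refinement radius of 1 may be too tight if the coarse estimate is off, so it is worth experimenting with slightly larger values.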
The above directory has skeleton Matlab code that will help you get started.
BELLS & WHISTLES
Although the color images resulting from this automatic procedure will often look strikingly real, they are still a far cry from the manually restored versions available on the LoC website and from other professional photographers (check out this wonderful site!). Of course, each such photograph takes days of painstaking Photoshop work, adjusting the color levels, removing the blemishes, adding contrast, etc. Can we make some of these adjustments automatically, without the human in the loop? In class, we will discuss a few possible things that can be done to improve the final images. Feel free to come up with your own approaches or talk to me about your ideas. There is no right answer here – just try out things and see what works.
The assignment is worth 100 points. You will get 60 points for a single-scale implementation demonstrating successful results on the low-resolution images. You will get 40 more points for a multiscale pyramid implementation, showing that you can handle larger input images (depending on the memory of your machine, you might still not be able to run on the full-resolution images, in which case, show results on an intermediate resolution that your machine can handle). Up to 20 points of extra credit will be assigned for any Bells and Whistles (either suggested or your own).
WHAT TO TURN IN
You will need to create a web page showing the results of this assignment and describing any of the extras that you have done. Show your results on all images that were provided, plus a few others of your own choosing from the LoC collection. Additionally, you will need to hand in all of your code to a specified directory (not publicly readable). The TA will have more information about the appropriate directories for the web page and the code.