Programming Project #2 for the 15-862 class (`proj2g`)
15-463: Computational Photography

Eulerian Video Magnification

Image from http://people.csail.mit.edu/mrub/vidmag/

Due Date: 11:59pm on Tuesday, September 18, 2012

Overview

This project explores Eulerian Video Magnification to reveal temporal image variations that tend to be difficult to visualize. This video briefly describes the technique, and shows a variety of visual effects attainable with it. For example, one can amplify blood flow and low-amplitude motion with Eulerian Magnification without performing image segmentation or computing optical flow.

The primary goal of this assignment is to amplify temporal color variations in two videos (face and baby2), and to amplify color or low-amplitude motion in a short video sequence that you record yourself. To do this, you will read the paper about Eulerian Video Magnification and implement the approach.

How can you amplify image variations that are hard to see with the naked eye? The insight is that some of these hard-to-see changes occur at particular temporal frequencies that we can augment using simple filters in the frequency domain! For example, to magnify pulse we can look at pixel variations with frequencies between 0.4 and 4Hz, which correspond to 24 to 240 beats per minute.

The Eulerian magnification process is straightforward:
(1) An image sequence is decomposed into different spatial frequency bands using Laplacian pyramids
(2) The time series corresponding to the value of a pixel on all levels of the pyramid are band-pass filtered to extract frequency bands of interest
(3) The extracted band-passed signals are multiplied by a magnification factor and this result is added to the original signals
(4) The magnified signals that compose the spatial pyramid are collapsed to obtain the final output
If the input video has multiple channels (e.g., each frame is a color image in RGB color space), then we can process each channel independently. The YIQ color space is particularly suggested for Eulerian magnification since it makes it easy to amplify intensity and chromaticity independently of each other (we can use rgb2ntsc and ntsc2rgb to move between RGB and YIQ in MATLAB).
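As a minimal sketch of the color-space step, the round trip between RGB and YIQ might look like the following. It assumes the Image Processing Toolbox is available for rgb2ntsc and ntsc2rgb, and it uses a random array as a stand-in for a real frame. The variable names and the per-channel gains are illustrative only and are not values from the paper.

```matlab
% Sketch: move one frame to YIQ, scale channels independently, move back.
% The frame is a random stand-in; replace it with a real video frame.
rgbFrame = rand(32, 32, 3);          % values in [0, 1], size h x w x 3
yiqFrame = rgb2ntsc(rgbFrame);       % channel 1 = Y, channels 2-3 = I and Q

% Hypothetical per-channel gains: a gain of 1 leaves a channel unchanged.
channelGain = [1, 1, 1];
for c = 1:3
    yiqFrame(:, :, c) = channelGain(c) * yiqFrame(:, :, c);
end

rgbBack = ntsc2rgb(yiqFrame);        % back to RGB for display or saving
roundTripError = max(abs(rgbBack(:) - rgbFrame(:)));
```

With all gains set to 1 the round trip should return the input up to a small numerical error, which is a quick way to check the conversion before adding any magnification.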

The hard part of the process is to find the right parameters to get the desired magnification effect. For example, one can change the size of the Laplacian pyramid, multiply the time series corresponding to the value of a pixel by different scale factors at different levels of the pyramid, or attenuate the magnification when adding the augmented band-passed signals to the original ones. The choice of the band-pass filter (e.g., the range of frequencies it passes/rejects, its order, etc.) can also influence the obtained results. This exploration is part of the project, so you should start early!

Laplacian pyramid (20pts)

The first step to augment a video is to compute a Laplacian pyramid for every single frame (see Szeliski's book, section 3.5.3). The Laplacian pyramid was originally proposed by Burt and Adelson in their 1983 paper The Laplacian pyramid as a compact image code, where they suggested sampling the image with Laplacian operators of many scales. This pyramid is constructed by taking the difference between adjacent levels of a Gaussian pyramid, and approximates the second derivative of the image, highlighting regions of rapid intensity change.

Image from http://www.cse.yorku.ca/~sizints/

Each level of the Laplacian pyramid will have different spatial frequency information, as shown in the picture above. Notice that we need to upsample one of the images when computing the difference between adjacent levels of a Gaussian pyramid, since one will have a size of w x h pixels, while the other will have (w/2) x (h/2) pixels. Since the last image in the Gaussian pyramid has no coarser neighbor to subtract, it simply becomes the last level of the Laplacian pyramid.

Notice that by doing the inverse process of constructing a Laplacian pyramid we can reconstruct the original image. In other words, by upsampling and adding levels of the Laplacian pyramid we can generate the full-size picture. This reconstruction is necessary to augment videos using the Eulerian approach.
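The construction and its inverse can be sketched as follows for a single grayscale frame. This is one possible arrangement, not a required one: it assumes the Image Processing Toolbox (imfilter, imresize), a 5-tap binomial kernel as the Gaussian approximation, four pyramid levels, and a random array standing in for a real frame. All variable names are illustrative.

```matlab
% Sketch: build and collapse a Laplacian pyramid for one grayscale frame.
frame     = rand(64, 48);            % stand-in for one channel of a frame
numLevels = 4;                       % assumed pyramid depth
kernel1d  = [1 4 6 4 1] / 16;        % 5-tap binomial approximation of a Gaussian
kernel2d  = kernel1d' * kernel1d;

% Gaussian pyramid: blur, then halve the resolution at each level.
gaussPyr    = cell(1, numLevels);
gaussPyr{1} = frame;
for k = 2:numLevels
    blurred     = imfilter(gaussPyr{k-1}, kernel2d, 'replicate');
    gaussPyr{k} = imresize(blurred, 0.5);
end

% Laplacian pyramid: each level minus the upsampled next-coarser level.
lapPyr = cell(1, numLevels);
for k = 1:numLevels-1
    targetSize = [size(gaussPyr{k}, 1), size(gaussPyr{k}, 2)];
    lapPyr{k}  = gaussPyr{k} - imresize(gaussPyr{k+1}, targetSize);
end
lapPyr{numLevels} = gaussPyr{numLevels};   % coarsest level is kept as-is

% Collapse: upsample and add, from the coarsest level to the finest.
recon = lapPyr{numLevels};
for k = numLevels-1:-1:1
    targetSize = [size(lapPyr{k}, 1), size(lapPyr{k}, 2)];
    recon      = imresize(recon, targetSize) + lapPyr{k};
end
reconError = max(abs(recon(:) - frame(:)));
```

Because the collapse uses the same upsampling call as the construction, the unmodified pyramid reconstructs the frame up to floating-point rounding. Checking that reconError is tiny is a useful sanity test before any temporal filtering is added.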

Temporal filtering (40pts)

We consider the time series corresponding to the value of a pixel on all spatial levels of the Laplacian pyramid. We convert this time series to the frequency domain using the Fast Fourier Transform (fft in MATLAB), and apply a band-pass filter to this signal. The choice of the band-pass filter is crucial, and we recommend designing and visualizing the filter with fdatool and fvtool in MATLAB (see an example by MathWorks).

To make this process easier, we provide you with a butterworthBandpassFilter function to generate a Butterworth band-pass filter of a particular order. This function was generated with fdatool, and is optional for you to use. You can download the file, or use the code below for reference:

% Hd = butterworthBandpassFilter(Fs, N, Fc1, Fc2)
%   Fs  - sampling frequency (e.g., 30Hz)
%   N   - filter order (must be an even number)
%   Fc1 - first cut frequency
%   Fc2 - second cut frequency
%   Hd  - approximate ideal bandpass filter
function Hd = butterworthBandpassFilter(Fs, N, Fc1, Fc2)
    h  = fdesign.bandpass('N,F3dB1,F3dB2', N, Fc1, Fc2, Fs);
    Hd = design(h, 'butter');
end

More details on the fdesign.bandpass parameters can be found here. Check the Eulerian Video Magnification paper for details on the parameters they used on the face and baby2 videos. You will have to find the right parameters for the video that you capture and process yourself.

To filter the time series of the pixels quickly, we recommend you perform this operation in the frequency domain, since element-wise multiplication in the frequency domain is faster than convolution in the time domain. But be careful about fft's output format when doing this! As explained in this tutorial, the DC component of fftx = fft(x), for x a real 1D signal with nfft samples, is the first element fftx(1) of the array, and the magnitude of the FFT is symmetric. If nfft is even, the first (1+nfft/2) points are unique, and the rest are symmetrically redundant. The element fftx(1+nfft/2) is the Nyquist frequency component of x in this case. If nfft is odd, however, the Nyquist frequency component is not evaluated, and the number of unique points is (nfft+1)/2.
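The bin layout described above can be illustrated with a small sketch that band-passes one synthetic pixel time series using an ideal (brick-wall) mask. The frame rate, signal, and variable names are assumptions made for the example, and an ideal mask is only one possible filter choice.

```matlab
% Sketch: ideal band-pass filtering of one pixel's time series via the FFT.
Fs = 30;                              % assumed frame rate in Hz
N  = 300;                             % number of frames (10 seconds)
t  = (0:N-1)' / Fs;
% Synthetic series: offset + 1 Hz "pulse" + 6 Hz component to reject.
x  = 0.5 + 0.05 * sin(2*pi*1.0*t) + 0.2 * sin(2*pi*6.0*t);

fLow  = 0.4;                          % pass band in Hz
fHigh = 4;

X = fft(x);                           % X(1) is the DC component
binFreq = (0:N-1)' * Fs / N;          % frequency of each bin, 0 .. Fs
binFreq = min(binFreq, Fs - binFreq); % fold mirrored bins onto 0 .. Fs/2
mask = double(binFreq >= fLow & binFreq <= fHigh);

% The mask is symmetric, so the inverse FFT is real up to rounding error.
filtered = real(ifft(X .* mask));
```

Folding the bin frequencies keeps the mask symmetric for both even and odd N, so the mirrored half of the spectrum is filtered consistently with the unique half. For a whole video, the same idea applies along the time dimension with fft(volume, [], dim).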

Also, if you decide to use the butterworthBandpassFilter function, then you will need to get the frequency components of the filter for fast computation. This can be done by using MATLAB's freqz function, by passing the filter and the length of the output that you want (i.e., fftHd = freqz(Hd,NumSamples)). Again, be careful about how the frequency components are output by freqz.
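The point about freqz output can be sketched as follows. Note the swap: instead of the provided butterworthBandpassFilter object, this example designs the filter with butter and passes coefficient vectors to freqz (both from the Signal Processing Toolbox), because the 'whole' option of freqz for coefficient vectors returns samples around the full unit circle. The parameter values are illustrative, and multiplying the FFT by a sampled response is a circular approximation of running the filter.

```matlab
% Sketch: sample a Butterworth band-pass response on the FFT bin grid.
Fs          = 30;                     % assumed frame rate in Hz
numSamples  = 300;                    % length of the time series
filterOrder = 2;                      % overall band-pass order (even)
fLow        = 0.4;
fHigh       = 4;

% butter(n, [w1 w2]) designs a band-pass filter of order 2n, with
% frequencies normalized by the Nyquist frequency Fs/2.
[b, a] = butter(filterOrder / 2, [fLow fHigh] / (Fs / 2));

% 'whole' samples the full unit circle, so H(k) lines up with fft bin k.
% Without it, freqz covers only the half from 0 up to Nyquist.
H = freqz(b, a, numSamples, 'whole');

t = (0:numSamples-1)' / Fs;
x = 0.5 + 0.05 * sin(2*pi*1.0*t);
% abs(H) applies the gain only (zero phase); H itself would also
% apply the filter's phase shift.
filtered = real(ifft(fft(x) .* abs(H)));
```

Using the magnitude response keeps the filtered signal aligned in time with the original, which matters when the result is added back to the input.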

Pixel change magnification (10pts)

After extracting the frequency band of interest, we need to amplify it and add the result back to the original signal.
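A minimal sketch of this step, under assumed values, is shown here for a small stack of frames from one pyramid level. The gain alpha, the frame rate, the synthetic data, and the ideal band-pass mask are all choices made for illustration rather than the parameters used in the paper.

```matlab
% Sketch: amplify a band-passed signal and add it back to the original.
Fs    = 30;                           % assumed frame rate in Hz
T     = 90;                           % number of frames
alpha = 50;                           % assumed magnification factor
t = reshape((0:T-1) / Fs, 1, 1, T);

% Stand-in for one pyramid level over time (h x w x T): every pixel
% carries a faint 1 Hz oscillation around a constant value.
level = 0.5 + 0.01 * bsxfun(@times, ones(4, 4), sin(2*pi*1.0*t));

% Ideal band-pass along the time dimension (0.4 to 4 Hz).
binFreq = (0:T-1) * Fs / T;
binFreq = min(binFreq, Fs - binFreq);
mask = reshape(double(binFreq >= 0.4 & binFreq <= 4), 1, 1, T);
bandpassed = real(ifft(bsxfun(@times, fft(level, [], 3), mask), [], 3));

% Magnify the extracted band and add it back to the original signal.
magnified = level + alpha * bandpassed;
```

In this synthetic case the oscillation amplitude grows from 0.01 to 0.01 * (1 + alpha) while the constant offset is untouched, since the DC bin lies outside the pass band.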

Image reconstruction (30pts, which includes evaluation of your results)

After amplifying the signals, all that is left is to collapse the Laplacian pyramids into a single image per frame. Notice that we can attenuate the amplification to obtain different results, or we can low-pass filter the amplified signal to reduce effects on high frequency components of the images, such as borders. Two different blood flow amplification effects on the face.mp4 image sequence are presented below.

Bells & Whistles (Extra Credit)

Try some special moves to increase your score:

(5pts) Pick a time series corresponding to the value of a pixel on any pyramid level, and make a plot of power vs frequency after converting to the frequency domain (you can follow these steps for plotting). Show the plot in your project website and indicate the pyramid level of the pixel you chose, and its (x,y) location.
(5pts) Using the same time series as above, make a plot of power vs frequency after band-pass filtering the signal in the frequency domain. Show the plot in your project website and indicate the pass band of the filter that you used.
(15pts) Try two additional sets of parameters on each of the processed videos (face, baby2, and the one you captured), and show the different results you obtained. Explain what is being augmented in each case (e.g., color variation or low-amplitude motion).
(10pts) Try to augment low-amplitude motion specifically in another video provided by the authors of the Eulerian Video Magnification paper (see the Data section of the project website). Do not pick the face or baby2 video for this purpose, nor any other video you amplified before.
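For the power vs frequency plots, one possible starting point is sketched below. It uses a synthetic time series and assumed values for the frame rate and length. The scaling of the power values is one of several common conventions, so treat it as a choice rather than a requirement.

```matlab
% Sketch: one-sided power spectrum of a pixel time series.
Fs = 30;                              % assumed frame rate in Hz
N  = 300;                             % number of frames
t  = (0:N-1)' / Fs;
x  = 0.5 + 0.05 * sin(2*pi*1.0*t);    % stand-in for a real pixel series

X = fft(x - mean(x));                 % remove the mean so DC does not dominate
numUnique = floor(N / 2) + 1;         % unique bins for even or odd N
power = abs(X(1:numUnique)).^2 / N;   % one common scaling convention
freq  = (0:numUnique-1)' * Fs / N;    % frequency axis in Hz

plot(freq, power);
xlabel('Frequency (Hz)');
ylabel('Power');

[~, peakIdx] = max(power);
peakFreq = freq(peakIdx);             % dominant frequency of the series
```

Running the same code on the band-pass filtered series, with the pass band marked on the axis, gives the second plot.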

Deliverables

Use both words and images to show us what you've done (describe in detail your algorithm parameterization for each of your results).
Place all code in your code/ directory. Include a README describing the contents of each file.
In the website in your www/ directory, please:

Include a brief description of the project, and explain how you constructed the Laplacian image pyramid.
Show your result for amplifying blood flow on the face and baby2 sequences, as well as your results for the additional sequence that you capture and process. Your results should mimic the results obtained by the authors on the face and baby2 sequences (they should amplify similar pulsation rates, though colorizing may be different). Upload your videos to YouTube, then embed them into your project web-page, and indicate the parameters you used in each case.
Explain any difficulties and possible reasons for bad results.
Include any bells & whistles and explain what parameters you used.
