Scanner: Efficient Video Analysis at Scale
Alex Poms, Will Crichton, Pat Hanrahan, Kayvon Fatahalian
ACM Transactions on Graphics (2018)

A growing number of visual computing applications depend on the analysis of large video collections. The challenge is that scaling applications to operate on these datasets requires efficient systems for pixel data access and parallel processing across large numbers of machines. Few programmers have the capability to operate efficiently at these scales, limiting the field’s ability to explore new applications that leverage big video data. In response, we have created Scanner, a system for productive and efficient video analysis at scale. Scanner organizes video collections as tables in a data store optimized for sampling frames from compressed video, and executes pixel processing computations, expressed as data flow graphs, on these frames. Scanner schedules video analysis applications expressed using these abstractions onto heterogeneous throughput computing hardware, such as multi-core CPUs, GPUs, and media processing ASICs, for high-throughput pixel processing. We demonstrate the productivity of Scanner by authoring a variety of video processing applications including the synthesis of stereo VR video streams from multi-camera rigs, markerless 3D human pose reconstruction from video, and data-mining big video datasets such as hundreds of feature-length films or over 70,000 hours of TV news. These applications achieve near-expert performance on a single machine and scale efficiently to hundreds of machines, enabling formerly long-running big video data analysis tasks to be carried out in minutes to hours.

Alex Poms, Will Crichton, Pat Hanrahan, Kayvon Fatahalian (2018). Scanner: Efficient Video Analysis at Scale. ACM Transactions on Graphics, 37(4).

author = {Alex Poms and Will Crichton and Pat Hanrahan and Kayvon Fatahalian},
title = {Scanner: Efficient Video Analysis at Scale},
journal = {ACM Transactions on Graphics},
volume = {37},
number = {4},
year = {2018},
publisher = {ACM},
address = {New York, NY, USA}, }