ACM Transactions on Graphics, 35(4), July 2016
Proceedings of ACM SIGGRAPH 2016
The Halide image processing language has proven to be an effective system for authoring high-performance image processing code. Halide programmers need only provide a high-level strategy for mapping an image processing pipeline to a parallel machine (a schedule), and the Halide compiler carries out the mechanical task of generating platform-specific code that implements the schedule. Unfortunately, designing high-performance schedules for complex image processing pipelines requires substantial knowledge of modern hardware architecture and code-optimization techniques. In this paper we provide an algorithm for automatically generating high-performance schedules for Halide programs. Our solution extends the function bounds analysis already present in the Halide compiler to automatically perform locality and parallelism-enhancing global program transformations typical of those employed by expert Halide developers. The algorithm does not require costly (and often impractical) auto-tuning, and, in seconds, generates schedules for a broad set of image processing benchmarks that are performance-competitive with, and often better than, schedules manually authored by expert Halide developers on server and mobile CPUs, as well as GPUs.
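To make the notion of a schedule concrete, the sketch below runs one two-stage blur under two different execution strategies. It is plain Python rather than Halide code, and every name in it (`blur3`, `schedule_breadth_first`, `schedule_tiled`, the `tile` parameter) is a hypothetical chosen for illustration, not part of the Halide API or of the algorithm in the paper. It only shows the kind of choice a schedule encodes: whether to materialize an intermediate stage in full, or to recompute it per tile for better locality at the cost of some redundant work.

```python
# Plain-Python illustration only; none of these names are Halide API.

def blur3(src, i):
    """3-tap box filter at index i; src must cover indices i..i+2."""
    return (src[i] + src[i + 1] + src[i + 2]) / 3.0


def schedule_breadth_first(inp):
    """Compute the whole first stage, store it, then compute the second stage."""
    n_out = len(inp) - 4
    stage1 = [blur3(inp, i) for i in range(n_out + 2)]  # full intermediate buffer
    return [blur3(stage1, i) for i in range(n_out)]


def schedule_tiled(inp, tile=4):
    """Compute the first stage per output tile: better locality, some redundant work."""
    n_out = len(inp) - 4
    out = []
    for start in range(0, n_out, tile):
        end = min(start + tile, n_out)
        # Only the slice of stage 1 this tile needs (tile width plus 2 halo values,
        # which neighbouring tiles recompute).
        local = [blur3(inp, i) for i in range(start, end + 2)]
        out.extend(blur3(local, i - start) for i in range(start, end))
    return out


inp = [float((7 * i) % 11) for i in range(20)]
a = schedule_breadth_first(inp)
b = schedule_tiled(inp, tile=4)
print(a == b)  # both strategies compute the same pipeline
```

Both functions produce the same output; they differ only in evaluation order, intermediate storage, and the amount of recomputation. Choosing among such strategies (and tile sizes) for every stage of a large pipeline is the task the paper's algorithm automates.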
NOTICE: The algorithm described in this paper has been superseded by the techniques described in the SIGGRAPH 2019 autoscheduler paper. The autoscheduling functionality in the master Halide repo now uses the algorithm from the 2019 paper.
The autoscheduler has now been integrated into the master branch of Halide. Please see the Halide repository for autoscheduler source. The Halide site has a number of tutorials on how to use the autoscheduler.