This page contains lecture slides and recommended readings for the Fall 2016 offering of 15-769.

- The Compute Architecture of Intel Processor Graphics Gen9, Intel Technical Whitepaper

The required reading for this class is not an academic technical paper, but a whitepaper from Intel describing the architectural geometry of their latest GPU. This processor is particularly notable because it is the integrated GPU that will be in most mid-2016 and later Core i5 or i7 processors -- the marketing name is HD Graphics 530 (or larger).

I'd like you to read the whitepaper, focusing on the description of the processor in Sections 5.3-5.5. Then, given your knowledge of the concepts discussed in lecture (such as superscalar execution, multi-core, multi-threading, etc.), I'd like you to describe the features of the processor (using terms from the lecture, not Intel terms).

Pro tip: Consider your favorite data-parallel language, such as the GLSL/HLSL shading languages, CUDA, OpenCL, ISPC, or just an OpenMP #pragma omp parallel for, and make sure you can think through how an embarrassingly parallel for loop can be lowered to these architectures. (You don't need to write this down, but you could.)
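As one way to start on the exercise above, here is a minimal sketch in Python of how an embarrassingly parallel loop might be mapped onto a machine with several cores, each with SIMD lanes. The machine shape (`NUM_CORES`, `SIMD_WIDTH`) and the function names are made up for illustration and are not taken from any of the whitepapers. The outer loops run sequentially here, whereas on real hardware the cores would run concurrently and the lanes in lockstep.

```python
# Hypothetical machine shape: illustrative numbers only, not from any whitepaper.
NUM_CORES = 4
SIMD_WIDTH = 8


def saxpy_reference(a, x, y):
    """The original data-parallel loop: every iteration is independent."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_lowered(a, x, y, num_cores=NUM_CORES, simd_width=SIMD_WIDTH):
    """The same loop, written to expose the core / vector / lane mapping."""
    n = len(x)
    out = [0.0] * n
    # Split the iteration space into contiguous chunks, one chunk per core.
    chunk = (n + num_cores - 1) // num_cores
    for core in range(num_cores):  # would run concurrently on real hardware
        start = core * chunk
        end = min(start + chunk, n)
        # Each core walks its chunk one SIMD vector at a time.
        for base in range(start, end, simd_width):
            # All lanes execute the same instruction on different elements.
            for lane in range(simd_width):
                i = base + lane
                if i < end:  # lane mask: disable lanes past the end of the chunk
                    out[i] = a * x[i] + y[i]
    return out


if __name__ == "__main__":
    xs = [float(i) for i in range(37)]
    ys = [1.0] * 37
    print(saxpy_lowered(2.0, xs, ys) == saxpy_reference(2.0, xs, ys))
```

The sketch leaves out hardware multi-threading. One possible extension is to assign several chunks to each core and interleave them, which is how a multi-threaded core would hide memory latency.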

Students wanting to go further might also be interested in reading the NVIDIA P100 or GTX 980 whitepapers linked below. You could then make a table contrasting the geometry of a modern AVX-capable Intel CPU, Intel Integrated Graphics (Gen9), NVIDIA GPUs, and any other processor you might be interested in, such as Intel Knights Corner, AMD GPUs, etc.

- The point of this review lecture was to get you ready for 15-769. You may wish to review content from 15-418/618 Spring 2016, in particular lecture 2, lecture 5, and lecture 19. Lecture videos are online. Practice exercises 1 and 2 are applicable to this material, as are many of the questions in the exam practice problems.
- I highly recommend that all students complete 15-418/618 Assignment 1 as an exercise if they have not taken 418 in the past. This is a quick assignment that should take only an afternoon or two.
- NVIDIA Tesla P100 Whitepaper, 2016
- NVIDIA GeForce GTX 980 Whitepaper, 2014
- NVIDIA Tegra X1 Whitepaper
- The Rise of Mobile Visual Computing Systems, Fatahalian, IEEE Pervasive Computing 2016
- Scalability! But at What COST?, McSherry, Isard, and Murray. HotOS 2015 (The arguments in this paper are very consistent with the way we think about performance in the visual computing domain.)

- The Stanford CS448A course notes are a very good reference for camera image processing pipeline algorithms and issues.
- The interactive demos on the Stanford CS178 course site are very well done (some were shown in class)
- Clarkvision.com has some very interesting material on cameras.

- The Frankencamera: An Experimental Platform for Computational Photography, Adams et al. SIGGRAPH 2010
- Burst Photography for High Dynamic Range and Low-light Imaging on Mobile Cameras, Hasinoff et al. SIGGRAPH Asia 2016

- Decoupling Algorithms from the Organization of Computation for High Performance Image Processing (please read Chapters 1, 4, 5, and 6.1), Ragan-Kelley (MIT Ph.D. thesis, 2014)
- Automatically Scheduling Halide Image Processing Pipelines, Mullapudi et al. SIGGRAPH 2016

- Halide Language Website (contains documentation and many tutorials)
- Image Perforation: Automatically Accelerating Image Pipelines by Intelligently Skipping Samples, Lou et al. Transactions on Graphics 2016

- Fast Median and Bilateral Filtering, Weiss. SIGGRAPH 2006
- A Non-Local Algorithm for Image Denoising, Buades et al. CVPR 2005
- A Gentle Introduction to Bilateral Filtering and its Applications, Paris et al. SIGGRAPH 2008 Course Notes
- Sylvain Paris' Fast Bilateral Filter page
- A Fast Approximation of the Bilateral Filter using a Signal Processing Approach, Paris and Durand. MIT technical report 2006 (extends their ECCV 2006 paper)
- An Iterative Image Registration Technique with an Application to Stereo Vision, Lucas and Kanade. IJCAI 1981
- Lucas-Kanade 20 Years On: A Unifying Framework, Baker and Matthews. IJCV 2004

- Rigel: Flexible Multi-Rate Image Processing Hardware, Hegarty et al. SIGGRAPH 2016. (You can learn more about Rigel on its GitHub project page. I would be interested in a student trying out Rigel in a course project.)
- Darkroom: Compiling High-Level Image Processing Code into Hardware Pipelines, Hegarty et al. SIGGRAPH 2014
- Understanding Sources of Inefficiency in General-Purpose Chips, Hameed et al. ISCA 2010

- Stanford CS231n: Convolutional Neural Networks for Visual Recognition. I recommend that you read through the lecture notes for modules 1 and 2 for a very nice explanation of key topics.
- Neural Networks and Deep Learning, Nielsen, 2016 (a free online book)
- Check out the TensorFlow tutorials and play around in the TensorFlow Playground
- Visualizing and Understanding Convolutional Neural Networks, Zeiler and Fergus, ECCV 2014
- ImageNet Classification with Deep Convolutional Neural Networks. Krizhevsky et al. NIPS 2012 (this is the original "AlexNet" paper)

- Going Deeper with Convolutions, Szegedy et al. CVPR 2015
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, Iandola et al. 2016

- Scaling Distributed Machine Learning with the Parameter Server, Li et al. OSDI 2014
- Project Adam: Building an Efficient and Scalable Deep Learning Training System, Chilimbi et al. OSDI 2014
- FireCaffe: Near-linear Acceleration of Deep Neural Network Training on Compute Clusters, Iandola et al. CVPR 2016

- Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Girshick et al. CVPR 2014 (the R-CNN paper)
- Fast R-CNN, Girshick, ICCV 2015
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren et al. NIPS 2015
- Deep Residual Learning for Image Recognition, He et al. CVPR 2016 (this is the ResNet paper)
- Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan and Zisserman, ICLR 2015 (this is the VGG paper)

- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, Han et al. ICLR 2016

- Learning both Weights and Connections for Efficient Neural Networks, Han et al. NIPS 2015
- EIE: Efficient Inference Engine on Compressed Deep Neural Network, Han et al. ISCA 2016
- Clockwork Convnets for Video Semantic Segmentation, Shelhamer et al. ECCV 2016

- Variable Rate Image Compression with Recurrent Neural Networks, Toderici et al. ICLR 2016
- Full Resolution Image Compression with Recurrent Neural Networks, Toderici et al. 2016
- Cross-stitch Networks for Multi-task Learning, Misra et al. CVPR 2016
- Spatial Transformer Networks, Jaderberg et al. NIPS 2015
- Convolutional Pose Machines, Wei et al. CVPR 2016
- Neural Module Networks, Andreas et al. CVPR 2016

- Cambricon: an instruction set architecture for neural networks, Liu et al. ISCA 2016
- EIE: Efficient Inference Engine on Compressed Deep Neural Network, Han et al. ISCA 2016
- Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing, Albericio et al. ISCA 2016
- Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, Reagen et al. ISCA 2016
- vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design, Rhu et al. MICRO 2016
- Fused-Layer CNN Accelerators, Alwani et al. MICRO 2016
- Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks, Chen et al. ISCA 2016
- PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory, Chi et al. ISCA 2016
- DNNWEAVER: From High-Level Deep Network Models to FPGA Acceleration, Sharma et al. MICRO 2016

- Modeling the World from Internet Photo Collections. N. Snavely et al. IJCV 2007
- Photo Tourism: Exploring Photo Collections in 3D. N. Snavely et al. SIGGRAPH 2006
- Building Rome in a Day. S. Agarwal et al. ICCV 2009
- Building Rome on a Cloudless Day. J. Frahm et al. ECCV 2010
- Skeletal Graphs for Efficient Structure from Motion. N. Snavely et al. CVPR 2008
- Reconstructing the World in Six Days, Heinly et al. CVPR 2015

- Using Very Deep Autoencoders for Content-Based Image Retrieval, by Krizhevsky and Hinton
- Location Recognition using Prioritized Feature Matching, Li et al. ECCV 2010
- Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration, Muja and Lowe
- Deep Image Retrieval: Learning Global Representations for Image Search, Gordo et al. ECCV 2016
- Fast Search in Hamming Space with Multi-Index Hashing, Norouzi et al.
- Content-Based Video Search over 1 Million Videos with 1 Core in 1 Second, Yu et al. ICMR 2015

- Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging, DeVito et al. 2016

- KinectFusion: Real-Time Dense Surface Mapping and Tracking, Newcombe et al. ISMAR 2011
- Real-time 3D Reconstruction at Scale using Voxel Hashing, Nießner et al., TOG 2013
- BundleFusion: Real-time Globally Consistent 3D Reconstruction using Online Surface Re-integration, Dai et al. 2016
- DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time, Newcombe et al. CVPR 2015 (extending KinectFusion ideas to moving scenes)

- The Design of the OpenGL Graphics Interface, by M. Segal and K. Akeley. [unpublished 1994]
- The Direct3D 10 System by D. Blythe. SIGGRAPH 2006
- Real-Time Rendering (Third Edition, chapters 2 and 3), by T. Akenine-Möller, E. Haines, and N. Hoffman

- Rasterization on Larrabee. M. Abrash. Dr. Dobbs Portal, May 1, 2009 (original article is available online here)
- Triangle Scan Conversion Using 2D Homogeneous Coordinates. M. Olano and T. Greer. Graphics Hardware 1997
- High-Performance Software Rasterization on GPUs. S. Laine et al. High Performance Graphics 2011. (source code is available on the paper page)
- Hierarchical Z-Buffer Visibility. N. Greene et al. SIGGRAPH 1993
- Efficient Depth Buffer Compression. J. Hasselgren and T. Akenine-Möller.
- Stochastic Depth Buffer Compression using Generalized Plane Encoding. M. Andersson et al. Computer Graphics Forum 2013
- The Irregular Z-Buffer: Hardware Acceleration for Irregular Data Structures. G. Johnson et al. Transactions on Graphics, 2005
- Data-Parallel Rasterization of Micropolygons with Defocus and Motion Blur. K. Fatahalian et al. High Performance Graphics 2009
- Clipless Dual-Space Bounds for Faster Stochastic Rasterization. S. Laine et al. SIGGRAPH 2011
- Interpolation for Polygon Texture Mapping and Shading. P. Heckbert and H. Moreton, State of the Art in Computer Graphics: Visualization and Modeling. 1991

- Pyramidal Parametrics. L. Williams, Computer Graphics 1983
- Texture on Demand. D. Peachey. Pixar Technical Memo #217. 1990
- The Design and Analysis of a Cache Architecture for Texture Mapping. Z. Hakura and A. Gupta. ISCA 1997
- Prefetching in a Texture Cache Architecture. H. Igehy et al. Graphics Hardware 1998
- Cardinality-Constrained Texture Filtering. J. Manson and S. Schaefer. SIGGRAPH 2013.
- Parameterization-Aware MIP-Mapping. J. Manson and S. Schaefer. Computer Graphics Forum. 2012.

- Texture Compression using Low-Frequency Signal Modulation. S. Fenney. Graphics Hardware 2003
- iPACKMAN: High-Quality, Low-Complexity Texture Compression for Mobile Phones. J. Ström and T. Akenine-Möller. Graphics Hardware 2005
- ETC2: Texture Compression using Invalid Combinations. J. Ström and M. Pettersson. Graphics Hardware 2007
- Adaptive Scalable Texture Compression. J. Nystad et al. High Performance Graphics 2012
- Block Compression in Direct3D 10. MSDN Developer Reference. 2013

- Pomegranate: A Fully Scalable Graphics Architecture. M. Eldridge et al. SIGGRAPH 2000

- Life of a Triangle - NVIDIA's Logical Pipeline. C. Kubisch (NVIDIA GameWorks Blog, 2015)
- Fast Tessellated Rendering on Fermi GF100. T. Purcell (High Performance Graphics Hot3D talk)
- A Sorting Classification of Parallel Rendering. S. Molnar et al. IEEE Computer Graphics and Applications, 1994.

- Morphological Anti-Aliasing. A. Reshetov. High Performance Graphics 2009

- A Sort-based Deferred Shading Architecture for Decoupled Sampling. P. Clarberg. SIGGRAPH 2013
- Deferred Rendering for Current and Future Rendering Pipelines. A. Lauritzen. SIGGRAPH Beyond Programmable Shading Course 2010
- Intersecting Lights with Pixels: Reasoning about Forward and Deferred Rendering. A. Lauritzen. SIGGRAPH Beyond Programmable Shading Course 2012
- Mythic Science Fiction in Real-Time: The Destiny Rendering Engine. N. Tatarchuk. Advances in Real-Time Rendering in Games, SIGGRAPH 2013 Course Notes
- SMAA: Enhanced Subpixel Morphological Antialiasing. J. Jimenez et al. Eurographics 2012
- Filtering Approaches for Real-Time Anti-Aliasing. SIGGRAPH 2011 Course Notes

- Understanding the Efficiency of Ray Traversal on GPUs. T. Aila and S. Laine, High Performance Graphics 2009
- Architecture Considerations for Tracing Incoherent Rays. T. Aila and T. Karras, High Performance Graphics 2010
- Megakernels Considered Harmful: Wavefront Path Tracing on GPUs. S. Laine, T. Karras and T. Aila, High Performance Graphics 2013
- An Energy and Bandwidth Efficient Ray Tracing Architecture. D. Kopta et al. High Performance Graphics 2013
- Fast Parallel Construction of High-Quality Bounding Volume Hierarchies. T. Karras et al. High Performance Graphics 2013
- Efficient BVH Construction via Approximate Agglomerative Clustering. Y. Gu et al. High Performance Graphics 2013
- Combining Single and Packet Ray Tracing for Arbitrary Ray Distributions on the Intel MIC Architecture. C. Benthin et al. IEEE Transactions on Visualization and Computer Graphics 2011
- SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing. W. Lee et al. High Performance Graphics 2013
- T&I engine: traversal and intersection engine for hardware accelerated ray tracing. J. Nah et al. SIGGRAPH Asia 2011
- OptiX: A General Purpose Ray Tracing Engine. S. Parker et al. SIGGRAPH 2010
- Ray Tracing for the Movie Cars. P. Christensen. Symposium on Interactive Ray Tracing 2007
- Rendering Complex Scenes With Memory-Coherent Ray Tracing. M. Pharr et al. SIGGRAPH 1997
- PBRT: Physically Based Rendering: From Theory to Implementation, 2nd Edition. M. Pharr and G. Humphreys.

- A Language for Shading and Lighting Calculations. P. Hanrahan and J. Lawson. SIGGRAPH 1990
- Cg: A System for Programming Graphics Hardware in a C-like Language. W. R. Mark et al. SIGGRAPH 2003

- Spark: Modular, Composable Shaders for Graphics Hardware. T. Foley and P. Hanrahan. SIGGRAPH 2011
- A System for Rapid Exploration of Shader Optimization Choices. Y. He et al. SIGGRAPH 2016
- A Real-Time Procedural Shading System for Programmable Graphics Hardware. K. Proudfoot et al. SIGGRAPH 2001
- Shade Trees. R. Cook. SIGGRAPH 1984
- An Image Synthesizer. K. Perlin. SIGGRAPH 1985
- Shader Metaprogramming. M. McCool et al. Graphics Hardware 2002

- Ebb: A DSL for Physical Simulation on CPUs and GPUs. G. Bernstein et al. TOG 2016 (also see the Ebb language page)

- Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers. Z. DeVito et al. Supercomputing 2011 (also see the Liszt language page)
- Simit: A Language for Physical Simulation. F. Kjolstad et al. TOG 2016 (also see the Simit language page)
- Why New Programming Languages for Simulation? G. Bernstein and F. Kjolstad, TOG 2016

- TensorFlow: A System for Large-Scale Machine Learning. M. Abadi et al. OSDI 2016

- TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. M. Abadi et al. (TensorFlow whitepaper from 2015)
- NNVM Documentation (MXNet)
- XLA Documentation (Google)
- Nervana Graph API Documentation