Lecture 1: Course Intro + The Real-Time Graphics Pipeline


If you are interested in how Chimera works, here is its whitepaper.

According to this paper, the Chimera architecture essentially pipelines the stages in which the ISP, CPU, and GPU modify data obtained from the sensor. This reduces the latency between consecutive reads from the CMOS sensor. However, there might be a workaround on other SoCs for the HDR problem Chimera is trying to solve. One way is to buffer as many pictures as possible before processing them in a batch. This should reduce the latency between pictures. Of course, this would require a big buffer...
This raises a question for me: does the Chimera architecture use a large (at least larger than normal) cache that is shared among the ISP, CPU, and GPU?


The vertex shader is a side-effect-free function that takes as input a struct describing one vertex from the input vertex stream as well as (optional) read-only "uniform" input data, and computes as output a struct representing the modified vertex.

Here, that uniform input data is a 4x4 matrix used to transform the world-space position of the vertex into the canonical, normalized clip-space position that is expected by the rasterization stage of the pipeline. (Further details about coordinate spaces will come later in the course.) The same matrix is used to transform all vertices in the input stream.

Note that you should think about a vertex as a struct (a record), and not simply an XYZ coordinate. All vertices need to have a position field, but the vertex record can certainly contain many other fields, such as surface normal, surface color, texture coordinates, etc.
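To make the "vertex shader as a pure function over vertex records" idea concrete, here is a minimal Python sketch. It is not real shader code, and the names `Vertex`, `mat_vec`, `vertex_shader`, and `run_vertex_stage`, as well as the particular set of fields, are assumptions chosen for illustration. The matrix is stored row-major and positions are homogeneous 4-tuples.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec4 = Tuple[float, float, float, float]
Mat4 = Tuple[Vec4, Vec4, Vec4, Vec4]  # row-major 4x4 matrix (assumed layout)


@dataclass(frozen=True)
class Vertex:
    """A vertex is a record, not just an XYZ coordinate (hypothetical fields)."""
    position: Vec4                      # homogeneous position (x, y, z, w)
    normal: Tuple[float, float, float]  # surface normal
    color: Tuple[float, float, float]   # surface color
    uv: Tuple[float, float]             # texture coordinates


def mat_vec(m: Mat4, v: Vec4) -> Vec4:
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def vertex_shader(vertex: Vertex, transform: Mat4) -> Vertex:
    """Side-effect-free: reads one vertex plus the read-only uniform matrix,
    and returns a new vertex with only the position changed."""
    return replace(vertex, position=mat_vec(transform, vertex.position))


def run_vertex_stage(vertices: List[Vertex], transform: Mat4) -> List[Vertex]:
    """Apply the same uniform matrix to every vertex in the input stream."""
    return [vertex_shader(v, transform) for v in vertices]
```

Because each call depends only on its own vertex and the shared uniform data, the calls can run in any order or in parallel, which is part of why this abstraction maps well onto GPU hardware.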


Question: The purpose of the Z-buffer is to hold the depth of the closest visible surface at each output image pixel (surfaces behind this distance are not visible, aka "occluded" from view). Why do you think all modern GPUs use a Z-buffer to compute occlusion? Doesn't this seem like a very brute-force approach, since a comparison is performed per pixel and not per primitive? Why not just sort all triangles up front and draw them in back-to-front order?


I can think of two possible reasons for favoring the per-fragment z-buffer approach.

  1. It's an easy algorithm! The Z-buffer algorithm is very simple in comparison to the obvious per-triangle algorithm (check whether two triangles intersect, potentially split the triangles if they pass through one another, as in the image above, etc.). For example, it is not obvious how to represent the depth of a triangle that is not perpendicular to the camera's view direction: would each triangle have a range of depths rather than a single number, and if so, how would these ranges be sorted? If the algorithm is simple, then it is likely to be feasible to implement in hardware, and it should hopefully use a relatively small amount of die area.

  2. How many triangles vs. fragments? It isn't necessarily true, now or in the future, that there will be more fragments than triangles (1). As triangles get smaller, the potential benefit of per-triangle culling decreases.

(1) http://www.cs.cmu.edu/afs/cs/academic/class/15418-s12/www/lectures/25_micropolygons.pdf


@gif: Good answer. I'd say the primary reasons are simplicity and generality. By general, I mean that any primitive that can be turned into fragments can be composited with proper occlusion into the same frame buffer. Imagine if you implemented a fancy scheme for determining occlusion for triangles, and then I asked you to add lines or spheres to your system. Your algorithm would have to change.

The Z-buffer reduces the problem of occlusion to one of generating fragments. Since the occlusion test is always the same (even if the supported primitive types change), and the operation is very simple, a hardware-accelerated Z-buffer is a reasonable solution: brute force, but very fast.
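The per-fragment depth test described above can be sketched in a few lines of Python. This is an illustrative software model, not how any particular GPU implements it. The helper names `make_buffers` and `submit_fragment` are hypothetical, and the sketch assumes that a smaller depth value means "closer to the camera".

```python
import math


def make_buffers(width, height, background=(0, 0, 0)):
    """Create a depth buffer (initialized to 'infinitely far') and a color buffer."""
    depth = [[math.inf] * width for _ in range(height)]
    color = [[background] * width for _ in range(height)]
    return depth, color


def submit_fragment(depth, color, x, y, z, rgb):
    """Depth-test one fragment at pixel (x, y).

    The fragment wins only if it is closer than whatever is stored there
    (assumption: smaller z is closer). Returns True if the pixel was updated.
    The test is identical no matter which primitive produced the fragment.
    """
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = rgb
        return True
    return False
```

Note that the final image does not depend on the order in which fragments arrive (ignoring exact depth ties), so no up-front sorting of primitives is needed, and fragments from triangles, lines, or any other primitive type all go through the same test.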


Even though I tend to say "OpenGL" a lot in class, really the Direct3D 11 (D3D11) and OpenGL 4 graphics pipelines (illustrated by the pipeline on the right) have essentially the same structure and the same basic abstractions. For this reason I'll try to use the more generic term "real-time graphics pipeline" to refer to this general abstract machine.

Clearly, the software API presented to the application is different in the two cases (Direct3D 11 is a Windows proprietary runtime and OpenGL is an open standard).