Final Project Instructions

The 15-869 final project is an opportunity for you (with up to one partner) to perform an in-depth study of a visual computing topic of your choosing. Including the proposal period, you have about seven weeks to complete the project. Expectations for the project are quite high: graduate students in the class should aim to take on a problem or question for which the solution is not directly found in a book or recent literature. You are welcome and encouraged to choose a project that is harmonious with your own research. Undergrads need not meet this novelty bar, but should aim high.


  • Thursday Oct 30 -- project proposal deadline (please come talk to Kayvon to vet ideas prior to this deadline)
  • Nov 14 -- Checkpoint 1
  • Dec 1 -- Checkpoint 2
  • Fri Dec 12 -- Project Presentations (1-4pm)
  • Sun Dec 14 -- Final Writeups Due


  • You are free to choose any project you wish, provided you can convince me that it embodies or utilizes themes from the course. Consider what those themes are:
    • The design of good abstractions. One reason that graphics has managed to build very effective systems is that we managed to identify the right abstractions for our needs. Those abstractions present a consistent and intuitive interface for application developers to express computations, and they present a stable interface beneath which system implementations can be heavily optimized. Choosing the right level of abstraction is a key theme in this course, and taking on an API or framework design challenge in your project would be very appropriate. (The fields of computational photography and computer vision are dealing with these challenges now.)
    • Efficient scheduling. Mapping computations to modern machines is an act of scheduling. Managing the conflicting goals of work efficiency, parallel execution, and maximizing locality was pervasive in the course. Most in-depth optimization tasks will be in scope.
    • Deep workload understanding and analysis. Throughout the course we evaluated design decisions through the lens of a specific class of workloads. Performing a deep dive into a workload that is important to you could be a good project.
    • Energy and efficiency. Energy and efficiency. Energy and efficiency.
  • Talk with Kayvon early and often. While I won't be able to come up with project ideas for everyone in the class, if you come to me with basic ideas, I will help guide you towards a fun and challenging project.

Proposal Requirements

Please create a web page for your project and email me the url. Your project proposal page should contain the following sections:

SUMMARY. Summarize your project in just a few sentences. Describe what you plan to do (e.g., build a system, answer a question) and what you hope your final result will be.

BACKGROUND. Describe the problem you are trying to solve. This may include a very basic related work section.

THE CHALLENGE. Describe why the problem is challenging. In other words, why don't you know the answer off the top of your head? Why can't you just look up the answer in a book? What do you hope to learn by doing the project that you don't know now?

RESOURCES. Describe the resources (computers, starting code, etc.) you will use. What codebase will you start from? Are you starting from scratch or using an existing piece of code? Is there a book or paper that you are using as a reference? Are there any other resources you need, but haven't figured out how to obtain yet? Could you benefit from access to any special machines?

GOALS/DELIVERABLES. Describe the deliverables or goals of your project.

  • Separate your goals into what you PLAN TO ACHIEVE (what you believe you must get done to have a successful project and get the grade you expect) and an extra goal or two that you HOPE TO ACHIEVE if the project goes really well and you get ahead of schedule. It may not be possible to state precise performance goals at this time, but we encourage you to be as precise as possible. If you do state a goal, give some justification of why you think you can achieve it. (e.g., I hope to speed up my starter code 10x, because if I did it would run in real-time.)
  • If applicable, describe the demo you plan to show at your final presentation (will it be an interactive demo, will you show an output of the program that is really neat, will you show speedup graphs reaching a certain performance level?)
  • If your project is an analysis project, what are you hoping to learn about the workload or system being studied? What question(s) do you plan to answer?
  • Systems project proposals should describe what the system will be capable of and what performance is hoped to be achieved.

SCHEDULE. Produce a draft schedule for your project. Your schedule should have at least one item to do per week. List what you plan to get done each week from now until finals week in order to meet your project goals. In your schedule we encourage you to be as precise as possible. It's often helpful to work backward from your deliverables and goals, writing down all the little things you'll need to do (establish the dependencies!).
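One possible way to work backward from deliverables is to write the dependencies down explicitly and let a topological sort suggest an order. The sketch below is only an illustration: the task names are hypothetical placeholders, and assigning one task per week is a simplifying assumption rather than a requirement of the proposal.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks for illustration only.
# Each task maps to the set of tasks that must be finished before it.
tasks = {
    "final writeup": {"speedup graphs"},
    "speedup graphs": {"optimized implementation", "benchmark scenes"},
    "optimized implementation": {"baseline implementation"},
    "benchmark scenes": set(),
    "baseline implementation": set(),
}


def schedule_order(deps):
    """Return tasks in an order where each task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = schedule_order(tasks)
for week, task in enumerate(order, start=1):
    # Assumption: one task per week; real schedules may overlap tasks.
    print(f"Week {week}: {task}")
```

Starting from the final deliverable and listing what each item depends on tends to surface the "little things" early; the sort then only confirms that the resulting order is feasible.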

Final Presentation and Writeup Requirements


Your project presentation will be a 10-12 minute presentation during the exam slot on Friday Dec 12th. (Each project will be limited to a total of 15 minutes, including questions.) Your presentation should be targeted at your fellow classmates, who may not be familiar with the problem you are working on or your algorithmic approach. Since time is short, and since the final five minutes of your presentation should be focused on evaluation and results, you will have to make good choices about what to include in the presentation. I suggest:

  • 1 minute: introduction. What is your goal in this project? (Write an efficient distributed ray tracer, automatically choose between deferred and non-deferred rendering each frame, determine if a CPU or GPU is a better platform for the headline processing pipeline, performance tune the mid-level patch pipeline and analyze its performance on bigger datasets.)
  • 1 minute: why is this project "hard"? (What is the primary challenge you are trying to overcome, or the question you are trying to answer?)
  • 4 minutes: approach. This should give the basics of your approach, which might be a pipeline diagram, a piece of [as-simple-as-possible] algorithm pseudocode (if an algorithm was invented or tweaked as part of the program), a description of the primitives added to a language, a diagram of a parallelization strategy.
  • 5 minutes: Results. You should quickly describe your experimental setup (I ran THIS code on THESE examples; if your examples are scenes, show images of them so we know what they are). Then provide results showing how well you've done toward the goal (sped up pipeline by a factor of XXX, produced these images that were compressed by a factor of XXX). These result images, videos, graphs, or figures can highlight both successes AND failure cases of the project. (Showing that it works in cases A, B, and C, but fails in cases D and E shows you've thought a bit about how you've done.)

Overall principles for a good talk:

  • Keep in mind Kayvon's talk tips, particularly those given here. Note, students that follow these tips seem to always give remarkably better talks.
  • On Thursday night, practice the talk out loud at least once, hopefully twice. (This is not hard to do since it is short.) It is important that you finish the talk in no more than 12 minutes, and without having practiced in real time, you will be surprised how long your talk is when you give it in class.
  • We love images, videos, and live demos. It is much easier to make a point clearly and quickly by just showing example inputs and outputs. ("For example, here are some images from the test set...", "here are example rendered outputs...", "here are failure cases...")


In contrast to the presentation, the write-up should be written for the eyes of the course staff for grading. Specifically, we want to know how your algorithm / system works (at the level of being able to implement it ourselves from the description) and we want to see a results section that shows that you have taken the time to understand what your implementation does and does not do well. Your writeup should be as short as possible, provided it meets the above two goals.

The following is a basic outline, but this outline need not be followed, and the examples included below are not applicable to all projects. You can hand in a pdf, or simply update your project web pages.

TITLE/AUTHORS. Give your project a cool title and please make sure author names + Andrew IDs are at the top.

SUMMARY. A short (no more than 2-3 sentences) project summary that includes a mention of the result.

  • Example: We developed a new randomized algorithm for all pairs NN-search in video streams based on PatchMatch, and showed our algorithm can converge to produce high quality NN lists (within 90% of reference) in only 5% of the execution time.
  • Example: We parallelized a ray tracer across multiple machines in Gates, assuming the scene is too big to fit in DRAM in any one machine. Our 6 machine (36 core) configuration achieves a 10x speedup compared to a single node.
  • Example: We ported the CMU headlight pipeline to the Darkroom language, analyzed the performance of the implementation compared to a hand-tuned implementation, and identified the primary reasons for the performance differences to be A and B.

BACKGROUND/APPROACH. Describe the algorithm, application, or system implementation. This should be as brief as possible, but at the level of detail where the course staff (or a fellow 869 classmate) could reason about how to implement your design from your description. Figure(s) would be really useful here.

  • What are the key data structures?
  • What are the key operations on these data structures?
  • What are the algorithm's inputs and outputs?
  • What is the part that is computationally expensive and could benefit from parallelization?
  • Break down the workload. Where are the dependencies in the program? How much parallelism is there? Is it data-parallel?
  • Where is the locality? Is it amenable to SIMD execution?
  • Describe the technologies used. What language/APIs? What machines did you target?
  • Describe how you mapped the problem to your target parallel machine(s).
  • If your project involved many iterations of program optimization, please describe this process as well. What did you try that did not work? How did you arrive at your solution? The notes you've been writing throughout your project should be helpful here. Convince us you worked hard to arrive at a good solution.
  • If you started with an existing piece of code, please mention it (and where it came from) here.
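When answering the "how much parallelism is there?" question above, a quick upper bound can help frame expectations. The sketch below uses Amdahl's law, a standard bound that these instructions do not name explicitly; the function name and the 90% parallel fraction are illustrative assumptions, while the 36-core count simply echoes the ray tracer example in the summary section.

```python
def amdahl_speedup(parallel_fraction, num_workers):
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction: share of single-worker run time that parallelizes
                       perfectly (between 0.0 and 1.0).
    num_workers:       number of workers (cores, SIMD lanes, machines).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_workers)


# Assumed example: 90% of the run time parallelizes, 36 cores available.
bound = amdahl_speedup(0.9, 36)
print(f"Best-case speedup on 36 cores: {bound:.1f}x")
```

The bound ignores communication, synchronization, and memory bandwidth, so measured speedups will usually fall below it. It is a sanity check on a workload breakdown, not a substitute for measurement.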

RESULTS. How successful were you at achieving your goals? How can you show that with results images/videos or quantitatively using graphs?

  • If your project was optimizing an algorithm, please define how you measured performance. Is it wall-clock time? Speedup? An application specific rate? (e.g., moves per second, images/sec)
  • Please also describe your experimental setup. What were the sizes of the inputs? How were requests generated? What were the test scenes?
  • Provide graphs of speedup or execution time. Please precisely define the configurations being compared. Is your baseline single-threaded CPU code? Is it an optimized parallel implementation for a single CPU?
  • IMPORTANT: What limited your speedup? Is it a lack of parallelism (dependencies)? Communication or synchronization overhead? Data transfer (memory-bound or bus transfer bound)? Poor SIMD utilization due to divergence? As you try to answer these questions, we strongly prefer that you provide data and measurements to support your conclusions. If you are merely speculating, please state this explicitly. Performing a solid analysis of your implementation is a good way to pick up credit even if your optimization efforts did not yield the performance you were hoping for.
  • Deeper analysis: Can you break the execution time of your algorithm into a number of distinct components? What percentage of time is spent in each region? Where is there room to improve?
  • Was your choice of machine target sound? (If you chose a GPU, would a CPU have been a better choice? Or vice versa.)
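The measurement questions above (wall-clock time, speedup, and the percentage of time spent in each region) can be answered with very little tooling. The following is a minimal sketch, assuming wall-clock time is the metric; `PhaseTimer`, `speedup`, and the "load" and "compute" phases are hypothetical names used only for illustration.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase of a program."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def percentages(self):
        """Return each phase's share of the total measured time, in percent."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: 100.0 * t / total for name, t in self.totals.items()}


def speedup(baseline_seconds, optimized_seconds):
    """Speedup of the optimized run relative to the baseline run."""
    return baseline_seconds / optimized_seconds


# Placeholder workload standing in for a real pipeline.
timer = PhaseTimer()
with timer.phase("load"):
    data = list(range(100_000))
with timer.phase("compute"):
    result = sum(x * x for x in data)

for name, pct in timer.percentages().items():
    print(f"{name}: {pct:.1f}% of measured time")
```

When reporting a speedup, state exactly which two configurations produced `baseline_seconds` and `optimized_seconds`. Averaging over several runs is usually worthwhile, since a single timing can be noisy.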

REFERENCES. Please provide a list of references used in the project.

Project Ideas

While I would be most excited if you came up with project ideas completely on your own, the following are a few ideas to get you thinking about projects. I am open to any project idea provided you can convince me it will be a valuable experience, is a challenging body of work, and you can tie the main challenges of the project to topics in the course.

Remember that the platforms I have sitting in my office (an Oculus DK2, NVIDIA Shields, and several Xbox One depth sensor cams) are all available to you.

    • Undergrads only: Implement advanced graphics algorithms (rendering, simulation) on a GPU or multi-core CPU and analyze their performance in detail. (You could extend the assignment 1 renderer to do this. See many of the supplemental readings on the course readings page.) You might be able to implement these techniques on top of Yong's graphics code generation system. (Talk to Yong, it may not be ready for use.)
    • Propose an improvement to graphics pipeline algorithms or pipeline scheduling algorithms discussed in class and evaluate the performance of your solution via HW simulation or on a real machine.
    • Design a heuristic for determining which pipeline parallelization strategy (or which rasterization algorithm, or whether or not to use deferred shading) is the most efficient answer for a given workload. Can you make the system figure this out on its own?
    • Evaluate a shading technique using the Oculus DK2.
    • See if you can combine Yong's SIGGRAPH 2014 paper on multi-rate fragment shading with deferred shading. (Can you do multi-rate shading in a deferred rendering pipeline?)
    • Build a "cluster" renderer that distributes a real-time rendering workload across multiple machines.
    • Consider temporal reprojection as a first-class concept in a shading language. How might it get integrated?
    • Read about the idea of Self-Refining Games and consider if you could make a self-refining game by recording your own video (instead of performing massive simulation)!
    • Explore the idea of efficiently rendering a forest of high resolution trees. (with individual leaves!)
    • Can you think of novel ways to use large image collections to synthesize new images? Like this paper on scene completion or this one on adding detail to renderings or infinite images! (these are all image analysis projects as much as they are rendering projects)
    • Modify the Halide compiler to add a new language feature.
    • The "inverse Facebook API": design an API and server-side runtime system for uploading Halide code to a server and running the code on a large collection of images that match a search predicate. (deploy it on AWS)
    • Build a system for finding all pairs of similar images in a database of many hours of video. Kayvon has an idea for a new algorithm for doing this, and if you're really serious about hacking fast code and doing a good project that might extend beyond the course come talk to me. (But read PatchMatch first.)
    • Explore the latest work on "predicting objectness" and implement the latest techniques and report on their efficiency. (See this page for a good place to start.)
    • Can you develop a very high quality implementation of this paper on high dimensional image search? (High dimensional search is useful in image retrieval, for example see this paper on converting images to bitcodes.)
    • Build a power measurement device that allows you to measure the power consumption of a graphics or image processing workload on real GPU or SoC hardware.
    • Hack a Philips HUE light bulb (RGB LED bulb) to build a system that responds to commands with low latency.
    • Study the utility of HW-data-compression techniques for image processing applications.
    • Design your own programming framework/API for an application domain of your choosing.
    • Design a programming API and/or GUI for making 869 or 418-style technical figures.