
Review question: Understanding this slide will be very important throughout the course.

  • Is this a perfectly parallelizable application (yes/no)?
  • Why do I claim that executing this application on a modern parallel computer (even one as small as your quad-core laptop) will result in very inefficient use of that parallel machine?
  • If multiplying two large arrays is the only operation you have to perform, is there any way you could change your code to gain better performance? (Assume that the baseline implementation has perfect prefetching.)
  • Yes, it is perfectly parallelizable.
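  • This application is likely to be memory-bandwidth limited because there is no reuse of data: each element is loaded once, used in a single multiply, and never touched again.
  • I can't think of a way to improve performance via a code change, since we'd need to change the compute-to-memory ratio to get around the memory bandwidth bottleneck. One possibility might be narrower data types, which would reduce the bytes moved per multiply.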
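
To put rough numbers on the answers above, here is a minimal sketch of the bandwidth bound for an element-wise multiply `C[i] = A[i] * B[i]`. The function names and the bandwidth figure are assumptions made up for illustration, not values from the slide.

```python
# Back-of-the-envelope bound for C[i] = A[i] * B[i].
# The bandwidth figure below is a hypothetical placeholder, not a measurement.

def bytes_per_multiply(element_size: int) -> int:
    """Each multiply needs two loads (A[i], B[i]) and one store (C[i])."""
    return 3 * element_size


def bandwidth_bound_rate(bandwidth_bytes_per_s: float, element_size: int) -> float:
    """Upper bound on multiplies per second when memory is the bottleneck."""
    return bandwidth_bytes_per_s / bytes_per_multiply(element_size)


ASSUMED_BANDWIDTH = 24e9  # bytes/s, hypothetical memory bandwidth

for name, size in [("float32", 4), ("float16", 2)]:
    rate = bandwidth_bound_rate(ASSUMED_BANDWIDTH, size)
    print(f"{name}: {bytes_per_multiply(size)} bytes per multiply, "
          f"at most {rate / 1e9:.1f} G multiplies/s")
```

Under these assumptions, halving the element size halves the bytes moved per multiply and so doubles the bound, which is why narrower data types are a plausible answer. Adding cores does not change the bound at all, since the cores share the same memory bandwidth.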

Does it make sense to use a GPU to perform this computation? (Hint: the answer is certainly... "it depends")


Whether to use the GPU for this computation probably depends on whether the operation is the bottleneck in your overall workload. For example, if we were running this operation many times in a row, the 10X speedup could be worth having over a long period of time. Alternatively, if our primary concern is how efficiently we use our computing resources, we may not want that 10X speedup at the cost of such poor utilization.
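
The speedup-versus-utilization tradeoff can be sketched numerically. Every hardware figure below is a hypothetical placeholder, chosen only so that the GPU has ten times the CPU's memory bandwidth; these are not the numbers from the slide.

```python
# Speedup vs. utilization for a bandwidth-bound element-wise multiply.
# Every hardware figure here is a hypothetical placeholder.

BYTES_PER_MULTIPLY = 12  # float32: two 4-byte loads plus one 4-byte store


def achieved_rate(bandwidth: float, peak_compute: float) -> float:
    """Multiplies/s: capped by memory or by the ALUs, whichever is lower."""
    return min(bandwidth / BYTES_PER_MULTIPLY, peak_compute)


def utilization(bandwidth: float, peak_compute: float) -> float:
    """Fraction of peak arithmetic throughput actually used."""
    return achieved_rate(bandwidth, peak_compute) / peak_compute


# bandwidth in bytes/s, peak_compute in multiplies/s (both assumed)
cpu = {"bandwidth": 24e9, "peak_compute": 60e9}
gpu = {"bandwidth": 240e9, "peak_compute": 1200e9}

speedup = achieved_rate(**gpu) / achieved_rate(**cpu)
print(f"GPU speedup over CPU: {speedup:.1f}x")
print(f"CPU utilization: {utilization(**cpu):.1%}")
print(f"GPU utilization: {utilization(**gpu):.1%}")
```

With these made-up figures the GPU finishes faster purely because of its higher memory bandwidth, while both machines leave almost all of their arithmetic units idle.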