Review questions: This slide adds superscalar execution capabilities to the fake processor.
Superscalar execution means using multiple execution units to execute multiple instructions at the same time.
ILP is an abstraction. It measures how many instructions can be executed by the computer at the same time. It does not say how you would actually achieve this. Besides superscalar execution, other ways to potentially achieve this would be pipelined execution or branch prediction. So superscalar execution is one way to implement this abstraction.
I like @legolas' post. It's completely on the right track (thank you @legolas), but let's refine it a bit to get it to a point where it's 100% true. Here are some comments:
Let's first talk about the above statement "ILP is an abstraction". I'd contend that ILP is not an abstraction but a property of a portion of a program. Agree or disagree?
Second, are we sure that pipelined execution and branch prediction achieve the same effect as superscalar execution? To give you a sense of what I'm thinking, consider the following. Superscalar execution is the act of using multiple ALUs to process multiple independent instructions in parallel. For example, if there are two ALUs, then throughput could be increased to up to two instructions per clock. So by doubling the execution resources, we've doubled the peak throughput. Now contrast this with a pipelined architecture.
Last, consider the difference between doubling execution resources in a superscalar design and adding a branch predictor, where the processor's execution capability is not changed at all.
I agree with @kayvonf that ILP is a property of the instruction stream which measures how much potential there is for instructions to be executed in parallel. Superscalar execution then takes advantage of the ILP in an instruction stream by actually executing the instructions in parallel.
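To make "ILP is a property of the instruction stream" concrete, here is a minimal sketch that estimates ILP from dependencies alone, with no reference to any hardware. The function name `ilp` and the `deps` encoding are my own assumptions for illustration, and every instruction is assumed to take one cycle.

```python
def ilp(deps):
    """Estimate ILP as (instruction count) / (length of longest dependency chain).

    deps[i] lists the indices of earlier instructions that instruction i
    needs results from (hypothetical encoding, unit latency assumed).
    """
    if not deps:
        return 0.0
    depth = []  # depth[i] = earliest step at which instruction i could run
    for d in deps:
        depth.append(1 + max((depth[j] for j in d), default=0))
    return len(deps) / max(depth)


# x*x, y*y, z*z, then (x*x + y*y), then (... + z*z):
# five instructions over a critical path of three steps.
sum_of_squares = [[], [], [], [0, 1], [3, 2]]
print(ilp(sum_of_squares))
```

The number comes entirely from the program's dependency structure. Whether a given processor actually exploits it is a separate question.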
The difference between pipelined execution and superscalar execution is that pipelines are trying to hide latency in instruction execution, whereas superscalar execution directly increases the throughput of the system by adding additional execution resources. A perfectly pipelined system will hide all instruction latency, giving the system a throughput of one instruction per clock. Better pipeline architectures can get closer and closer to this limit, but you can't make performance arbitrarily good just by improving the pipeline. On the other hand, superscalar execution is only limited by the ILP of the program. On a stream of independent instructions, you can make a superscalar architecture as fast as you want by adding more execution units.
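A toy model of the scaling claim, as a sketch under simplifying assumptions: every instruction takes one cycle, up to `width` ready instructions can execute per cycle, and `deps[i]` lists the earlier instructions that instruction i depends on. The name `cycles_needed` and this encoding are hypothetical, not anything from the slide.

```python
def cycles_needed(deps, width):
    """Cycles to run all instructions with `width` execution units.

    Assumes unit latency and that deps[i] only refers to indices < i.
    """
    done = set()
    remaining = list(range(len(deps)))
    cycle = 0
    while remaining:
        cycle += 1
        # An instruction is ready once everything it depends on has finished.
        ready = [i for i in remaining if all(j in done for j in deps[i])]
        issued = ready[:width]
        done.update(issued)
        remaining = [i for i in remaining if i not in done]
    return cycle


independent = [[] for _ in range(8)]          # no dependencies at all
chain = [[], [0], [1], [2]]                   # each needs the previous result

for w in (1, 2, 4, 8):
    print(w, cycles_needed(independent, w), cycles_needed(chain, w))
```

In this model the independent stream speeds up with each added unit, while the dependent chain takes four cycles no matter how wide the machine is, which is the "limited by the ILP of the program" point.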
Adding execution resources in a superscalar system improves performance on ideal workloads with lots of ILP. Adding a branch predictor doesn't directly improve performance on its own. Instead, it gives the system a way to handle non-ideal workloads by allowing the pipeline to continue after encountering a branch, which would otherwise stall the pipeline.
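A rough back-of-the-envelope sketch of that last point, using a common textbook-style approximation rather than a model of any real processor: a single-issue pipeline ideally retires one instruction per clock, and each branch that is not correctly predicted costs a fixed stall. The function name, the `penalty` parameter, and treating "no predictor" as `accuracy = 0.0` are all assumptions for illustration.

```python
def pipeline_cycles(n_instructions, n_branches, penalty, accuracy):
    """Approximate cycle count for a single-issue pipeline.

    accuracy is the fraction of branches the pipeline gets past without
    stalling; accuracy=0.0 models stalling on every branch (assumption).
    """
    stalled_branches = n_branches * (1.0 - accuracy)
    return n_instructions + stalled_branches * penalty


print(pipeline_cycles(100, 20, 3, 0.0))   # stall on every branch
print(pipeline_cycles(100, 20, 3, 1.0))   # perfect prediction
```

Even with perfect prediction the count never drops below one cycle per instruction. The predictor only recovers cycles lost to stalls and does not raise the peak.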
@ckaffine. Very elegant answer.
My two cents:
Like superscalar execution, pipelining also requires additional hardware resources on the chip. There need to be additional circuits so that the fetch, decode, execute, memory access (MEM), and write-back (WB) stages can each work on a different instruction simultaneously. Looking at it at this level, the compute capability of the core is increased.
Pipelining is important to make superscalar execution effective. For superscalar execution to happen, you must have two instructions (preferably more) that are decoded and waiting to be executed. Two simultaneously decoded, independent instructions may still not be available for simultaneous execution, because of a memory stall or because they contend for the same execution resource. Pipelining can help fill the instruction pool quickly with enough candidate instructions.
@zf I don't understand why pipelining is necessary for superscalar execution. If you have two fetch/decode units, won't the two instructions be ready for superscalar execution without any pipelining?
@jellybean I think the picture in this slide is just showing two fetch/decode units for superscalar execution. So yes, superscalar execution can work without pipelining.
@jellybean You are right. I was thinking about what people call a "parallel pipeline", which is essentially a form of superscalar execution. I think pipelining is still important to make superscalar execution effective. I modified my previous response accordingly.