What happens when all ALUs follow only one path of the branch? Is the other (untaken) path skipped?
If so, do we know which path to take because we stalled to compute the branch condition or because we employed branch prediction?
@thomask. In that case, a quality implementation would avoid executing the untaken path, and that code would run at peak SIMD utilization.
So in that case, would a good implementation block until the condition is computed?
For simple block-structured code, here's an implementation that completely skips the if block or the else block when no iterations of the loop take that path.
vector_mask = compute_predicate()
if (any_ones(vector_mask))
    run 'if block' using vector_mask as a write mask
if (any_ones(invert(vector_mask)))
    run 'else block' using invert(vector_mask) as a write mask
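To make the pseudocode above concrete, here is a minimal Python sketch that emulates the idea with plain lists standing in for vector registers. It is only an illustration, not how any particular compiler or GPU implements it: `masked_apply`, `simd_if_else`, and the `blocks_run` bookkeeping are hypothetical names added for this example, and `any_ones` / `invert` are modeled on the helpers named in the pseudocode. Note that it tests the two masks independently, so both blocks run when the lanes diverge and either one is skipped entirely when no lane needs it.

```python
from typing import Callable, List, Tuple


def any_ones(mask: List[bool]) -> bool:
    # True if at least one lane has its mask bit set.
    return any(mask)


def invert(mask: List[bool]) -> List[bool]:
    # Flip every mask bit.
    return [not m for m in mask]


def masked_apply(values: List[int], mask: List[bool],
                 op: Callable[[int], int]) -> List[int]:
    # Emulates a write-masked vector op: lanes whose mask bit is clear
    # keep their old value, lanes whose bit is set receive op(value).
    return [op(v) if m else v for v, m in zip(values, mask)]


def simd_if_else(values: List[int],
                 predicate: Callable[[int], bool],
                 if_op: Callable[[int], int],
                 else_op: Callable[[int], int]) -> Tuple[List[int], List[str]]:
    # Compute the predicate once, for all lanes, before either block runs.
    vector_mask = [predicate(v) for v in values]
    blocks_run = []  # hypothetical bookkeeping: which blocks actually executed

    # Skip the 'if block' entirely when no lane wants it.
    if any_ones(vector_mask):
        values = masked_apply(values, vector_mask, if_op)
        blocks_run.append("if")

    # Independently skip the 'else block' when no lane wants it.
    inverted = invert(vector_mask)
    if any_ones(inverted):
        values = masked_apply(values, inverted, else_op)
        blocks_run.append("else")

    return values, blocks_run


if __name__ == "__main__":
    is_positive = lambda x: x > 0
    double = lambda x: x * 2
    negate = lambda x: -x
    print(simd_if_else([1, 2, 3, 4], is_positive, double, negate))
    print(simd_if_else([1, -2, 3, -4], is_positive, double, negate))
```

In the first call every lane takes the if path, so the else block is never executed; in the second call the lanes diverge and both blocks run, each under its own write mask.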