Each vector instruction is carried out by an execution pipeline in the matching function unit. For example, an ADD instruction is carried out by an ADD function-unit pipeline.
If only one such execution pipeline is available, it operates on the elements of a vector register sequentially, one element per cycle. This is shown in the figure on the left, which has a single ADD pipeline and can complete one addition per cycle.
If, however, multiple pipelines are available, each pipeline can consume a different element of the vector register in the same cycle. This is shown in the figure on the right, which has four ADD pipelines and can complete four additions per cycle.
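The difference between the two configurations can be sketched in Python. This is a minimal model, not a description of real hardware: `vector_add` is a hypothetical helper, it assumes every pipeline accepts one new element per cycle, and it ignores the latency of filling the pipelines, so it counts issue cycles only.

```python
def vector_add(a, b, lanes):
    """Element-wise C = A + B on a hypothetical unit with `lanes` ADD pipelines.

    Returns (c, cycles). Assumes each pipeline accepts one new element per
    cycle and ignores pipeline fill latency, so `cycles` counts issue cycles.
    """
    if len(a) != len(b):
        raise ValueError("vector lengths differ")
    n = len(a)
    c = [0] * n
    cycles = 0
    # Each cycle, up to `lanes` consecutive elements enter the pipelines together.
    for start in range(0, n, lanes):
        for i in range(start, min(start + lanes, n)):
            c[i] = a[i] + b[i]
        cycles += 1
    return c, cycles


a = list(range(8))
b = [10] * 8
print(vector_add(a, b, lanes=1)[1])  # one ADD pipeline
print(vector_add(a, b, lanes=4)[1])  # four ADD pipelines
```

With eight elements, the single-pipeline model needs eight cycles while the four-pipeline model needs two; both produce the same result vector.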
The elements within the vector register are interleaved across the four pipelines: element i is sent to pipeline i mod 4, so the pipeline that produces C[0] also produces C[4], C[8], and so on. Also notice that a function unit operates on the same element index in each of its source vector registers (for example, A[4] and B[4]).
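The interleaving can be made concrete with a small helper. This is an illustrative sketch: `lane_assignment` is a hypothetical name, and it assumes the simple round-robin rule that element i goes to lane i mod the number of lanes.

```python
def lane_assignment(n, lanes):
    """Map each lane to the element indices it processes (round-robin: i -> i % lanes)."""
    return {lane: list(range(lane, n, lanes)) for lane in range(lanes)}


print(lane_assignment(12, 4))
# {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11]}
```

Because lane 0 handles index 4 for every operand, it reads A[4] and B[4] and writes C[4]; no lane ever needs an element held by another lane for an element-wise operation.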
The set of elements that move through the pipelines together is termed an element group.
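Under the same round-robin assumption, each element group is simply a run of consecutive indices, one per lane. The helper below is a hypothetical sketch; the last group is shorter when the vector length is not a multiple of the lane count.

```python
def element_groups(n, lanes):
    """Split indices 0..n-1 into element groups: the elements that enter the pipelines in the same cycle."""
    return [list(range(start, min(start + lanes, n))) for start in range(0, n, lanes)]


print(element_groups(10, 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```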
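Each pipeline, together with the portion of the vector register file that feeds it, forms a lane. Adding multiple lanes requires little increase in control complexity and does not require changes to existing machine code. Because peak throughput is the number of lanes multiplied by the clock rate, a vector processor whose clock rate is halved retains the same potential performance if its number of lanes is doubled.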
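The trade-off between lanes and clock rate is simple arithmetic. The sketch below assumes each lane completes one addition per cycle once its pipeline is full; `peak_adds_per_second` is a hypothetical helper, and the 1 GHz clock is an arbitrary example value, not a figure from the text.

```python
def peak_adds_per_second(lanes, clock_hz):
    """Peak additions per second, assuming each lane completes one addition per cycle."""
    return lanes * clock_hz


base = peak_adds_per_second(4, 1_000_000_000)        # 4 lanes at 1 GHz
slower_wider = peak_adds_per_second(8, 500_000_000)  # 8 lanes at 0.5 GHz
print(base == slower_wider)  # True
```

This models only peak throughput; real sustained performance also depends on vector length, pipeline fill time, and memory bandwidth.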