Vector Optimization – Multiple Lanes

A particular instruction is carried out by an execution pipeline. For example, an ADD instruction is carried out by the pipeline of an ADD functional unit.

If only one such execution pipeline is available, it operates on the elements of a vector register sequentially. This is shown in the figure on the left: that processor has a single ADD pipeline and can complete one addition per clock cycle.

If, however, multiple pipelines are available, each pipeline can take a different element of the vector register in the same cycle. This is shown in the figure on the right: that processor has four ADD pipelines and can complete four additions per clock cycle.
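As a rough illustration of the difference, the sketch below counts how many cycles it takes just to start every addition of one vector ADD, assuming each pipeline accepts one new element per cycle and ignoring pipeline fill latency. The vector length of 64 and the function name `cycles_to_issue` are illustrative assumptions, not taken from the figure.

```python
import math


def cycles_to_issue(vector_length: int, lanes: int) -> int:
    """Cycles needed to feed every element into the ADD pipelines.

    Assumes each pipeline (lane) accepts one element per cycle and
    ignores the pipeline's fill latency.
    """
    if lanes < 1:
        raise ValueError("need at least one pipeline")
    return math.ceil(vector_length / lanes)


# Hypothetical 64-element vector: one pipeline vs. four pipelines.
for lanes in (1, 4):
    print(f"{lanes} pipeline(s): {cycles_to_issue(64, lanes)} cycles")
# 1 pipeline(s): 64 cycles
# 4 pipeline(s): 16 cycles
```

The rounding up matters when the vector length is not a multiple of the number of pipelines: 10 elements on four pipelines still need 3 cycles, with the last cycle only partly used.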

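The elements within the vector register are interleaved across the four pipelines. In the figure on the right, elements 0, 4, and 8 are all handled by the first pipeline (the one that produces C[0]), so that pipeline's results are C[0], C[4], C[8], and so on. Also notice that a functional unit operates on the same element index of every source vector register (for example, A[4] and B[4]).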
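The interleaving can be sketched as a simple round-robin rule, assuming element i goes to pipeline i mod (number of pipelines). The helper names `lane_of` and `elements_per_lane` are hypothetical. Because the pipeline depends only on the element index, A[4] and B[4] necessarily land in the same pipeline.

```python
def lane_of(element_index: int, lanes: int) -> int:
    """Pipeline (lane) that processes a given element, assuming
    round-robin interleaving: element i goes to lane i mod lanes."""
    return element_index % lanes


def elements_per_lane(vector_length: int, lanes: int) -> list[list[int]]:
    """For each lane, the element indices it processes, in order."""
    return [list(range(lane, vector_length, lanes)) for lane in range(lanes)]


print(elements_per_lane(12, 4))
# [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]

# A[4] and B[4] share index 4, so both go to the same lane.
print(lane_of(4, 4))  # 0
```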

The set of elements that move through the pipelines together is termed an element group.
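A minimal sketch of element groups, assuming the same round-robin interleaving: group g holds elements g × lanes through g × lanes + lanes − 1, one per pipeline, and the last group may be partial. The functions `element_groups` and `vector_add_by_groups` are hypothetical names; the inner loop stands in for work that the hardware lanes would do in parallel within one cycle.

```python
def element_groups(vector_length: int, lanes: int) -> list[list[int]]:
    """Split element indices into element groups: the elements that
    enter the pipelines together. The last group may be partial."""
    return [list(range(start, min(start + lanes, vector_length)))
            for start in range(0, vector_length, lanes)]


def vector_add_by_groups(a: list[int], b: list[int], lanes: int) -> list[int]:
    """Element-wise C = A + B, processed one element group at a time."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    c = [0] * len(a)
    for group in element_groups(len(a), lanes):
        # Each index in `group` would be handled by a different lane
        # in the same cycle; here they are simply visited in turn.
        for i in group:
            c[i] = a[i] + b[i]
    return c


a = list(range(10))
b = [10] * 10
print(element_groups(10, 4))          # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(vector_add_by_groups(a, b, 4))  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```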

Adding multiple lanes requires little increase in control complexity and does not require changes to existing machine code. If the clock rate of a vector processor is halved, doubling the number of lanes retains the same peak performance.
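The clock-rate trade-off is simple arithmetic: if each lane completes one result per clock cycle, peak throughput is lanes × clock rate. The sketch below checks this with hypothetical figures (4 lanes at 1 GHz versus 8 lanes at 500 MHz); real sustained performance also depends on memory bandwidth and vector length, which this ignores.

```python
def peak_results_per_second(lanes: int, clock_hz: float) -> float:
    """Peak element results per second for one vector functional unit,
    assuming every lane completes one result per clock cycle."""
    return lanes * clock_hz


# Hypothetical configurations: halve the clock, double the lanes.
baseline = peak_results_per_second(lanes=4, clock_hz=1.0e9)
half_clock_double_lanes = peak_results_per_second(lanes=8, clock_hz=0.5e9)
print(baseline, half_clock_double_lanes)  # 4000000000.0 4000000000.0
```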
