Multimedia SIMD extensions are extra instructions (MMX, SSE, and AVX) that were added to the x86 architecture to support vector-like operations. Like vector instructions, a SIMD instruction specifies the same operation on vectors of data. Unlike vector instructions, SIMD instructions tend to specify fewer operands and hence use much smaller register files.
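As a minimal sketch of "one operation on a vector of data", the C fragment below uses the GCC/Clang `vector_size` extension. This is an assumption about the toolchain: it is a compiler extension, not standard C and not the x86 intrinsics API. The type `v4sf` (a hypothetical name) holds four floats, the width of one 128-bit SSE register, and a single `+` adds all four lanes.

```c
/* GCC/Clang extension (assumed toolchain): a vector of four floats,
   16 bytes wide, i.e. the size of one 128-bit SSE register. */
typedef float v4sf __attribute__((vector_size(16)));

/* One operation applied to four data elements. On x86-64 this typically
   compiles to a single SIMD add (addps); on other targets the compiler
   uses that target's SIMD unit or falls back to scalar code. */
v4sf add4(v4sf a, v4sf b) {
    return a + b;
}
```

For example, adding `{1, 2, 3, 4}` and `{10, 20, 30, 40}` yields `{11, 22, 33, 44}` from one operation.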
SIMD Extensions vs. Vector Architectures
Multimedia SIMD extensions fix the number of data operands in the opcode, whereas vector architectures have a vector length register that specifies the number of operands for the current operation. This variable vector length easily accommodates programs that naturally have shorter vectors than the maximum size the architecture supports. Moreover, vector architectures have an implicit maximum vector length, which, combined with the vector length register, avoids the need for many opcodes; multimedia SIMD, by contrast, added new opcodes each time the operand width grew.
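The difference shows up in a loop whose trip count is not a multiple of the hardware width. Both functions below are illustrative emulations in plain C, not real SIMD or vector code: `SIMD_WIDTH` and `MVL` are assumed values, and each inner loop stands in for what would be a single SIMD or vector instruction.

```c
#include <stddef.h>

#define SIMD_WIDTH 4  /* assumed: fixed by the opcode, e.g. 4 floats per SSE add */
#define MVL 64        /* assumed: hypothetical maximum vector length */

/* Multimedia SIMD style: the width is baked into the instruction, so the
   loop handles full groups of SIMD_WIDTH and then needs a scalar
   epilogue for the n % SIMD_WIDTH leftover elements. */
void add_simd_style(float *c, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + SIMD_WIDTH <= n; i += SIMD_WIDTH)
        for (size_t lane = 0; lane < SIMD_WIDTH; lane++) /* models one SIMD add */
            c[i + lane] = a[i + lane] + b[i + lane];
    for (; i < n; i++)                                    /* scalar cleanup */
        c[i] = a[i] + b[i];
}

/* Vector-architecture style: each pass sets the vector length register
   vl = min(remaining, MVL), so the same instruction also handles the
   short final chunk and no epilogue is needed. */
void add_vector_style(float *c, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ) {
        size_t vl = (n - i < MVL) ? n - i : MVL; /* models "set VL" */
        for (size_t j = 0; j < vl; j++)          /* models one vector add of length vl */
            c[i + j] = a[i + j] + b[i + j];
        i += vl;
    }
}
```

With n = 7, the SIMD-style version performs one 4-wide group plus three scalar iterations, while the vector-style version issues a single vector operation with vl = 7.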
Multimedia SIMD does not offer the more sophisticated addressing modes of vector architectures, namely strided accesses and gather-scatter accesses. These features increase the number of programs that a vector compiler can successfully vectorize.
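The access patterns behind strided and gather-scatter addressing can be modeled as scalar C loops. On a vector architecture each function below corresponds to a single vector load or store; the function names are hypothetical, and the loops only describe the element-by-element semantics, not how hardware performs them.

```c
#include <stddef.h>

/* Strided load: element i comes from base[i * stride], e.g. walking a
   column of a row-major matrix. */
void load_strided(double *dst, const double *base, size_t stride, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = base[i * stride];
}

/* Gather (indexed load): element i comes from base[index[i]], as in
   sparse code such as A[K[i]]. */
void gather(double *dst, const double *base, const size_t *index, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = base[index[i]];
}

/* Scatter (indexed store): element i is written to base[index[i]]. */
void scatter(double *base, const double *src, const size_t *index, size_t n) {
    for (size_t i = 0; i < n; i++)
        base[index[i]] = src[i];
}
```

For instance, reading column 1 of a row-major 3 x 4 matrix stored in a flat array is `load_strided(col, m + 1, 4, 3)`. Without such instructions, SIMD code must fall back to scalar loads or extra shuffles, which is one reason these loops are harder to vectorize.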
Multimedia SIMD usually does not offer mask registers to support conditional execution of elements, as vector architectures do.
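Without mask registers, SIMD code typically handles a conditional such as `if (x[i] != 0) x[i] = x[i] - y[i]` by computing the result for every lane and then merging it with the original values using a bit mask produced by a vector compare. The sketch below shows this compare-and-select idiom with the GCC/Clang vector extensions (assumed toolchain support; `v4sf`, `v4si`, and `cond_sub` are hypothetical names). A vector architecture with mask registers would instead perform the subtraction only in the enabled lanes.

```c
/* GCC/Clang vector extensions (assumed): four floats / four ints per vector. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Lane-wise: result = (x != 0) ? x - y : x, without branches. */
v4sf cond_sub(v4sf x, v4sf y) {
    v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    v4si mask = (x != zero);  /* each lane: all ones (-1) if true, 0 if false */
    v4sf diff = x - y;        /* computed in every lane, enabled or not */
    /* Select bits from diff where the mask is set, and from x elsewhere. */
    v4si merged = ((v4si)diff & mask) | ((v4si)x & ~mask);
    return (v4sf)merged;
}
```

The cost of this approach is that the work for disabled lanes is still done and an extra compare plus bitwise merge is needed.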
These omissions make it harder for the compiler to generate SIMD code and increase the difficulty of programming in SIMD assembly language.
Why Are Multimedia SIMD Extensions So Popular?
- They cost little to add to the standard arithmetic unit and are easy to implement.
- They require little extra state compared to vector architectures, which is always a concern for context switch times.
- A lot of memory bandwidth is needed to support a vector architecture, which many computers don’t have.
- SIMD does not have to deal with the virtual-memory problem of a single instruction that generates many memory accesses taking a page fault in the middle of a vector. SIMD extensions use a separate data transfer for each group of SIMD operands, and because these groups are aligned in memory, they cannot cross page boundaries.
- The fixed-length vectors of SIMD make it easy to introduce instructions that help with new media standards, such as instructions that perform permutations or that consume fewer or more operands than vectors can produce.
- There was concern about how well vector architectures could work with caches.
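The page-boundary argument in the list above is simple arithmetic: if a SIMD group is aligned to its own size and the page size is a multiple of that size, the group's first and last bytes always land on the same page. The check below makes this concrete, assuming 4 KiB pages and 16-byte groups (illustrative values); `crosses_page` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size; a multiple of the group size */

/* True if the len bytes starting at addr touch more than one page.
   Requires len > 0. */
bool crosses_page(uintptr_t addr, uintptr_t len) {
    return addr / PAGE_SIZE != (addr + len - 1) / PAGE_SIZE;
}
```

A 16-byte-aligned group starts at most at offset 4080 within a page, so its last byte is at most at offset 4095. An unaligned 16-byte access starting at offset 4088, in contrast, spills into the next page.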
More recent vector architectures have addressed all of these problems, but the legacy of past flaws shaped the skeptical attitude toward vectors among architects.
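The point about permutation instructions relies on the fixed vector length. With exactly four 32-bit lanes, any rearrangement of the lanes (including broadcasts) fits in an 8-bit immediate, two bits per destination lane; this is how SSE2's PSHUFD instruction (the `_mm_shuffle_epi32` intrinsic) encodes it. The portable C function below only emulates those semantics; `shuffle4` is a hypothetical name, not an Intel API.

```c
#include <stdint.h>

/* Emulates PSHUFD-style semantics: destination lane i takes source lane
   ((imm >> 2*i) & 3). dst and src must not overlap. */
void shuffle4(int32_t dst[4], const int32_t src[4], uint8_t imm) {
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
}
```

For example, `imm = 0x1B` (binary 00 01 10 11) reverses the four lanes, and `imm = 0x00` broadcasts lane 0 to every lane. A variable-length vector would need a different, more general encoding for the same operation.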
Programming Multimedia SIMD Architectures
Given the ad hoc nature of the SIMD multimedia extensions, the easiest way to use these instructions has been through libraries or by writing in assembly language.
By borrowing techniques from vectorizing compilers, compilers are starting to produce SIMD instructions automatically.
However, programmers must be sure to align all the data in memory to the width of the SIMD unit on which the code is run to prevent the compiler from generating scalar instructions for otherwise vectorizable code.
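In C, the alignment advice can be followed by allocating arrays with C11 `aligned_alloc` and, optionally, telling the compiler about the alignment with the GCC/Clang builtin `__builtin_assume_aligned`. The sketch assumes a 32-byte (256-bit AVX) SIMD width; `SIMD_BYTES`, `alloc_simd_floats`, `is_simd_aligned`, and `saxpy` are illustrative names. Whether the loop is actually vectorized still depends on the compiler, its flags (for example `-O3`), and the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_BYTES 32  /* assumed SIMD width: 256-bit AVX registers */

/* Allocates n floats on a SIMD_BYTES boundary. C11 aligned_alloc expects
   the size to be a multiple of the alignment, so round it up. */
float *alloc_simd_floats(size_t n) {
    size_t bytes = n * sizeof(float);
    bytes = (bytes + SIMD_BYTES - 1) / SIMD_BYTES * SIMD_BYTES;
    return aligned_alloc(SIMD_BYTES, bytes);
}

/* True if p sits on a SIMD_BYTES boundary. */
int is_simd_aligned(const void *p) {
    return (uintptr_t)p % SIMD_BYTES == 0;
}

/* y = a*x + y. restrict promises no aliasing, and __builtin_assume_aligned
   (GCC/Clang extension) promises the alignment, so an optimizing compiler
   may emit aligned SIMD loads/stores instead of scalar code. Callers must
   pass pointers that really are SIMD_BYTES-aligned. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n) {
    float *ya = __builtin_assume_aligned(y, SIMD_BYTES);
    const float *xa = __builtin_assume_aligned(x, SIMD_BYTES);
    for (size_t i = 0; i < n; i++)
        ya[i] = a * xa[i] + ya[i];
}
```

Memory from `alloc_simd_floats` is released with the ordinary `free`.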