An advantage of vector architectures is that compilers can tell programmers at compile time whether a section of code will vectorize, often with hints as to why a given section did not.
Programmers can then improve performance by revising the code or by telling the compiler when it is safe to assume that operations are independent, for example in gather-scatter (indexed) data transfers. Hints thus flow in both directions: the compiler reports what blocked vectorization, and the programmer supplies what the compiler cannot prove on its own.
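As a minimal sketch of such a programmer hint, consider an indexed update of the kind found in sparse-matrix code. The function name `sparse_add` and its parameters are hypothetical, chosen only for illustration. The sketch assumes a C99 compiler and uses two standard mechanisms: the `restrict` qualifier, which promises that the arrays do not overlap, and the OpenMP `simd` directive, which asserts that the iterations are independent (here, that the index array `k` holds no repeated values). Compilers typically honor the directive only when OpenMP SIMD support is enabled and otherwise ignore it.

```c
#include <stddef.h>

/* Indexed (gather-scatter) update: a[k[i]] += c[m[i]].
 * The compiler cannot prove that k[] holds distinct values, so without
 * a hint it must assume that two iterations might update the same
 * element of a[]. */
void sparse_add(double *restrict a, const double *restrict c,
                const size_t *restrict k, const size_t *restrict m,
                size_t n)
{
    /* Programmer's assertion: iterations are independent, i.e. k[]
     * contains no repeated index. Ensuring this is the caller's
     * responsibility; the compiler does not check it. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        a[k[i]] += c[m[i]];
    }
}
```

Whether the loop is actually vectorized still depends on the compiler and on the target's support for gather and scatter instructions; the hint only removes the dependence assumption that would otherwise block it.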
The main factor that affects how successfully a program runs in vector mode is the structure of the program itself. Do the loops have true data dependencies? Can they be restructured to remove them? Which algorithm is chosen? How is it coded?
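The questions above can be made concrete with two small loops. This is an illustrative sketch, not code from any particular benchmark; the function names `scale_add` and `running_sum` are hypothetical. The first loop has no dependence between iterations and is a natural candidate for vectorization. The second carries a true data dependence from one iteration to the next, so it cannot be vectorized as written.

```c
#include <stddef.h>

/* Each iteration reads and writes only element i, so the iterations
 * are independent and can be executed as vector operations. */
void scale_add(double *restrict y, const double *restrict x,
               double a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Iteration i reads y[i - 1], which iteration i - 1 has just written:
 * a true (read-after-write) loop-carried dependence. The iterations
 * must run in order unless the algorithm is restructured. */
void running_sum(double *y, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        y[i] = y[i] + y[i - 1];
    }
}
```

For example, calling `running_sum` on `{1, 2, 3, 4}` yields `{1, 3, 6, 10}`, where each result depends on the one before it. Making such a computation vector-friendly generally means choosing a different algorithm rather than adding a compiler hint.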