A look into the architecture of OpenCL and how an OpenCL program interacts with the CPU and GPU.
OpenCL – Platform and Execution Model
Visualizing the OpenCL platform as a hierarchy of execution units and the work associated with them.
CUDA – Streaming Multiprocessors
A look at how a CUDA streaming multiprocessor schedules and executes instructions, from the hardware side.
CUDA – Memory Hierarchy
Visualizing the CUDA memory hierarchy in terms of access, scope, lifetime and speed.
CUDA – Dimensions, Mapping and Indexing
Visualizing the different dimensions of mapping threads, blocks and grids.
CUDA – Threads, Blocks, Grids and Synchronization
CUDA threads, blocks and grids as a hierarchy of computation groups, how they are invoked and how they synchronize with each other.
CUDA – Programming Model
A short introduction to the CUDA programming model and data flow.
SIMD Multimedia Extension
A comparison of SIMD multimedia extensions with vector architectures.
Vector Optimization – Programming
The compiler can guide the programmer in rewriting code so that it becomes vectorizable.
Vector Optimization – Gather Scatter
Sparse matrices and other non-strided memory locations can be accessed by vector instructions using gather-scatter operations.