OpenCL – Architecture and Program

A look at the architecture of OpenCL and how an OpenCL program interacts with the CPU and GPU.

OpenCL – Platform and Execution Model

Visualizing the OpenCL platform as a hierarchy of execution units and the work associated with them.

A look at how a CUDA streaming multiprocessor schedules and executes instructions, from the hardware side.

Visualizing the CUDA memory hierarchy in terms of access, scope, lifetime, and speed.

Visualizing the different dimensions of mapping threads, blocks and grids.

CUDA threads, blocks and grids as a hierarchy of computation groups, how they are invoked and how they synchronize with each other.

A short introduction to the CUDA programming model and data flow.

A comparison of SIMD multimedia extensions with vector architecture.

The compiler can guide the programmer in rewriting code so that it becomes vectorizable.

Sparse matrices and non-strided memory locations can be accessed by vector code using gather-scatter operations.
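
As a rough illustration of the idea, here is a minimal sketch in plain C of the sparse update A[K[i]] = A[K[i]] + C[M[i]]. The names gather, scatter, sparse_add and the VLEN constant are hypothetical, chosen only for this example; real vector hardware would do each step with a single indexed load or store instruction. The sketch assumes the indices in k are distinct within each chunk.

```c
#include <stddef.h>

#define VLEN 64 /* hypothetical vector length, chosen for illustration */

/* Gather: copy the scattered elements src[idx[i]] into a dense buffer. */
static void gather(double *dense, const double *src,
                   const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        dense[i] = src[idx[i]];
}

/* Scatter: write a dense buffer back to the locations dst[idx[i]]. */
static void scatter(double *dst, const double *dense,
                    const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[idx[i]] = dense[i];
}

/* Computes a[k[i]] += c[m[i]] for i in [0, n), one chunk of up to
   VLEN elements at a time. Assumes k holds no duplicate indices
   within a chunk; otherwise the result differs from a scalar loop. */
void sparse_add(double *a, const size_t *k,
                const double *c, const size_t *m, size_t n) {
    double va[VLEN], vc[VLEN];
    for (size_t start = 0; start < n; start += VLEN) {
        size_t len = (n - start < VLEN) ? (n - start) : VLEN;
        gather(va, a, k + start, len);   /* indexed load of A */
        gather(vc, c, m + start, len);   /* indexed load of C */
        for (size_t i = 0; i < len; i++) /* dense, unit-stride add */
            va[i] += vc[i];
        scatter(a, va, k + start, len);  /* indexed store to A */
    }
}
```

Once the operands are gathered into dense buffers, the arithmetic in the middle is an ordinary unit-stride loop, which is the part a vector unit can run at full speed.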