Multithreading - The Beard Sage

Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion. Multithreading, however, does not duplicate the entire processor as a multiprocessor does. Instead, multithreading shares most of the processor core among a set of threads, duplicating only private state, such as the registers and program counter.

Duplicating the per-thread state of a processor core means creating a separate register file, a separate PC, and a separate page table for each thread. A thread switch should be much more efficient than a process switch. Multiple threads in a program are identified either by a compiler or by the programmer.
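As a rough illustration of what "duplicating only private state" means, here is a minimal Python sketch. The names `ThreadContext`, `Core`, `switch_to`, and `step`, and the register count of 32, are assumptions made up for this example rather than any real hardware interface. The point is only that a thread switch selects a different context and does not copy state to or from memory.

```python
from dataclasses import dataclass, field

NUM_REGISTERS = 32  # assumed register count, for illustration only


@dataclass
class ThreadContext:
    """Private state that is duplicated for each hardware thread."""
    pc: int = 0
    registers: list = field(default_factory=lambda: [0] * NUM_REGISTERS)
    page_table_base: int = 0  # stand-in for the per-thread page table


@dataclass
class Core:
    """One shared core holding one ThreadContext per hardware thread."""
    contexts: list
    active: int = 0  # index of the context currently feeding the pipeline

    def switch_to(self, thread_id: int) -> None:
        # A thread switch only selects another context;
        # nothing is saved to or restored from memory.
        self.active = thread_id

    def step(self) -> None:
        # Toy "execute": advance the active thread's PC by one
        # 4-byte instruction and bump one of its registers.
        ctx = self.contexts[self.active]
        ctx.pc += 4
        ctx.registers[1] += 1


core = Core(contexts=[ThreadContext(pc=0x1000), ThreadContext(pc=0x2000)])
core.step()        # thread 0 runs one instruction
core.switch_to(1)
core.step()        # thread 1 runs; thread 0's state is untouched
core.step()
```

Everything outside `ThreadContext` (the functional units, caches, and so on, which this toy model does not represent) is what the threads share.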

Benefits

To tolerate latency from memory operations, dependent instructions, and branch resolution. When one thread encounters a long-latency operation, the processor can execute a useful operation from another thread.

To utilize processing resources more efficiently and improve system throughput.

Coarse-grained Multithreading

Coarse-grained multithreading switches threads only on costly stalls, such as level two or three cache misses. Switching only on costly stalls relieves the need to have thread-switching be essentially free and is much less likely to slow down the execution of any one thread, since instructions from other threads will only be issued when a thread encounters a costly stall.

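It is limited in its ability to overcome throughput losses, especially from shorter stalls. When a stall occurs, the pipeline sees a bubble before the new thread begins executing, because the new thread has to fill the pipeline before its instructions can complete. As a result, coarse-grained multithreading is useful mainly for reducing the penalty of very high-cost stalls, where the pipeline refill time is small compared with the stall time.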

The long stalls are partially hidden by switching to another thread that uses the resources of the processor. This switching reduces the number of completely idle clock cycles. However, thread switching only occurs when there is a stall. Because the new thread has a start-up period, there are likely to be some fully idle cycles remaining.
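The behaviour described above can be sketched with a small cycle-by-cycle model. This is a toy simulation under stated assumptions, not a description of any real core: each thread is a list of numbers, one per instruction, giving the stall cycles that follow that instruction; `switch_threshold` (what counts as a "costly" stall) and `switch_penalty` (the start-up bubble) are made-up parameters.

```python
def run_coarse_grained(threads, switch_threshold=8, switch_penalty=2):
    """Toy model of coarse-grained multithreading.

    threads[t][k] is the number of stall cycles that follow the k-th
    instruction of thread t (0 means no stall).
    Returns (total_cycles, idle_cycles, trace), where trace holds the
    issuing thread id for each cycle, or None for an idle cycle.
    Stall cycles after the very last instruction are not counted.
    """
    n = len(threads)
    pcs = [0] * n        # next instruction index per thread
    ready_at = [0] * n   # first cycle in which each thread may issue again
    cycle = idle = 0
    current = 0
    trace = []

    def unfinished():
        return [i for i in range(n) if pcs[i] < len(threads[i])]

    while unfinished():
        done = pcs[current] >= len(threads[current])
        long_stall = ready_at[current] - cycle >= switch_threshold
        if done or long_stall:
            # Only a finished thread or a costly stall triggers a switch.
            best = min(unfinished(), key=lambda i: ready_at[i])
            if best != current and (done or ready_at[best] < ready_at[current]):
                current = best
                # Start-up period of the new thread: fully idle cycles.
                trace.extend([None] * switch_penalty)
                cycle += switch_penalty
                idle += switch_penalty
        if ready_at[current] > cycle:
            # Short stall, or nothing better to switch to: wait.
            trace.append(None)
            cycle += 1
            idle += 1
            continue
        stall = threads[current][pcs[current]]
        pcs[current] += 1
        trace.append(current)
        cycle += 1
        ready_at[current] = cycle + stall
    return cycle, idle, trace


thread_a = [0, 0, 20, 0, 0]   # third instruction misses in the cache
thread_b = [0, 0, 0, 0]
cycles, idle, trace = run_coarse_grained([thread_a, thread_b])
print(cycles, idle)
print(trace)
```

In the trace, the `None` entries right after a switch are the start-up bubble, and the later run of `None` entries is the part of the long stall that the second thread was too short to cover.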

Fine-grained Multithreading

Fine-grained multithreading switches between threads on each clock, causing the execution of instructions from multiple threads to be interleaved. This interleaving is often done in a round-robin fashion, skipping any threads that are stalled at that time.

One key advantage of fine-grained multithreading is that it can hide the throughput losses that arise from both short and long stalls, since instructions from other threads can be executed when one thread stalls, even if the stall is only for a few cycles.

The primary disadvantage of fine-grained multithreading is that it slows down the execution of an individual thread, since a thread that is ready to execute without stalls will be delayed by instructions from other threads.
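A minimal sketch of the round-robin interleaving, using the same toy encoding as a list of per-instruction stall lengths (an assumption of this example, as are the function and variable names). Each cycle the model issues one instruction from the next thread in round-robin order that is not stalled.

```python
def run_fine_grained(threads):
    """Toy model of fine-grained multithreading.

    threads[t][k] is the number of stall cycles that follow the k-th
    instruction of thread t (0 means no stall).
    Returns (total_cycles, idle_cycles, trace), where trace holds the
    issuing thread id for each cycle, or None for an idle cycle.
    """
    n = len(threads)
    pcs = [0] * n        # next instruction index per thread
    ready_at = [0] * n   # first cycle in which each thread may issue again
    cycle = idle = 0
    last = n - 1         # so the first search starts at thread 0
    trace = []
    while any(pcs[i] < len(threads[i]) for i in range(n)):
        issued = None
        # Round-robin search, skipping finished or stalled threads.
        for offset in range(1, n + 1):
            t = (last + offset) % n
            if pcs[t] < len(threads[t]) and ready_at[t] <= cycle:
                issued = t
                break
        if issued is None:
            idle += 1    # every remaining thread is stalled
        else:
            stall = threads[issued][pcs[issued]]
            pcs[issued] += 1
            ready_at[issued] = cycle + 1 + stall
            last = issued
        trace.append(issued)
        cycle += 1
    return cycle, idle, trace


result = run_fine_grained([[0, 3, 0], [0, 0, 0], [0, 0]])
print(result)
```

The trace shows both sides of the trade-off: the short stall in the first thread is covered by instructions from the other two, but a thread that is ready still has to wait its turn.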

Simultaneous Multithreading

Simultaneous multithreading (SMT) is a variation on fine-grained multithreading that arises naturally when fine-grained multithreading is implemented on top of a multiple-issue, dynamically scheduled processor.

In existing SMT implementations, all instructions issued in a given cycle come from one thread, although instructions from different threads can initiate execution in the same cycle, using the dynamic scheduling hardware to determine what instructions are ready.

Register renaming and dynamic scheduling allow multiple instructions from independent threads to be executed without regard to the dependences among them; the resolution of the dependences can be handled by the dynamic scheduling capability.
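To see why sharing a multiple-issue core between threads helps, here is a deliberately idealized sketch. It assumes each thread can be summarized as a list of "bundles" (how many independent instructions it has ready at each successive step, always at least one), and it lets any thread fill any free slot in a cycle, ignoring the one-thread-per-cycle issue restriction mentioned above. Renaming and dependence tracking are not modelled at all, and the function name and the issue width of 4 are assumptions of this example.

```python
def run_smt(bundles, width=4):
    """Idealized SMT slot filling.

    bundles[t] lists, for thread t, how many independent instructions
    are ready at each successive step (positive integers).
    Returns (cycles, usage), where usage[c] is the number of issue
    slots filled in cycle c.
    """
    remaining = [list(b) for b in bundles]  # consumed from the front
    n = len(bundles)
    usage = []
    start = 0  # rotate which thread gets first pick of the slots
    while any(remaining):
        free = width
        for offset in range(n):
            t = (start + offset) % n
            if remaining[t] and free > 0:
                # Take as much of this thread's ready bundle as fits.
                take = min(free, remaining[t][0])
                remaining[t][0] -= take
                free -= take
                if remaining[t][0] == 0:
                    # The next bundle depends on this one, so it only
                    # becomes available in a later cycle.
                    remaining[t].pop(0)
        usage.append(width - free)
        start = (start + 1) % n
    return len(usage), usage


workload = [[2, 1, 3], [1, 2], [2, 2]]
smt_cycles, smt_usage = run_smt(workload)
# Baseline: run each thread alone, one after another, on the same core.
serial_cycles = sum(run_smt([b])[0] for b in workload)
print(smt_cycles, smt_usage, serial_cycles)
```

With threads run one at a time, any slot a thread cannot fill in a cycle is wasted; in the shared run, other threads fill those slots, so the same instructions finish in fewer cycles.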
