Tomasulo’s Algorithm – Example

Consider the above architecture of a machine that supports speculative execution, dynamic scheduling and register renaming. Assume in-order instruction issue, out-of-order execution, and in-order commit.

Issue

Only 1 instruction is issued per cycle.

Execute

Instruction can be executed if it has all operands read. If there is a RAW hazard, the value of operands is ready after the value is written in ROB from CDB.

Common Data Bus

CDB writes back to ROB needs 1 cycle.

Re-Order Buffer

Assume that the Reorder Buffer size is infinite. This will eliminate any stalls due to the ROB being full. The value of operands is ready after the value is written in ROB from CDB.

Commit

At most one instruction can be committed each cycle.

Cache

Assume no cache miss. Only load and store instruction will access the data cache. Cache is fully pipelined and has no structure hazard.

Load Store

Store instruction uses CDB to write the effective address to ROB for subsequent memory write.

Branch

Branch instructions use CDB to write “Taken” or “NotTaken” to ROB. Assume perfect branch predictor and no miss in branch target buffer (BTB).

Function Units

There are 4 function units.

Integer ALU

Integer ALU operations need 1 cycle in EX stage. All integer and branch operations are done in integer ALU. Integer ALU has 2 Reservation Stations. The execution of conditional branch instructions is resolved in one cycle in integer ALU.

FP Adder

FP addition needs 4 cycles in EX stage. FP Adder has 3 Reservation Stations.

FP Multiplier

FP multiplication needs 8 cycles in EX stage. FP Multplier has 2 Reservation Stations.

Load Store Unit

Loads and stores needs 1 cycle in EX stage (for calculating effective address), and 2 cycles for cache access. The machine has 1 load and store unit (LS). Calculating effective address does not use the integer ALU, instead, it uses a designated adder in the load and store unit. Load Store Unit has a buffer of size 5.

Instruction Sequence

Consider the following sequence of instructions. Think of each Cycle as a period of time and not a point in time.

L.D F0, 0(R1)

  • Since this is the first instruction, this would not have any data dependence.
  • This is a load instruction and thus would require the Load/Store function unit.
  • The instruction is issued during Cycle 1.
  • The effective address M[0 + R1] calculation requires one cycle in EX stage and two cycles for Cache Access
  • In the next cycle the CDB writes F0 to the ROB.
  • In the next cycle F0 is available.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456

ADD.D F6, F0, F4

  • There is a RAW dependence on F0
  • This is an add instruction and thus would require the Adder function unit.
  • The instruction is issued during Cycle 2.
  • F4 is available. However, the Reservation Station has to wait for the value of F0 from ROB.
  • F0 is available during Cycle 6 and execution starts
  • Addition takes 4 cycles.
  • In the next cycle the CDB writes F6 to the ROB.
  • In the next cycle F6 is available.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011

L.D F2, 100(R1)

  • There is no data dependence.
  • This is a load instruction and thus would require the Load/Store function unit.
  • The instruction is issued during Cycle 3.
  • The effective address M[100 + R1] calculation requires one cycle in EX stage and two cycles for Cache Access
  • In the next cycle the CDB writes F2 to the ROB.
  • In the next cycle F2 is available.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712

MUL.D F0, F2, F6

  • There is a RAW dependence on F2 and F6.
  • This is a multiply instruction and thus would require the Multiplier function unit.
  • The instruction is issued during Cycle 4.
  • Both F2 and F6 are unavailable. The Reservation Stations have to wait for the values of F2 and F6 from ROB.
  • F2 is available during Cycle 6 and F6 is available during Cycle 11. Thus, execution starts during Cycle 11.
  • Multiplication takes 8 cycles.
  • In the next cycle the CDB writes F0 to the ROB.
  • In the next cycle F0 is available.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712
4MUL.D F0, F2, F6411-18MULT1920

S.D F0, 0(R1)

  • There is a RAW dependence on F0. The value at F0 needs to be stored in memory.
  • This is a store instruction and thus would require the Load/Store function unit.
  • The instruction is issued during Cycle 5.
  • The effective address M[0 + R1] calculation requires one cycle in EX stage.
  • In the next cycle (cycle 7) the CDB writes M[0 + R1] to the ROB. However, there is a contention with an earlier CDB write to ROB. Thus there is a delay of one cycle and the CDB writes M[0 + R1] to the ROB during cycle 8.
  • As mentioned above, the Store instruction uses CDB to write the effective address to ROB for subsequent memory write. Because instructions are committed in order, it is guaranteed that F0 will be available when this instruction is committed.
  • The instruction commits in-order.
  • It takes another 2 cycles to update the cache. Note that the cache is only updated after the instruction is committed.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712
4MUL.D F0, F2, F6411-18MULT1920
5S.D F0, 0(R1)56LS22-23821

ADDI R1, R1, #8

  • There is a WAR dependence on R1.
  • This is an add immediate instruction and thus would require the Integer ALU function unit.
  • The instruction is issued during Cycle 6.
  • The effective address M[0 + R1] calculation in the previous step finishes execution after cycle 6. The dependence on R1 no longer exists. Thus, this instruction can proceed during the next cycle
  • In the next cycle (cycle 7) the integer operation is executed which takes one cycle.
  • In the next cycle (cycle 8) the CDB writes R1 to the ROB. However, there is a contention with an earlier CDB write to ROB. Thus there is a delay of one cycle and the CDB writes R1 to the ROB during cycle 9.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712
4MUL.D F0, F2, F6411-18MULT1920
5S.D F0, 0(R1)56LS22-23821
6ADDI R1, R1, #867ALU922

BNE R1, R3, loop

  • There is a RAW dependence on R1.
  • This is a branch instruction and thus would require the Integer ALU function unit.
  • The instruction is issued during Cycle 7.
  • R1 is unavailable. The Reservation Stations have to wait for the value of R1 from ROB.
  • R1 is available during Cycle 10 and execution starts.
  • The integer operation takes one cycle.
  • In the next cycle the CDB writes “Taken”/”Not Taken” to the ROB.
  • In the next cycle “Taken”/”Not Taken” is available.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712
4MUL.D F0, F2, F6411-18MULT1920
5S.D F0, 0(R1)56LS22-23821
6ADDI R1, R1, #867ALU922
7BNE R1, R3, loop710ALU1123

L.D F0, 0(R1)

  • There is a RAW dependence on R1.
  • This is a load instruction and thus would require the Load/Store function unit.
  • The instruction is issued during Cycle 8.
  • R1 is unavailable. The Reservation Stations have to wait for the value of R1 from ROB.
  • R1 is available during Cycle 10.
  • The effective address M[0 + R1] calculation requires one cycle in EX stage and two cycles for Cache Access
  • In the next cycle the CDB writes F0 to the ROB.
  • In the next cycle F0 is available.
  • The instruction commits in-order.
  • Note that there is no dependency between this F0 and the F0 output by the multiplication instruction as they are part of two different loop iterations. The Re-Order buffer implicitly handles Register Renaming.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712
4MUL.D F0, F2, F6411-18MULT1920
5S.D F0, 0(R1)56LS22-23821
6ADDI R1, R1, #867ALU922
7BNE R1, R3, loop710ALU1123
8L.D F0, 0(R1)810LS11-121324

ADD.D F6, F0, F4

  • There is a RAW dependence on F0.
  • This is an add instruction and thus would require the Adder function unit.
  • The instruction is issued during Cycle 9.
  • F4 is available. However the Reservation Station has to wait for the value of F0 from ROB.
  • F0 is available during Cycle 14 and execution starts
  • Addition takes 4 cycles.
  • In the next cycle the CDB writes F6 to the ROB.
  • In the next cycle F6 is available.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712
4MUL.D F0, F2, F6411-18MULT1920
5S.D F0, 0(R1)56LS22-23821
6ADDI R1, R1, #867ALU922
7BNE R1, R3, loop710ALU1123
8L.D F0, 0(R1)810LS11-121324
9ADD.D F6, F0, F4914-17ADD1825

L.D F2, 100(R1)

  • There is no data dependence.
  • This is a load instruction and thus would require the Load/Store function unit.
  • The instruction is issued during Cycle 10.
  • The effective address M[100 + R1] calculation requires one cycle in EX stage and two cycles for Cache Access
  • In the next cycle the CDB writes F2 to the ROB.
  • In the next cycle F2 is available.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712
4MUL.D F0, F2, F6411-18MULT1920
5S.D F0, 0(R1)56LS22-23821
6ADDI R1, R1, #867ALU922
7BNE R1, R3, loop710ALU1123
8L.D F0, 0(R1)810LS11-121324
9ADD.D F6, F0, F4914-17ADD1825
10L.D F2, 100(R1)1011LS12-131426

MUL.D F0, F2, F6

  • There is a RAW dependence on F2 and F6.
  • This is a multiply instruction and thus would require the Multiplier function unit.
  • The instruction is issued during Cycle 11.
  • Both F2 and F6 are unavailable. The Reservation Stations have to wait for the values of F2 and F6 from ROB.
  • F2 is available during Cycle 15 and F6 is available during Cycle 19. Thus, execution starts during Cycle 19.
  • Multiplication takes 8 cycles.
  • In the next cycle the CDB writes F0 to the ROB.
  • In the next cycle F0 is available.
  • The instruction commits in-order.
#InstructionIssueExecuteFunction UnitCache AccessCDB Write to ROBCommit
1L.D F0, 0(R1)12LS3-456
2ADD.D F6, F0, F426-9ADD1011
3L.D F2, 100(R1)34LS5-6712
4MUL.D F0, F2, F6411-18MULT1920
5S.D F0, 0(R1)56LS22-23821
6ADDI R1, R1, #867ALU922
7BNE R1, R3, loop710ALU1123
8L.D F0, 0(R1)810LS11-121324
9ADD.D F6, F0, F4914-17ADD1825
10L.D F2, 100(R1)1011LS12-131426
11MUL.D F0, F2, F61119-26MULT2728

Caveats

In the above steps there were enough Reservation Stations to buffer and execute each instruction. If however, the Reservation Stations for a Function Unit fill up, an instruction of that type will be stalled until a station is freed up.

Leave a Reply

Your email address will not be published. Required fields are marked *