Virtual Memory - Translation-Lookaside Buffer (TLB)

Since page tables are stored in main memory, every memory access by a program can take at least twice as long: one memory access to obtain the physical address and a second access to get the data. To avoid this, we can exploit both spatial and temporal locality by creating a special cache for storing recently used translations.

A Translation-Lookaside Buffer (TLB) is a cache that keeps track of recently used address mappings to try to avoid an access to the page table. Each tag entry in the TLB holds a portion of the virtual page number, and each data entry of the TLB holds a physical page number.

The TLB acts as a cache of the page table, but only for the entries that map to physical pages. It therefore contains a subset of the virtual-to-physical page mappings that are in the page table. Because the TLB is a cache, it must have a tag field. If there is no matching entry in the TLB for a page, the page table must be examined. The page table either supplies a physical page number for the page (which can then be used to build a TLB entry) or indicates that the page resides on disk, in which case a page fault occurs.

In addition to the tag and the physical page number, each TLB entry keeps track of three status bits:

Valid: The entry in the TLB or page table is legitimate.
Dirty: The page has been written and is inconsistent with disk. It will need to be written back upon replacement.
Reference: The entry has been recently used. Periodically, all reference bits are cleared.

On every reference, we look up the virtual page number in the TLB. If we get a hit, the physical page number is used to form the address, and the corresponding reference bit is turned on. If the processor is performing a write, the dirty bit is also turned on.
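The lookup described above can be sketched in a few lines. This is only an illustration: the `TLBEntry` class, the `tlb_access` function, and the 4 KiB page size are assumptions made for the example, and real hardware compares all tags in parallel rather than looping.

```python
from dataclasses import dataclass

PAGE_OFFSET_BITS = 12  # assumed 4 KiB pages (not specified in the text)

@dataclass
class TLBEntry:
    vpn: int                  # tag: virtual page number
    ppn: int                  # data: physical page number
    valid: bool = True
    dirty: bool = False
    referenced: bool = False

def tlb_access(tlb, vaddr, is_write):
    """Return the physical address on a TLB hit, or None on a TLB miss."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    for entry in tlb:  # sequential here; hardware checks every tag at once
        if entry.valid and entry.vpn == vpn:
            entry.referenced = True      # every hit sets the reference bit
            if is_write:
                entry.dirty = True       # writes also set the dirty bit
            return (entry.ppn << PAGE_OFFSET_BITS) | offset
    return None
```

A read that hits leaves the dirty bit untouched, while a write that hits sets both bits.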

If a miss in the TLB occurs, we must determine whether it is a page fault or merely a TLB miss. If the page exists in memory, then the TLB miss indicates only that the translation is missing. In such cases, the processor can handle the TLB miss by loading the translation from the page table into the TLB and then trying the reference again. If the page is not present in memory, then the TLB miss indicates a true page fault. In this case, the processor invokes the operating system using an exception.

Because the TLB has far fewer entries than there are pages in main memory, TLB misses will be much more frequent than true page faults.

Because the reference and dirty bits are contained in the TLB entry, we need to copy these bits back to the page table entry when we replace an entry. These bits are the only portion of the TLB entry that can be changed. Using write-back (that is, copying these bits back at miss time rather than each time they are written) is very efficient, since we expect the TLB miss rate to be small.
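A minimal sketch of this write-back on replacement follows. The dictionary layout and the name `replace_entry` are hypothetical, chosen only to make the copy-back step visible.

```python
def replace_entry(tlb, page_table, slot, new_vpn):
    """Evict the entry in `slot`, copying its status bits back first."""
    victim = tlb[slot]
    if victim is not None:
        old_pte = page_table[victim["vpn"]]
        # Only the reference and dirty bits can have changed in the TLB.
        old_pte["referenced"] = victim["referenced"]
        old_pte["dirty"] = victim["dirty"]
    new_pte = page_table[new_vpn]
    tlb[slot] = {
        "vpn": new_vpn,
        "ppn": new_pte["ppn"],
        "referenced": new_pte["referenced"],
        "dirty": new_pte["dirty"],
    }
```

Between replacements the page table copy of the bits may be stale, which is acceptable because the TLB copy is the one consulted on every hit.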

Any bits that determine the access rights for a page must be included in both the page table and the TLB, because the page table is accessed only on a TLB miss.

Address Translation

If the TLB has $T = 2^t$ sets, then the TLB index (TLBI) consists of the $t$ least significant bits of the Virtual Page Number (VPN), and the TLB tag (TLBT) consists of the remaining bits in the Virtual Page Number (VPN).
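As a small worked example of this split, the sketch below extracts the tag, index, and page offset from a virtual address. The function name and the example parameters (a 6-bit page offset and $t = 2$) are assumptions chosen for illustration.

```python
def split_virtual_address(vaddr, page_offset_bits, t):
    """Return (TLBT, TLBI, page offset) for a TLB with 2**t sets."""
    vpn = vaddr >> page_offset_bits
    offset = vaddr & ((1 << page_offset_bits) - 1)
    tlbi = vpn & ((1 << t) - 1)   # t least significant bits of the VPN
    tlbt = vpn >> t               # remaining high bits of the VPN
    return tlbt, tlbi, offset

tlbt, tlbi, offset = split_virtual_address(0x03D4, page_offset_bits=6, t=2)
print(hex(tlbt), hex(tlbi), hex(offset))
```

For the address `0x03D4` the VPN is `0x0F`, so the index is `0x3`, the tag is `0x3`, and the page offset is `0x14`.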

Typical Values for a TLB

Parameter	Typical value
TLB size	16–512 entries
Block size	1–2 page table entries (typically 4–8 bytes each)
Hit time	0.5–1 clock cycle
Miss penalty	10–100 clock cycles
Miss rate	0.01%–1%
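To get a feel for what these ranges imply, we can plug the two extremes into the usual average access time formula (hit time plus miss rate times miss penalty). The function name is hypothetical, and pairing the best values together and the worst values together is an assumption made only to bound the result.

```python
def average_translation_cycles(hit_time, miss_rate, miss_penalty):
    """Average cycles per translation: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Extremes of the typical ranges listed above.
best = average_translation_cycles(hit_time=0.5, miss_rate=0.0001, miss_penalty=10)
worst = average_translation_cycles(hit_time=1, miss_rate=0.01, miss_penalty=100)
print(f"best case:  {best:.3f} cycles")
print(f"worst case: {worst:.3f} cycles")
```

Even at the unfavorable end of every range, the average cost of a translation stays at about two cycles, far below the cost of walking the page table on every access.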

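Some systems use small, fully associative TLBs because a fully associative mapping has a lower miss rate; furthermore, since the TLB is small, the cost of a fully associative mapping is not too high. Other systems use larger TLBs with limited associativity, which is the case the index and tag split above applies to.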

With a fully associative mapping, choosing the entry to replace becomes tricky since implementing a hardware LRU scheme is too expensive. Furthermore, since TLB misses are much more frequent than page faults and thus must be handled more cheaply, we cannot afford an expensive software algorithm, as we can for page faults. As a result, many systems provide some support for randomly choosing an entry to replace.
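One cheap policy that avoids LRU bookkeeping entirely is random replacement. A minimal sketch, assuming the TLB is summarized by a list of valid bits and that an invalid slot is always preferred over evicting a live entry (both are choices made for this example):

```python
import random

def choose_victim(valid_bits, rng=random):
    """Pick the TLB slot to overwrite on a miss."""
    for slot, valid in enumerate(valid_bits):
        if not valid:
            return slot                      # free slot: nothing to evict
    return rng.randrange(len(valid_bits))    # otherwise evict at random
```

The selection needs no per-entry history, so it costs the same regardless of how many entries the TLB holds.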

There is an extra complication for write requests: namely, the write access bit in the TLB must be checked. This bit prevents the program from writing into pages for which it has only read access.
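The check can be sketched as follows. The exception class, the function name, and the `write_allowed` field are hypothetical names introduced for the example.

```python
class ProtectionFault(Exception):
    """Raised when a write targets a page mapped read-only."""

def translate_checked(entry, is_write):
    """Return the physical page number, enforcing the write access bit."""
    if is_write and not entry["write_allowed"]:
        raise ProtectionFault(f"write to read-only page {entry['vpn']:#x}")
    return entry["ppn"]
```

Reads pass through unchanged; only a write to a page without write permission raises the fault.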

Under the best of circumstances, a virtual address is translated by the TLB and sent to the cache where the appropriate data is found, retrieved, and sent back to the processor. In the worst case, a reference can miss in all three components of the memory hierarchy: the TLB, the page table, and the cache.

TLB Hit

Processor sends Virtual Address (VA).
Extract Virtual Page Number (VPN) from VA. Query TLB using VPN.
TLB returns Page Table Entry (PTE).
Combine the physical page number from the PTE with the Page Offset to get Physical Address (PA).
Query Cache/Memory using PA.
Send data to processor.

TLB Miss

Processor sends Virtual Address (VA).
Extract Virtual Page Number (VPN) from VA. Query TLB using VPN.
TLB miss. Query Memory using Page Table Entry Address (PTEA) to get Page Table Entry (PTE).
Save the VPN-to-PTE mapping in the TLB. Combine the physical page number from the PTE with the Page Offset to get Physical Address (PA).
Query Cache/Memory using PA.
Send data to processor.
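The hit and miss sequences above differ only in where the translation comes from. A minimal sketch, assuming 4 KiB pages, a TLB and page table modeled as dictionaries from VPN to physical page number, and a page that is always resident in memory:

```python
PAGE_OFFSET_BITS = 12  # assumed 4 KiB pages

def translate(vaddr, tlb, page_table):
    """Return (physical address, tlb_hit) for a resident page."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    hit = vpn in tlb
    if not hit:
        tlb[vpn] = page_table[vpn]   # TLB miss: fetch the PTE from memory
    ppn = tlb[vpn]
    return (ppn << PAGE_OFFSET_BITS) | offset, hit

tlb, page_table = {}, {0x3: 0x8}
print(translate(0x3ABC, tlb, page_table))   # first access misses in the TLB
print(translate(0x3ABC, tlb, page_table))   # second access hits
```

Both calls produce the same physical address; only the second avoids the extra memory access for the page table entry.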

TLB Misses and Page Faults

A TLB miss can indicate one of two things:

The page is present in memory, and we need only create the missing TLB entry.
The page is not present in memory, and we need to transfer control to the operating system to deal with a page fault.

MIPS traditionally handles a TLB miss in software. It brings in the page table entry from memory and then re-executes the instruction that caused the TLB miss. Upon re-executing, it will get a TLB hit. If the page table entry indicates the page is not in memory, this time it will get a page fault exception.

Handling a TLB miss or a page fault requires using the exception mechanism to interrupt the active process, transferring control to the operating system, and later resuming execution of the interrupted process. To restart the instruction after the page fault is handled, the program counter of the instruction that caused the page fault must be saved in the exception program counter (EPC).

A TLB miss or page fault exception must be asserted by the end of the same clock cycle that the memory access occurs, so that the next clock cycle will begin exception processing rather than continue normal instruction execution.

Once the operating system knows the virtual address that caused the page fault, it must complete three steps:

Look up the page table entry using the virtual address and find the location of the referenced page on disk.
Choose a physical page to replace; if the chosen page is dirty, it must be written out to disk before we can bring a new virtual page into this physical page.
Start a read to bring the referenced page from disk into the chosen physical page. The operating system will usually select another process to execute in the processor until the disk access completes.

When the read of the page from disk is complete, the operating system can restore the state of the process that originally caused the page fault and execute the instruction that returns from the exception. This instruction will reset the processor from kernel to user mode, as well as restore the program counter. The user process then re-executes the instruction that faulted, accesses the requested page successfully, and continues execution.
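The three steps can be sketched as a toy handler. Everything here is a simplification for illustration: the data layout, the names, and the fact that the victim page is passed in by the caller rather than chosen by a replacement policy are all assumptions, and disk transfers are merely recorded in a list.

```python
def handle_page_fault(vpn, page_table, frame_owner, victim_ppn, disk_log):
    """Bring page `vpn` into physical page `victim_ppn`."""
    # Step 1: look up the PTE and find the page's location on disk.
    pte = page_table[vpn]
    location = pte["disk_location"]

    # Step 2: evict the current occupant, writing it out first if dirty.
    old_vpn = frame_owner[victim_ppn]
    if old_vpn is not None:
        old_pte = page_table[old_vpn]
        if old_pte["dirty"]:
            disk_log.append(("write", old_pte["disk_location"]))
        old_pte["valid"] = False
        old_pte["dirty"] = False

    # Step 3: read the referenced page from disk into the freed frame.
    disk_log.append(("read", location))
    frame_owner[victim_ppn] = vpn
    pte.update(valid=True, dirty=False, ppn=victim_ppn)
```

In a real system the read in step 3 is asynchronous, which is why the operating system typically schedules another process while it waits.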

Page fault exceptions for data accesses are difficult to implement properly in a processor because:

They occur in the middle of instructions, unlike instruction page faults.
The instruction cannot be completed before handling the exception.
After handling the exception, the instruction must be restarted as if nothing had occurred.

The TLB miss handler does not check to see if the page table entry is valid. Because the exception for TLB entry missing is much more frequent than a page fault, the operating system loads the TLB from the page table without examining the entry and restarts the instruction. If the entry is invalid, another and different exception occurs, and the operating system recognizes the page fault. This method makes the frequent case of a TLB miss fast, at a slight performance penalty for the infrequent case of a page fault.
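This two-stage behavior can be sketched as below. The names and the dictionary-based TLB are hypothetical, and the "restart" of the instruction is modeled simply by continuing after the handler returns.

```python
class PageFault(Exception):
    """Raised when the translation found in the TLB is marked invalid."""

def tlb_miss_handler(vpn, tlb, page_table):
    # Fast path: copy the PTE without looking at its valid bit.
    tlb[vpn] = dict(page_table[vpn])

def access(vpn, tlb, page_table):
    if vpn not in tlb:
        tlb_miss_handler(vpn, tlb, page_table)  # first exception: TLB miss
    entry = tlb[vpn]                            # instruction is retried
    if not entry["valid"]:
        raise PageFault(vpn)                    # second, different exception
    return entry["ppn"]
```

The handler itself contains no branch on the valid bit, so the common case pays nothing for the rare page fault.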

Integrating Virtual Memory, TLBs, and Caches

If the TLB generates a hit, the cache can be accessed with the resulting physical address. For a read, the cache generates a hit or miss and supplies the data or causes a stall while the data is brought from memory. If the operation is a write, a portion of the cache entry is overwritten for a hit and the data is sent to the write buffer if we assume write-through. A write miss is just like a read miss except that the block is modified after it is read from memory. Write-back requires writes to set a dirty bit for the cache block, and a write buffer is loaded with the whole block only on a read miss or write miss if the block to be replaced is dirty. Notice that a TLB hit and a cache hit are independent events, but a cache hit can only occur after a TLB hit occurs, which means that the data must be present in memory.

TLB	Page Table	Cache	Possible?	Notes
Hit	Hit	Miss	Yes	Page Table is never checked if TLB hits
Miss	Hit	Hit	Yes	TLB misses, entry found in Page Table. After retry data is found in Cache
Miss	Hit	Miss	Yes	TLB misses, entry found in Page Table. After retry data misses in Cache
Miss	Miss	Miss	Yes	TLB misses, followed by a page fault. After retry, data must miss in the cache
Hit	Miss	Miss	No	Cannot have a translation in the TLB if the page is not present in memory
Hit	Miss	Hit	No	Cannot have a translation in the TLB if the page is not present in memory
Miss	Miss	Hit	No	Data cannot be allowed in the cache if the page is not in memory
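The impossible rows all follow from one rule, which can be written as a small predicate. Here a page table "miss" is taken to mean that the page is not resident in memory; the function name is hypothetical.

```python
def is_possible(tlb_hit, page_table_hit, cache_hit):
    """Can this combination of events occur for a single reference?"""
    if not page_table_hit and (tlb_hit or cache_hit):
        # A page that is not in memory can have neither a TLB
        # translation nor any of its data in the cache.
        return False
    return True

for combo in [(True, True, False), (False, False, False), (True, False, True)]:
    print(combo, is_possible(*combo))
```

Every combination with a page table hit is possible, and with a page table miss only the all-miss case survives.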

The Beard Sage
