Inside Cache

L1
The Level 1 cache, or L1, is built into the CPU itself and so runs at the same speed as the CPU. This makes it expensive to implement, and as a result L1 caches aren't very large. They are, however, the fastest caches in your system, and your CPU loves them. More than Vegemite loves toast, even.

In modern CPUs the L1 is frequently broken down into two specific dedicated caches: the instruction cache and the data cache.

The instruction cache, as the name suggests, caches instructions before they enter the fetch/decode/execute cycle. The data cache stores the results of calculations before they are written back to the L2 cache and on to main memory. Together they reduce latency for calculations whose instructions are already in the instruction cache, or whose data is already in the data cache, saving the need to source these from the L2 cache.

Note that this is a simplistic view of cache operation – branch prediction allows the instruction prefetcher to load instructions into the L1 instruction cache ahead of time, while the L1 data cache can likewise be pre-emptively filled from the L2 cache with data that is expected to be needed. It's all part of the magic, and complexity, of modern CPU design.

Finally, the L1 cache is always dedicated to a particular processor, or to a particular core on a multi-core processor. It's also tailored to the architecture, and size isn't necessarily an indicator of performance. Commonly, modern CPUs like Intel's Core2 use 32KB each for the L1 instruction and L1 data caches, for a total of 64KB of L1 cache. AMD's Athlon64 and Phenom lines use 64KB each for the L1 instruction and L1 data caches, for a total of 128KB.


L2 and L3
If instructions and data can't be found in the L1 cache, the next step is the Level 2 cache. A loose rule for caches in general is that the larger they are, the greater the hit rate they provide, at the cost of latency. That description is apt for both the L2 and L3 caches: both are substantially larger than the L1 cache, but incur a higher latency.

The L2 is also where competing processors from Intel and AMD start to differ more. By way of example, processors like Intel's Core2 come with whopping great big L2 caches (4MB for the 65nm brethren, 6MB for the 45nm) per dual-core, and the cache is shared between the two cores. So although a 45nm quad-core Core2 comes with 12MB of L2 cache, you have to remember a Core2 quad is literally two dual-cores slapped together: the 12MB of L2 actually comprises two 6MB L2 caches that can't be shared between the two dual-core halves. By comparison, AMD's current Athlon64 and Phenom lines use dedicated L2 caches specific to each core, with 1MB per core on the FX-62 and 512KB per core on the Phenom.



Additionally, the Phenom incorporates a 2MB shared Level 3 cache, which is really just a larger shared L2. It allows the Phenom to benefit from super-fast L1 and L2 caches dedicated to each core's processing tasks, while also gaining a fast shared cache between cores in the L3. It is, ultimately, a good design – and it's not surprising that Intel appears to be learning from AMD here, as Nehalem looks set to provide smaller dedicated L2 caches per core and a large shared L3 cache when it arrives.

Shared caches have a number of benefits – they allow one core to use more than its share of cache when the other cores are using less, and they make it easy for cores to share data, avoiding the need, and the penalty, of going to main memory.
