Caches

dL1,iL1(data,instruction) cache of 32KB each($ lscpu), supports write-back ($ lshw)

λ getconf -a | grep CACHE
 
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_ICACHE_ASSOC                8
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_SIZE                 32768
LEVEL1_DCACHE_ASSOC                8
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_SIZE                  262144
LEVEL2_CACHE_ASSOC                 4
LEVEL2_CACHE_LINESIZE              64
LEVEL3_CACHE_SIZE                  6291456
LEVEL3_CACHE_ASSOC                 12
LEVEL3_CACHE_LINESIZE              64

Basics

caches don’t operate on arbitrary-sized variables, only 64-byte cache lines.

Types of `miss`

Cold (compulsory) miss: at the start when cache is empty.
Capacity miss: when working set is larger than the cache.
Conflict miss: this is common in direct mapping, happens less in e-way associatve caches. Block i at level k+1 must be placed in block (i mod 4) at level k

Organization

(S,E,B)
S = 2^s sets
E = 2^e lines per set. OR E blocks per set
- Each line consist of a block and few more things.
B = 2^b bytes per cache block. (this is what we mean when we say cache block size)
Cache blocks are interesting, the purpose of blocks is to exploit spatial locality. If you make the blocks too small, you might not fetch enough neighbor bits so that spatial locality is at use.

Designing a cache

The block size comes first
Then you determine how big you want your cache to be
Then you decide on the associativity
Once you determine the associativity, that determines the number of sets.
```
Associativity == S (no of sets, 2^s)
```

Cache placement policies

These can be categorized based on associativity. When replacing a block/line in a cache, cache replacement policies such as LRU are used., the lines have additional blocks to store timestamps etc.

Even though fully associative cache give the best miss rate, It’s not worth having this much of complex search logic in the hardware of cache to support low associativity to support full associativity.

We do see full associativity in virtual memory, because the penalty in DRAM to Disk is way more than penalty from cache to DRAM.

Fully associative cache takes a lot of power and area, and direct-mapped caches take the least.

Eg. Skylake CPU

L1 cache: 8 slots (or “ways”). “8-way set associative cache” because it has 8 ways.

L2 cache: on a Skylake is 16-way set associative

L3 cache: is 11-way set associative.

Eviction:

The set size of a cache influences how long a cache line is likely to stay in the cache

The more associativity the cache has (the more ways there are), the longer it will take for a cache line to be evicted.

i.e direct > E-way > Full (i.e direct gets evicted as soon as some access is mapped to the same set)

Direct Mapped cache (aka 1-way)

E = 1, multiple S. The cache can be framed as a (S*1) column matrix. good best-case time, but unpredictable in worst case.

Fully associative cache

E is multiple but S =1. The cache organization can be framed as (1*E) row matrix.

E-way associative cache/ Set-associative cache

When E is more than 1. A set-associative cache can be imagined as a (E*S) matrix. Eight-way set associative cache is common.

Writing

Metrics

Miss Rate (%)
Hit Time (cpu cycles); eg. 4clock cycle for L1
Miss Penalty (cpu cycles) ; Total time for a miss = hit time=+=miss penalty

Improving locality

Lots of doubts in Cache miss analysis

Spatial Locality

Rearranging loops. Basically going from right-to-left, we want the leftmost item to be changing the fastest eg. A[k][i][j], here j should be changing the most if the nested loops involved are for i,=j=and k; so rearranging nested loops in the order of k-i-j will give the best spatial locality.
It implies that the elements physically located nearby have a good chance of being located close to each other. This principle is used in a process called “prefetch” that loads file contents ahead of time

See Memory Hierarchy See Matrix

Temporal Locality

Supplementary notes on using Blocking to Increase Temporal Locality

Links/Ref

You (Probably) Shouldn’t use a Lookup Table - Speculative Branches

🐏 mogoz

Table of Contents

Caches

Basics

Types of `miss`

Organization

Designing a cache

Cache placement policies

Direct Mapped cache (aka 1-way)

Fully associative cache

E-way associative cache/ Set-associative cache

Writing

Metrics

Improving locality

Spatial Locality

Temporal Locality

Links/Ref

Graph View

Backlinks

🐏 mogoz

Table of Contents

Caches

Basics §

Types of miss §

Organization §

Designing a cache §

Cache placement policies §

Direct Mapped cache (aka 1-way) §

Fully associative cache §

E-way associative cache/ Set-associative cache §

Writing §

Metrics §

Improving locality §

Spatial Locality §

Temporal Locality §

Links/Ref §

Graph View

Backlinks

Basics

Types of `miss`

Organization

Designing a cache

Cache placement policies

Direct Mapped cache (aka 1-way)

Fully associative cache

E-way associative cache/ Set-associative cache

Writing

Metrics

Improving locality

Spatial Locality

Temporal Locality

Links/Ref