2007-01-20

Adaptive Replacement Cache in ZFS

Last week, I could not reach the OpenSolaris source browser. I was looking for an explanation of what is called the 'ARC', or Adaptive Replacement Cache, in ZFS.

In contrast to the venerable UFS and NFS, ZFS does not use the normal solaris VM subsystem for its page cache. Instead, pages are mapped into the kernel address space, and managed by the ARC.

Looking through the zfs-discuss archives, I did not find any explanation of the ARC, except for references to the solaris Architecture Council, which is useful enough in itself, but does not deal specifically with paging algoritms...

Googling around, I finally found some useful references: Roch Bourbonnais explains the acronym, and refer to the IBM Almaden research lab, where the Adaptive Replacement Cache algorithm was developed.

In the original IBM version, it uses a cache directory twice as large as needed for the cache size. The extra space is used to keep track of recently evicted entries, so we know if a cache miss actually refers to
a recently used page or not.

After I created the wiki entry I came up with this visualisation of the cache directory:

. . . [1 hit,evicted <-[1 hit, in cache <-|-> 2 hits, in cache]-> 2 hits, evicted] . . .

and the following for a modification in Solaris ZFS, which knows in advance that it
should not throw out certain pages:

. . . [1 hit,evicted <-[1 hit, in cache <-|non-evictable|-> 2 hits, in cache]-> 2 hits, evicted] . . .

The inner brackets represent the actual cache, while the outer brackets show the virtual directory, referring to evicted entries. The total size for the cache is of course fixed, but it moves freely between the outer brackets. In addition, the divider in the middle can also move around, favouring recent or frequent hits.

Because the cache is mapped into kernel memory, this puts quite some stress on 32bit (x86) systems, as the 4GB address space on that architecture is shared by kernel and user space. Space used by the cache limits the size of user processes. Don't run your DBMS on one of these.

Links:
Wikipedia: Adaptive_Replacement_Cache

2 comments:

Henk Langeveld said...

The source browser is back, arc.c can be found at http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c

Roch said...

Thanks for this.

Readers, do note that "the cache limits the size of user processes " applies to 32-bit kernels only.