bits, bytes, packets, scripts...: solaris

2008-09-21

The trouble with /var in unix systems

Mike Gerdts suggests using /var/share for sharing data between boot environments in his March 2008 post "Future of OpenSolaris Boot Environment management".

The trouble with unix /var is that it is a grab bag: it is used both for storing system data (i.e., identity) and for storing application data.

/var is intended for variable data that should persist through a reboot: short-lived temp files, spool directories for transient files (printing, mail), and whatever data applications might wish to store (/var/opt).

However, as said above, it also contains data about a system's 'identity': specific settings like cron jobs, printer settings, network configuration, and the definitions of services that together make up what the box 'is'. This also includes the list of installed software in /var/sadm. At some point in the ancient unix past, these all moved into /var from /usr or /etc.

Root was supposed to be small then, and to contain little that could change, which limited the risk of a long fsck caused by many modified files in the root fs. It held just enough to let the system boot: if you lost any other filesystem, you at least had a running system from which you could attempt to repair or restore whatever was broken.

Mike's suggestion is very sound, as it does what is required: system identity is kept in the root fs (/), with /var remaining part of the root fs, while application data, whether persistent or transient, goes into /var/share. The 'share' name may be a bit of an unlucky choice, though: it is already used in /usr/share and other places for architecture-neutral data that can be shared with other systems (NFS clients).

2007-01-28

A difference in mindset

Unix man pages are some of my favourite literature. When I got started on unix, the system we used at the vakgroep (department) AIV, a 3B2 with SVR2.0 IIRC, only had printed documentation. But it had all of its man pages in neat little red binders. The format exactly fit the way I absorb information: a quick overview, followed by a linear list of features.

Commands or options are listed on the left-hand side, with their explanation on the right. As a quick reader, I've never had any trouble distilling the information given in a man page. If I don't understand something, I just keep on going until I either find an explanation or get lost entirely. No problem. A second reading will often put things in a different perspective. And a third or fourth careful reading may sometimes clear up some assumptions or delusions as well.

I read the man pages for sh(1), ksh(1) and ksh93(1) at least once a year. And I learn from them. I know most of the features supported by the original Bourne shell, plus the differences with the non-existing POSIX shell, Sun's ksh and AT&T's ksh93. I must admit I'm not confident with most bashisms, though.

The funny thing is that, until I encountered the dummy guides, I had never considered that some people do not like to read through a full description of all the features of a piece of equipment or a piece of software. It came as quite a revelation to me that most people, in fact, do not want to understand things, they just want them to work.

So the type of documentation a geek like me prefers is more like:

option x will do ...
option y does ...

etc., while a non-technical person will prefer text like:

To perform action X, select/press/dial ...
If you want to ..., do ...

The difference is like the P/J difference in the Myers-Briggs typology. An old-style hacker/nerd prefers to be given a set of options to explore (P), where another would prefer a more goal-oriented (J) approach.

I'm not saying one approach is better than another. I'm too much of an observer for that. I would rather watch and see how people differ than take a position on one side or another.

Let us be different.

2007-01-20

Adaptive Replacement Cache in ZFS

Last week, I could not reach the OpenSolaris source browser. I was looking for an explanation of what is called the 'ARC', or Adaptive Replacement Cache, in ZFS.

In contrast to the venerable UFS and NFS, ZFS does not use the normal Solaris VM subsystem for its page cache. Instead, pages are mapped into the kernel address space and managed by the ARC.

Looking through the zfs-discuss archives, I did not find any explanation of the ARC, except for references to the Solaris Architecture Council, which is useful enough in itself, but does not deal specifically with paging algorithms...

Googling around, I finally found some useful references: Roch Bourbonnais explains the acronym and refers to the IBM Almaden research lab, where the Adaptive Replacement Cache algorithm was developed.

In the original IBM version, it uses a cache directory twice as large as needed for the cache size. The extra space is used to keep track of recently evicted entries, so we know if a cache miss actually refers to a recently used page or not.

After I created the wiki entry I came up with this visualisation of the cache directory:

. . . [1 hit,evicted <-[1 hit, in cache <-|-> 2 hits, in cache]-> 2 hits, evicted] . . .

and the following for a modification in Solaris ZFS, which knows in advance that it should not throw out certain pages:

. . . [1 hit,evicted <-[1 hit, in cache <-|non-evictable|-> 2 hits, in cache]-> 2 hits, evicted] . . .

The inner brackets represent the actual cache, while the outer brackets show the virtual directory, referring to evicted entries. The total size for the cache is of course fixed, but it moves freely between the outer brackets. In addition, the divider in the middle can also move around, favouring recent or frequent hits.
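To make the moving parts concrete, here is a minimal Python sketch of the directory described above, loosely following the published IBM algorithm rather than the ZFS code. The class name ARCache, the list names t1/t2/b1/b2 and the divider p are my own labels for illustration; the sketch tracks keys only (no page contents), uses integer arithmetic for the divider, and leaves out the ZFS 'non-evictable' region entirely.

```python
from collections import OrderedDict


class ARCache:
    """Simplified ARC directory over keys only (illustrative, not the ZFS code)."""

    def __init__(self, capacity):
        self.c = capacity          # fixed size of the real cache (t1 + t2)
        self.p = 0                 # the movable divider: target size of t1
        self.t1 = OrderedDict()    # 1 hit, in cache
        self.t2 = OrderedDict()    # 2+ hits, in cache
        self.b1 = OrderedDict()    # 1 hit, evicted (ghost entries)
        self.b2 = OrderedDict()    # 2+ hits, evicted (ghost entries)

    def _replace(self, key_in_b2):
        # Evict the LRU entry of t1 or t2 and remember it in a ghost list.
        from_t1 = self.t1 and (
            len(self.t1) > self.p
            or (key_in_b2 and len(self.t1) == self.p)
            or not self.t2         # defensive fallback: nothing to take from t2
        )
        if from_t1:
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def access(self, key):
        """Record an access; return True on a cache hit, False on a miss."""
        if key in self.t1:                      # second hit: promote to t2
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:                      # repeated hit: refresh position
            self.t2.move_to_end(key)
            return True
        if key in self.b1:                      # ghost hit: favour recency
            self.p = min(self.c, self.p + max(1, len(self.b2) // len(self.b1)))
            self._replace(False)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:                      # ghost hit: favour frequency
            self.p = max(0, self.p - max(1, len(self.b1) // len(self.b2)))
            self._replace(True)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Complete miss: make room, keeping the directory within 2 * c entries.
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(False)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(False)
        self.t1[key] = None
        return False


if __name__ == "__main__":
    cache = ARCache(2)
    for k in "abacba":
        print(k, "hit" if cache.access(k) else "miss", "divider p =", cache.p)
```

Running it with a capacity of 2 shows the divider p moving up when a miss lands in the 'evicted, 1 hit' list and back down when a miss lands in the 'evicted, 2 hits' list, which is the adaptive part of the name.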

Because the cache is mapped into kernel memory, this puts quite some stress on 32-bit (x86) systems, as the 4GB address space on that architecture is shared by kernel and user space. Space used by the cache limits the size of user processes. Don't run your DBMS on one of these.

Links:
Wikipedia: Adaptive_Replacement_Cache
