bits, bytes, packets, scripts...: 2007

2007-12-28

bash vi mode converging with ksh

One of the great features of ksh (both ksh93 and Sun's ksh) is the option to invoke an external editor on the current command line, while preserving its multi-line layout.

In vi-editing mode, the ESC-'v' keystroke invokes an external editor ${VISUAL:-${EDITOR:-vi}} on the current command line. Bash does this as well, but on exit, all newlines are replaced with semicolons, while ksh preserves them.
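The fallback chain in that expansion is easy to try out by hand. A minimal sketch, in which the function name pick_editor is made up for illustration and is not part of any shell:

```shell
# pick_editor is a hypothetical helper, not a shell builtin.
# It resolves the editor the same way the expansion above does:
# VISUAL if set and non-empty, else EDITOR if set and non-empty, else vi.
pick_editor() {
    printf '%s\n' "${VISUAL:-${EDITOR:-vi}}"
}
```

With both variables unset or empty it prints vi; with only EDITOR set it prints that value.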

It's one of my minor nits with bash, but for me it's important. I like my 'one-liners' readable, even if they span half a screen... This is nice to have, as many of the scripts that I've written were prototyped on the command line.

Shortly after my original post I read the bash(1) man page and found the 'lithist' shell option.

It works, after a fashion, but it's not the behaviour I'm familiar with in ksh.

When scrolling back in history, ksh truncates long lines and lets you scroll horizontally, while bash will wrap them across multiple lines. Not bad, but counter-intuitive in vi-mode. Maybe there's another option to escape newlines.
After invoking vi, and re-evaluating the command line, bash returns to edit-mode. Ksh returns to the default prompt.
Bash-specific shell options are set with 'shopt', in contrast to 'set -o' for the traditional long-name options. I'm not sure why there are two methods for a similar feature; a good guess is that it prevents collisions with other shell options. It beats the csh method of storing options in shell variables.
Minor nit: after completion, the shell prompt command counter ( ! in posix mode ) is not incremented until an extra newline is entered. The counter then jumps by two.
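For reference, the settings discussed above could go into ~/.bashrc roughly like this. It is a sketch only: it assumes bash, and whether you want all three options is a matter of taste.

```shell
# bash only: vi-style line editing, plus history that keeps embedded newlines.
set -o vi          # traditional long-name option, set with 'set -o'
shopt -s cmdhist   # store a multi-line command as a single history entry
shopt -s lithist   # with cmdhist on, keep the newlines instead of semicolons
```

Running 'shopt -p lithist' afterwards prints the current state of the option in a form that can be pasted back into the shell.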

2007-12-27

Another case for CIFS and ZFS?

Microsoft has released kb/946676, detailing a problem with Windows Home Server shared folders.

When you use certain programs to edit files on a home computer that uses Windows Home Server, the files may become corrupted when you save them to the home server.

The article warns about certain applications that are not supported with shared folders. Users should copy their files to local storage before opening them with any of the suspect applications.

That basically halves the functionality of WHS, which is being touted as a NAS/backup appliance.

I wonder, though, whether the problem described here is inherent to the way Windows applications use their data files. The typical approach I remember is that applications update the original file in place, after making a temporary backup copy. This is in contrast to the traditional unix way of life, where you first create a working copy and, when done, use that to replace the original.
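The unix pattern described above fits in a few lines of shell. In this sketch replace_file is a hypothetical helper name; it assumes mktemp(1) is available and that the temporary file sits in the same directory as the original, so that mv is a plain rename.

```shell
# replace_file FILE CMD [ARGS...]
# Hypothetical helper: filter FILE through CMD into a working copy,
# and only replace FILE once CMD has finished successfully.
# Caveat: mktemp creates the copy with mode 0600, so the original
# permissions are not carried over in this sketch.
replace_file() {
    _target=$1
    shift
    _tmp=$(mktemp "${_target}.XXXXXX") || return 1
    if "$@" < "$_target" > "$_tmp"; then
        mv -f "$_tmp" "$_target"    # rename over the original
    else
        rm -f "$_tmp"               # failure: the original is untouched
        return 1
    fi
}
```

For example, 'replace_file notes.txt tr a-z A-Z' upper-cases notes.txt; if the command fails halfway, the original file is left as it was.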

I really should build my own home NAS based on Solaris/CIFS server and ZFS next year. Let's see what kind of budget I have...

Update: It appears to be a reliability issue under heavy load, and was hard to reproduce. I claim it would not happen with NFS, since an NFS server has to commit data to stable storage before it acknowledges a write. That's also why NFS (and any NAS) write performance can suffer badly if you don't have the hardware to help it along.

Links:
mswhs.com UK fansite
Computerworld article

2007-12-18

ksh93 performance through builtins - a small example

I recently gave a presentation to the Dutch OpenSolaris Users Group about Roland Mainz's work on integrating ksh93. I focused on the history of unix shells and on how ksh93 was accepted into the OpenSolaris project, specifically highlighting the OpenSolaris ARC process.

I did not discuss the relative merits and reasons for getting ksh93 included, but today I'll mention one reason in particular: shell script performance. Ksh93 is faster than any other POSIX-conforming shell in executing code, not least because many standard unix commands have been included as builtins in the shell, allowing scripts to bypass fork() and exec().

Here is a comparison of two lines of code in ksh93. Their effect is exactly the same: ten thousand times, they create and remove a unique directory. The difference is that the second time around, the ksh builtins for mkdir and rmdir are used.

glorantha 10 $ time for i in {0..9999}; do /bin/mkdir $i; /bin/rmdir $i; done

real 0m35.63s
user 0m1.44s
sys 0m1.65s
glorantha 11 $ time for i in {0..9999}; do mkdir $i; rmdir $i; done

real 0m1.44s
user 0m0.33s
sys 0m1.09s

And this explains why a builtin matters. The internal and external commands are about equally efficient if they're only invoked once. When you have to invoke an external command inside an inner loop, the unix fork()/exec() overhead adds up.
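The same effect can be reproduced in bash, which has no mkdir builtin, by using a command that exists both as a builtin and as a program. A rough sketch, where the helper name run_n, the loop count and the candidate paths for true(1) are all assumptions made for the example:

```shell
# bash sketch. 'true' is a shell builtin; ext_true is the external program.
# The candidate paths are assumptions about where true(1) is installed.
ext_true=
for p in /bin/true /usr/bin/true; do
    if [ -x "$p" ]; then ext_true=$p; break; fi
done

# run_n N CMD: run CMD N times, then print how many runs were made.
run_n() {
    local i=0
    while [ "$i" -lt "$1" ]; do
        "$2"
        i=$((i + 1))
    done
    echo "$i"
}

time run_n 500 true              # builtin: no new process per iteration
if [ -n "$ext_true" ]; then
    time run_n 500 "$ext_true"   # external: one fork()/exec() per iteration
fi
```

The absolute numbers depend on the machine; the interesting part is the ratio between the two runs.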

But there are other ways of improving the above bit of code, if you're satisfied with a slightly different sequence of actions, and with non-POSIX code due to the compact {start..end} notation.

What I like about this code is that it's compact, and quite clear in its intention.

glorantha 12 $ time mkdir {0..9999} && rmdir {0..9999}

real 0m0.53s
user 0m0.02s
sys 0m0.51s
glorantha 13 $ time /bin/mkdir {0..9999} && /bin/rmdir {0..9999}

real 0m0.53s
user 0m0.03s
sys 0m0.48s
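One thing to keep in mind when reading these figures: in both bash and ksh93, time is a reserved word that covers a single pipeline, so in 'time a && b' only a is timed. A sketch that times both halves together by grouping them in braces; the scratch directory and the smaller count are choices made for this example only:

```shell
# bash / ksh93 sketch: time the mkdir and the rmdir as one unit.
scratch=$(mktemp -d)
(
    cd "$scratch" || exit 1
    # The braces turn both commands into one compound command for 'time'.
    time { mkdir {0..999} && rmdir {0..999}; }
)
rmdir "$scratch"    # succeeds only if the directory is empty again
```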

More later

2007-12-05

Document the why in your code

This article in ACM Queue reminded me of the most valuable
argument I once heard in favor of code comments.

You don't explain what the code does, you tell why it does it.

I'm all for the concept of self-documenting code, and except
for the obvious iterator i and counter n, I do try to find descriptive
names for my data structures.

The strongest argument against comments is that they can be wrong.

There's nothing as frustrating as a comment that contradicts the code.
Especially when you believe it.

So yes, comments are good, but let them tell why.

2007-11-22

E8 - Lisi's Theory of Everything

Noticed this thing in New Scientist this morning.

A fitting universal theory of everything, with no strings attached,
except for a couple of puns.

An_Exceptionally_Simple_Theory_of_Everything

26400 hits and counting... 'beowulf uncanny-valley'

I saw the trailers for Beowulf yesterday, and the first thing that sprang to mind was 'uncanny valley'.

The characters as a whole look larger than life, but all personality has been removed from their expression.

A real waste of talent.

I'm far from the first to realise this, as a quick search shows.

26,400 and counting, I'm sure...

2007-09-26

party time? IPv6 is done!

I found this in my mailbox yesterday, Sep 25. It's been a long time.
This message somehow touched me. People have been working on
ipng/ipv6 for over twelve years, and will continue to do so.

But how long before people will start using ipv6 in greater numbers?
What will be the killer app for ipv6? P2p? Games? HPC? Social Networking?

High Performance Computing will use ipv6, just because it provides ways of
handling massive amounts of data. No window scaling or other hacks required.
But HPC is not for the masses.

The killer app will be used by millions at a time. And it will be big enough to make
people want to make the jump, and join their friends. But it will only happen
when enough providers actually offer the technology to end-users.

But why would end-users want ipv6? To give up their little NAT router that incidentally
protects them against prying eyes?

Or maybe that's it - give every internet subscriber a billion ip addresses and see how
port scanners deal with that. Hide your little pc in a sea of duds. Yeah, as if people
won't just choose the very first address out of the pool - most will actually let dhcp
do that for them.

Not that it matters, as port scanners are not where the risk is. Most malware on the
internet gets transported through human vectors anyway. 'Click here'. No, that's not a link.
I don't even pretend it's one.

Whatever it will be, it should require no more than one or two links to follow in order
to get connected to the internet mark 2, with glorious 128 bit addressing.

Cheers, mine's a half

Henk

To: ietf-announce at ietf.org
Subject: WG Action: Conclusion of IP Version 6 (ipv6)
From: IESG Secretary <iesg-secretary at ietf.org>
Date: Tue, 25 Sep 2007 14:30:02 -0400

The IP Version 6 Working Group (ipv6) in the Internet Area has concluded.

The IESG contact persons are Jari Arkko and Mark Townsley.

+++

A new Working Group, 6MAN, has been created to deal
with maintenance issues arising in IPv6 specifications.
The IPv6 WG is closed. This is an important milestone
for IPv6, marking the official closing of the IPv6
development effort.

The ADs would like to thank everyone -- chairs, authors,
editors, contributors -- who has been involved in the effort
over the years. The IPv6 working group and its predecessor,
IPNGWG, produced 79 RFCs (including 5 in the RFC queue).

Issues relating to IPv6 should in the future be taken up in
6MAN if they relate to problems discovered during
implementation or deployment; V6OPS if they relate to
operational issues; BOF proposals, individual submissions
etc. for new functionality.

The mailing list of the IPv6 WG stays alive; the list will
still be used by the 6MAN WG in order to avoid people
having to resubscribe and/or adjust their mail filters.

--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6 at ietf.org


Administrative Requests: https://www1.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------

2007-08-16

Windows Is Free...

... at least people behave like it is. Sharing is seen as a social lubricant, if not a requirement to belong.


2007-03-08

The memory could not be "read" ...

Windows keeps killing apps with this message:

The instruction at '...' referenced memory at '...'.
The memory could not be read.
Click OK to terminate the program.

After reading the MS KB article, I realised this is not such a strange message after all.
It's actually a very familiar old friend.

2007-03-01

no trains this morning due to bad disk?

Just heard on the radio that the failure of the railway systems around Amsterdam this morning had been caused by a bad disk...

Curious.

2007-02-18

Friends of Jim Gray suspend search

Jim Gray is a pioneer in database research, and contributed greatly to the concept of transactions with roll-back and roll-forward, all very commonplace nowadays. In late January I posted about his test of Sun's Thumper, apparently on the same day he went sailing solo out of San Francisco Bay into the Pacific. He's been lost since. All efforts so far (and they've involved some very high-tech methods) have not resulted in any trace of Jim or his vessel, the Tenacious.

In the last several days, the Friends of Jim group has reviewed all the data with Coast Guard officials. The fact is that we have no evidence as to what has happened to Tenacious or to Jim Gray. Neither we nor the Coast Guard can come up with a surface search plan that is likely to find either Tenacious or Jim, given everything that has been done already.
Accordingly, the Friends of Jim group is suspending its active effort to find Tenacious that has been centered here at the blog. For both the Coast Guard and the Friends of Jim, “suspension” means that the active search has been discontinued due to exhausting all present leads and the lack of new information. Of course, should we or the Coast Guard receive any new information, we will investigate it.

2007-02-17

Opensolaris zfs + dtrace guides available in pt_BR translation

A few OpenSolaris enthusiasts have translated the guides to DTrace and ZFS
into Brazilian Portuguese.

OpenSolaris i18n Forums: pt_BR translation zfs + dtrace guides available for review

I'm pleased to announce that the Brazilian Portuguese SGML & PDF version of the following book are now available in the Download Center:

Solaris Dynamic Tracing Guide ( Solaris 10 3/05 : SGML & PDF)
Solaris ZFS Administration Guide ( Solaris 10 11/06 : SGML & PDF)

Note to self: I need to share this with our DBAs...

2007-02-11

HP announces support for Solaris-10 on Xeon

The Unix Guardian says: HP announces support for Solaris 10 on Xeon. The reasoning being that Sun has no Xeon h/w of its own at this time. Interesting, as in my job the x86/x64 server h/w is all HP, running either Windows or Linux. Our ERP systems all run on SPARC/Solaris, though.

The same article mentions that Transitive has chalked up HP as a customer for its h/w emulation technology, adding them to the matrix of multi-platform cross-overs.

2007-01-28

A difference in mindset

Some of my favourite literature is the unix man pages. When I got started on unix, the system we used at the vakgroep AIV (a 3B2 w/ SVR2.0, IIRC) only had printed documentation. But it had all of its man pages in neat little red binders. The format exactly fit the way I absorb information: a quick overview, followed by a linear list of features.

Commands or options listed on the left hand side, with their explanation on the right. As a quick reader, I've never had any trouble distilling the information given in a man page. If I don't understand something, I just keep on going until I either find an explanation or get lost entirely. No problem. A second reading will often put things in a different perspective. And a third or fourth careful reading may sometimes clear up some assumptions or delusions as well.

I read the man pages for sh(1), ksh(1) and ksh93(1) at least once a year. And I learn from them. I know most of the features supported by the original Bourne shell, plus the differences with the non-existing POSIX shell, Sun's ksh and AT&T's ksh93. I must admit I'm not confident with most bashisms, though.

The funny thing is that, until I encountered the dummy guides, I had never considered that some people do not like to read through a full description of all the features of a piece of equipment or a piece of software. It came as quite a revelation to me that most people, in fact, do not want to understand things, they just want them to work.

So the type of documentation a geek like me prefers is more like:

option x will do ...
option y does ...

etc., while a non-technical person will prefer text like:

To perform action X, select/press/dial ...
If you want to ..., do ...

The difference is like the P/J difference in the Myers-Briggs typology. An old-style hacker/nerd prefers to be given a set of options to explore (P), where another would prefer a more goal-oriented (J) approach.

I'm not saying one approach is better than the other. I'm too much of an observer for that. I would rather watch and see how people differ than take a position on one side or the other.

Let us be different.

Jim Gray tests Thumper

Jim Gray has tested Thumper on Windows Server 2003 and NTFS, in collaboration with Johns Hopkins U.
The summary is very positive:

This is the fastest Intel/AMD system we have ever benchmarked. The 6+ GB/s memory system (4.5GB/s copy) is very promising. Not reported here, but very promising is that we repeated most of the SkyServer Query log analysis on this system – performance was 3x to 100x what we experienced on previous systems – largely due to the 64-bit SQL and to the 16GB of RAM. We hope to report the SkyServer query results soon.

Nice box, Thumper
(fixed typo)

More Adaptive Replacement Cache Algorithm

Based on a conversation during the recent nlosug meeting, I've updated the Wikipedia article for the ARC with a better explanation of the algorithm. The language is now more tangible, and the terms used are closer to the original literature.

2007-01-20

Adaptive Replacement Cache in ZFS

Last week, I could not reach the OpenSolaris source browser. I was looking for an explanation of what is called the 'ARC', or Adaptive Replacement Cache, in ZFS.

In contrast to the venerable UFS and NFS, ZFS does not use the normal Solaris VM subsystem for its page cache. Instead, pages are mapped into the kernel address space, and managed by the ARC.

Looking through the zfs-discuss archives, I did not find any explanation of the ARC, except for references to the Solaris Architecture Review Committee, which is useful enough in itself, but does not deal specifically with paging algorithms...

Googling around, I finally found some useful references: Roch Bourbonnais explains the acronym, and refers to the IBM Almaden research lab, where the Adaptive Replacement Cache algorithm was developed.

In the original IBM version, it uses a cache directory twice as large as needed for the cache size. The extra space is used to keep track of recently evicted entries, so we know if a cache miss actually refers to a recently used page or not.

After I created the wiki entry I came up with this visualisation of the cache directory:

. . . [1 hit,evicted <-[1 hit, in cache <-|-> 2 hits, in cache]-> 2 hits, evicted] . . .

and the following for a modification in Solaris ZFS, which knows in advance that it
should not throw out certain pages:

. . . [1 hit,evicted <-[1 hit, in cache <-|non-evictable|-> 2 hits, in cache]-> 2 hits, evicted] . . .

The inner brackets represent the actual cache, while the outer brackets show the virtual directory, referring to evicted entries. The total size for the cache is of course fixed, but it moves freely between the outer brackets. In addition, the divider in the middle can also move around, favouring recent or frequent hits.
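The role of the extra directory entries can be shown with a toy model. What follows is a deliberately simplified sketch in bash, not the ARC algorithm itself and not what ZFS does: it keeps a single LRU list instead of separate lists for one and two hits, and it has no moving divider. It only shows how remembering evicted keys lets a miss be classified. All names (cache_size, access, in_list) are invented for the example.

```shell
# bash sketch of a cache directory with a "ghost" list of evicted keys.
cache_size=2
cache=()    # keys currently in the cache, least recently used first
ghost=()    # keys recently evicted (directory entries only), oldest first
result=     # set by access: hit, ghost-hit or miss

# in_list KEY [ELEMENT...]: succeed if KEY equals one of the elements.
in_list() {
    local key=$1 k
    shift
    for k in "$@"; do
        if [[ $k == "$key" ]]; then return 0; fi
    done
    return 1
}

# access KEY: look KEY up, update both lists, and print the outcome.
access() {
    local key=$1 k
    local kept=()
    if in_list "$key" "${cache[@]}"; then
        result=hit
        for k in "${cache[@]}"; do
            if [[ $k != "$key" ]]; then kept+=("$k"); fi
        done
        cache=("${kept[@]}")
    else
        if in_list "$key" "${ghost[@]}"; then
            result=ghost-hit    # a miss, but on a recently evicted key
            for k in "${ghost[@]}"; do
                if [[ $k != "$key" ]]; then kept+=("$k"); fi
            done
            ghost=("${kept[@]}")
        else
            result=miss
        fi
        if (( ${#cache[@]} >= cache_size )); then
            ghost+=("${cache[0]}")          # remember the evicted key
            cache=("${cache[@]:1}")
            if (( ${#ghost[@]} > cache_size )); then
                ghost=("${ghost[@]:1}")     # directory holds 2 x cache_size keys
            fi
        fi
    fi
    cache+=("$key")                         # KEY is now the most recent entry
    echo "$key: $result"
}

for key in a b c a c d e b; do
    access "$key"
done
```

In this run the second access to a is reported as a ghost-hit, because a was evicted only a moment earlier, while the final access to b is a plain miss: by then b has dropped out of the directory altogether. The real algorithm uses exactly that distinction to decide which way to move the divider.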

Because the cache is mapped into kernel memory, this puts quite some stress on 32-bit (x86) systems, as the 4GB address space on that architecture is shared by kernel and user space. Space used by the cache limits the size of user processes. Don't run your DBMS on one of these.

Links:
Wikipedia: Adaptive_Replacement_Cache
