Netscape; hindsight is foresight

May 10th, 2008

I have been enjoying reading “Architects of the Web” (see it on my Amazon bookshelf), a collection of stories from the early days of Netscape, Yahoo and the like. Perhaps in an attempt to avoid the doom of repetition, I’ve been reading a lot of “software history” recently … Seattle Public Library has got plenty of cool books.

Chapter one follows the founding of Netscape, from the early days of NCSA Mosaic, the fortuitous meeting of Marc Andreessen and Jim Clark and the beginning of the browser wars. I remember this from first time around, but I didn’t really understand all of what was going on.

The book progresses to follow the start of the browser wars, AOL beginning to bundle IE, Netscape launching the communicator suite …

And then the Netscape part of the book ends.

What? The end? But what about the browser wars? Microsoft getting sued by their own government? The AOL buyout? The Time Warner merger? Open sourcing of mozilla? The doldrums of tangled source code? And finally the rise of firefox?

As I flipped back to the opening “acknowledgements” page, I suddenly understand.

It was written in December 1996.

OMG. This book is a history of the web from the world of 1996. They had no idea what was coming next. Napster was nearly three years away. iTunes and the DRM wars would wait another few years beyond that. Skype, blogs, Flickr and web2.0 weren’t even on the radar yet.

But then again, what would happen if I wrote a ‘history of the web’ book today? Twelve years from now, someone might pick it up and say “Wow, these guys had no idea that X, Y and Z were just around the corner”.

I remember during the early days of Napster, I thought “this is basically illegal and will get squished”. But it took me a while to understand that (although Napster itself would ultimately be doomed) a genie had came out from a bottle and wasn’t ever going back in. Napster itself would end up dead, but so would the “old way of thinking”. It maybe took over a decade, but now stores are selling DRM free digital music and making lots of money doing so. People voted with their feet and it’s hard to stop a crowd.

So it occurs to me that in order to have a chance of seeing the new X, Y and Z before they creep over the horizon, you probably want to try letting go some of your ‘immutable assumptions’ about the world, and see what’d change if the assumption didn’t hold any more. Here’s some which pop into my head: ‘you need to have a bank account to put your money into’, ‘computers are not disposable items’, ‘companies need to keep stuff secret from their competitors’. Coincidentally, I’m also reading a book about Einstein’s life (on my bookshelf) and he’s the posterchild for the the “what happens if we ignore this fundamental assumption” school of thought.

So I’m now wondering: which ‘truths’ will have their demise chronicled in the history books of the future?

Haskell is possibly too lazy for me

March 20th, 2008

This is the first of several posts on the topic of Haskell’s laziness. After several weeks of playing, I’m coming to the conclusion that laziness-by-default is a hinderance rather than a virtue. Let’s start at the start though by trying to add some numbers together.

-- Non tail recursive; 5Mb of live objects at end.
mysum []     = 0
mysum (x:xs) = x + mysum xs
main = putStrLn $ show $ mysum $ take 100000 $ [1..]

As the comment says, this is a dumb version. It consumes 5Mb of memory because it’s not tail recursive.

Incidentally, after causing my machine to thrash several time during my experiments, I found it useful to use ‘ulimit’ to restrict the maximum heap size available to the process. Also, you can pass extra args to your haskell app to get it to report real-time memory stats, like this:

ghc sum.hs && /bin/bash -c 'ulimit -Sv 100000; ./a.out +RTS  -Sstderr'

Anyhow, the memory blowup is easy to fix; just pass an ‘accumulator’ parameter when you do the recursive call:

-- Tail recursive, but 3.5Mb of live objects at end.
mysuma acc []     = acc
mysuma acc (x:xs) = mysuma (acc+x) xs
main = putStrLn $ show $ mysuma 0 $ take 100000 $ [1..]

Hmm, it’s now tail recursive but it still consumes 3.5Mb? This is where Haskell’s laziness makes things quite different from ocaml and other strict languages. When we pass the accumulated value, haskell does not actually evaluate the addition prior to making the recursive call. It will delay the computation until its value is actually required. So, on each recursive call, the accumulator looks like an unevaluated “1+2″ and then “1+2+3″ etc.

We can fix this by explicitly telling haskell to evaluate the addition prior to making the call:

-- Tail recursive, with 'seq' to force immediate evaluation of addition. 
-- 40k of live objects at end.
mysumas acc []     = acc
mysumas acc (x:xs) = (acc+x) `seq` mysumas (acc+x) xs
main = putStrLn $ show $ mysumas 0 $ take 100000 $ [1..]

Finally we have a program which only consumes a tiny amount of heap space. But it took a surprising amount of effort. There’s lots more information about this situation on the haskell wiki.

RAID-1 sans disks

March 20th, 2008

In preparation for a somewhat more dramatic future experiment, I’ve been trying out RAID-1 failure modes using linux’s loopback capabilities to avoid having to actually buy any more real hard drives. You can simulate drive failures, failover and easily see what the current disk contents are. It should go without saying that, if you have a real RAID system currently running, you probably don’t want to execute these commands without thinking a bit first:

# Creating and destroying disks from the safety of your own console
mkdir ~/raid; cd ~/raid
 
# Create two 10Mb files called disk0 and disk1
for d in 0 1; do dd if=/dev/zero of=disk${d} bs=1024 count=10240; done
 
# Make them available as block devices using the loopback device
for d in 0 1; do sudo losetup /dev/loop$d disk$d; done
 
# Combine the two 'disks' into a RAID-1 mirrored block device
# Using '--build' rather than '--create' means there is no device
# specific metadata, and so the contents of the disks will be identical
sudo mdadm --build --verbose /dev/md0 --level=1 --raid-devices=2 /dev/loop[01]
 
# Create a filesystem on our raid device and mount it
sudo mkfs.ext3 /dev/md0 
mkdir /tmp/raidmnt
sudo mount /dev/md0 /tmp/raidmnt
sudo chown $USER /tmp/raidmnt
 
# The contents of both disks change in unison
md5sum disk[01]
date > /tmp/raidmnt/datefile
sync
md5sum disk[01]
 
# If we mark one disk as failed, disk contents diverge
sudo mdadm --fail /dev/md0 /dev/loop0
date > /tmp/raidmnt/datefile
sync
md5sum disk[01]
 
# Remove the failed disk and readd it, and RAID1 will sync
sudo mdadm --remove /dev/md0 /dev/loop0
sudo mdadm --add /dev/md0 /dev/loop0
sleep 1
md5sum disk[01]
 
# Add a third (unused) disk into the system to test failover
dd if=/dev/zero of=disk2 bs=1024 count=10240
sudo losetup /dev/loop2 disk2
sudo mdadm --add /dev/md0 /dev/loop2
sudo mdadm --detail /dev/md0
 
# When one of original two disks fail, the new disk gets used
md5sum disk[012]
sudo mdadm --fail /dev/md0 /dev/loop0
date > /tmp/raidmnt/datefile
sync
md5sum disk[012]
 
# Tidy up the world
sudo umount /dev/md0
sudo mdadm -S /dev/md0
for x in /dev/loop[012]; do sudo losetup -d $x; done
rm -rf /tmp/raidmnt ~/raid

Reading mode for emacs

February 24th, 2008

Reading lots of text on a computer isn’t the most fun thing in the world. I still like paper (or e-ink screens!). But computers still have the advantage of flexibility. I’ve tried some “alternative” reading methods like dictator before but didn’t like them much. I think a less radical approach is better. So, I’ve been hacking up a basic reading mode in emacs which just highlights a sentence at a time. I’ve found it pleasantly useful, especially for technical documents. Here’s the screenshot (code still in flux):

Sentence highlighting

Dead simple, but it keeps my mind focused.

Sleight of Haskelly Hand (and The Appearance Of A Process)

February 19th, 2008

Here’s some low-level hackery fun which revealed something I didn’t know about unix until yesterday. Yi (the emacs clone in haskell) currently implements “code updating” by persisting the application state, and calling exec() to replace the program code with the latest version, and then restores the previous state. Lots of applications do this to some extent. However, yi needs to be a bit smart because (like emacs) it can have open network connections and open file handle which also need to survive the restart but aren’t trivially persistable. For example, yi could be running subshells or irc clients.

Fortunately, this is possible! When you call exec(), existing file descriptors remain open. This is very different from starting a new process from scratch. So all we need to do is persist some information about which descriptors were doing which particular job. Then, when we start up again, we can rewire up all our file handles and network connections and carry on as if nothing has happened.

Here’s an example haskell app which shows this in action. First of all we need to import various bits:

import System.Posix.Types
import System.Posix.Process
import System.Posix.IO
import System.IO
import Network.Socket
import System( getArgs, getProgName )
import Foreign.C.Types

Next we have a “main” function which distinguishes between “the first run” and “the second run” (ie. after re-exec’ing) by the presence of command line arguments:

main  :: IO ()
main = do
  args < - getArgs
  case args of
    [] -> firsttime
    [ file_fd, net_fd ] -> reuse (read file_fd) (read net_fd)

The first time we run, we open a network connection to http://example.com and we also open a disk file for writing. We then re-exec the current process to start over again, but also pass the disk file fd as the first command line argument, and the network socket fd as the second argument. Both are just integers:

firsttime :: IO ()
firsttime = do 
  -- Open a file, grab its fd
  Fd file_fd < - handleToFd =<< openFile "/tmp/some-file" WriteMode
 
  -- Open a socket, grab its fd
  socket <- socket AF_INET Stream defaultProtocol 
  addr <- inet_addr "208.77.188.166" -- example.com
  connect socket (SockAddrInet 80 addr)
  send socket "GET / HTTP/1.0\n\n"
  let net_fd = fdSocket socket
 
  -- rexec ourselves
  pn <- getProgName
  putStrLn $ "Now re-execing as " ++ pn ++ " " ++ show file_fd ++ " " ++ show net_fd
  executeFile ("./" ++ pn) False [ show file_fd, show net_fd ] Nothing

The second time we run, we pick up these two file descriptors and proceed to use them. In this code, we read an HTTP response from the network connection and write it to the disk file.

reuse :: CInt -> CInt -> IO ()
reuse file_fd net_fd = do
  putStrLn $ "Hello again, I've been re-execd!"
 
  putStrLn $ "Using fd " ++ show net_fd ++ " as a network connection"
  socket < - mkSocket net_fd AF_INET Stream defaultProtocol Connected
  msg <- recv socket 100
 
  putStrLn $ "Using fd " ++ show file_fd ++ " as an output file"
  h <- fdToHandle (Fd file_fd)
  hPutStrLn h $ "Got this from network: " ++ msg
 
  hClose h
  sClose socket  
 
  putStrLn "Now look in /tmp/some-file"

.. and we end up with the file containing text retrieved from a network connection which was made in a previous life. It is a curious and useful technique. But I find it interesting because it made me realise that I usually think of a “unix process” as being the same thing as “an instance of grep” or “an instance of emacs”. But a process can change its skin many times during its lifetime. It can “become” many different creatures by exec()ing many times, and it can keep the same file descriptors throughout. I’ve only ever seen exec() paired with a fork() call before, but that’s just one way to use it.