March 2008 – Andrew Birkett's blog

Haskell is possibly too lazy for me

This is the first of several posts on the topic of Haskell’s laziness. After several weeks of playing, I’m coming to the conclusion that laziness-by-default is a hindrance rather than a virtue. Let’s start at the start, though, by trying to add some numbers together.

-- Non tail recursive; 5Mb of live objects at end.
mysum []     = 0
mysum (x:xs) = x + mysum xs
main = putStrLn $ show $ mysum $ take 100000 $ [1..]

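As the comment says, this is a dumb version. It consumes 5Mb of memory because it’s not tail recursive: each call has to wait for the result of the next one before it can do its own addition, so all 100000 pending additions are held at once.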
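To see where the pending additions come from, it may help to trace a tiny input by hand. The sketch below is not part of the original experiments: `mysumr` is a made-up name, and it assumes `mysum` can be read as a right fold over the list, which should behave the same way.

```haskell
-- mysum written as a right fold; the comments trace how the
-- additions pile up before any of them can be performed.
mysumr :: [Integer] -> Integer
mysumr = foldr (+) 0

-- mysumr [1,2,3]
--   = 1 + mysumr [2,3]
--   = 1 + (2 + mysumr [3])
--   = 1 + (2 + (3 + mysumr []))
--   = 1 + (2 + (3 + 0))
--   = 6

main :: IO ()
main = print (mysumr [1,2,3])
```

Nothing can be added until the end of the list is reached, so the amount of pending work grows with the length of the list.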

Incidentally, after causing my machine to thrash several times during my experiments, I found it useful to use ‘ulimit’ to restrict the maximum heap size available to the process. Also, you can pass extra arguments to your Haskell program to get it to report real-time memory stats, like this:

ghc sum.hs && /bin/bash -c 'ulimit -Sv 100000; ./a.out +RTS  -Sstderr'

Anyhow, the memory blowup is easy to fix; just pass an ‘accumulator’ parameter when you do the recursive call:

-- Tail recursive, but 3.5Mb of live objects at end.
mysuma acc []     = acc
mysuma acc (x:xs) = mysuma (acc+x) xs
main = putStrLn $ show $ mysuma 0 $ take 100000 $ [1..]

Hmm, it’s now tail recursive but it still consumes 3.5Mb? This is where Haskell’s laziness makes things quite different from OCaml and other strict languages. When we pass the accumulated value, Haskell does not actually evaluate the addition prior to making the recursive call. It delays the computation until its value is actually required. So, on each recursive call, the accumulator is an ever-longer unevaluated expression: “0+1”, then “(0+1)+2”, then “((0+1)+2)+3”, and so on.
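The same build-up can be seen with the standard left folds, which may be a more familiar way to look at it. This is only a sketch and not something from the original experiments: `sumLazy` and `sumStrict` are made-up names, and it assumes that `foldl` and `foldl'` from `Data.List` play the roles of an accumulator loop without and with forcing, at least when compiled without optimisation.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator can grow into a chain of
-- unevaluated additions, much like mysuma above.
sumLazy :: [Integer] -> Integer
sumLazy = foldl (+) 0

-- Strict left fold: foldl' forces the accumulator at each step,
-- so no chain of pending additions is built.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0

main :: IO ()
main = do
  print (sumLazy   (take 100000 [1..]))
  print (sumStrict (take 100000 [1..]))
```

Both functions return the same answer; the difference is only in how much unevaluated work is carried along the way.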

We can fix this by explicitly telling Haskell to evaluate the addition prior to making the call:

-- Tail recursive, with 'seq' to force immediate evaluation of addition. 
-- 40k of live objects at end.
mysumas acc []     = acc
mysumas acc (x:xs) = (acc+x) `seq` mysumas (acc+x) xs
main = putStrLn $ show $ mysumas 0 $ take 100000 $ [1..]

Finally we have a program which only consumes a tiny amount of heap space. But it took a surprising amount of effort. There’s lots more information about this situation on the Haskell wiki.
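For comparison, GHC’s bang patterns are another way to ask for a strict accumulator. This is a minimal sketch rather than anything measured for this post: it assumes the `BangPatterns` extension is available in your GHC, and `mysumb` is a made-up name.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Same accumulator loop as mysumas, but the bang on acc asks GHC to
-- evaluate the accumulator before the body of each call runs.
mysumb :: Integer -> [Integer] -> Integer
mysumb !acc []     = acc
mysumb !acc (x:xs) = mysumb (acc + x) xs

main :: IO ()
main = print (mysumb 0 (take 100000 [1..]))
```

The addition is only written once here, at the cost of relying on a GHC-specific extension instead of plain ‘seq’.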

# Creating and destroying disks from the safety of your own console
mkdir ~/raid; cd ~/raid

# Create two 10Mb files called disk0 and disk1
for d in 0 1; do dd if=/dev/zero of=disk${d} bs=1024 count=10240; done

# Make them available as block devices using the loopback device
for d in 0 1; do sudo losetup /dev/loop$d disk$d; done

# Combine the two 'disks' into a RAID-1 mirrored block device
# Using '--build' rather than '--create' means there is no device
# specific metadata, and so the contents of the disks will be identical
sudo mdadm --build --verbose /dev/md0 --level=1 --raid-devices=2 /dev/loop[01]

# Create a filesystem on our raid device and mount it
sudo mkfs.ext3 /dev/md0
mkdir /tmp/raidmnt
sudo mount /dev/md0 /tmp/raidmnt
sudo chown $USER /tmp/raidmnt

# The contents of both disks change in unison
md5sum disk[01]
date > /tmp/raidmnt/datefile
sync
md5sum disk[01]

# If we mark one disk as failed, disk contents diverge
sudo mdadm --fail /dev/md0 /dev/loop0
date > /tmp/raidmnt/datefile
sync
md5sum disk[01]

# Remove the failed disk and readd it, and RAID1 will sync
sudo mdadm --remove /dev/md0 /dev/loop0
sudo mdadm --add /dev/md0 /dev/loop0
sleep 1
md5sum disk[01]

# Add a third (unused) disk into the system to test failover
dd if=/dev/zero of=disk2 bs=1024 count=10240
sudo losetup /dev/loop2 disk2
sudo mdadm --add /dev/md0 /dev/loop2
sudo mdadm --detail /dev/md0

# When one of original two disks fail, the new disk gets used
md5sum disk[012]
sudo mdadm --fail /dev/md0 /dev/loop0
date > /tmp/raidmnt/datefile
sync
md5sum disk[012]

# Tidy up the world
sudo umount /dev/md0
sudo mdadm -S /dev/md0
for x in /dev/loop[012]; do sudo losetup -d $x; done
rm -rf /tmp/raidmnt ~/raid