They’re in the trees

Every so often, I learn a “life lesson”.

In the middle of May, I met up with the usual suspects to go camping. When I say camping, I really mean drinking beer and burning stuff. Lots of fun. Anyway, that was the middle of May. We’re now past the middle of June. This is a significant time delay.

At one point, whilst wood was being chopped, I had the great idea of chopping off a chunk of tree and taking it home to make something on my wood lathe of sorts. I lopped off a nice 4″ long slice of birch wood, and figured I could dry it out and make a bowl out of it. It went into my rucksack before the ride home, and was then dumped in the corner of my living room. Eventually, I planned to spin it around at high speed and wave sharp tools at it, but I left it in the sun so it dried out too fast and cracked.

On Friday morning, just before I went to work, I noticed there was a funny cracking sound coming from the wood. After prodding it, it seemed like the bark was coming away from the wood. No great surprise, I guess .. the wood was probably just drying out some more and shrinking away from the bark. I also noticed a single small perfectly circular hole in the wood, which I’d never seen before. I prodded a wire in the hole to see if there was anything in it, but nothing exciting happened so I left the wood in the corner of the room and went to work.

Then when I got home: Buzzing sound near the window. Biggest FOAD insect I’ve ever seen in Scotland clambering nervously over the blinds. We don’t get many big insects here – I guess it’s too cold – but this was a bug of significant proportions. Eep! I carefully removed the Bug to the garden. Then I came back inside and looked at the bit of wood.

There are now three holes in the wood.

Jeez, so let’s assume the original hole had been there for a while and Bug #1 was long gone. Bug #2 had just been relocated to the garden. So that meant Bug #3 had moved out of its birch home and was now living it up in my flat somewhere. And Susan really doesn’t like bugs. “Hi honey, remember that lovely bit of wood I brought back from camping?” Cue a frantic bug-hunt.

I found it eventually, huddled in the corner. It was the same flavour as bug #2, but a bit more energetic. I guess nature didn’t prepare them for bursting out of a tree into a double-glazed flat.

That bit of wood had been sitting quietly in the corner of my living room for well over a month. All that time, it had been playing host to two little maggots which grew bigger and bigger, until one day the wood started making cracking sounds and two big bugs ate their way out. It’s like a plot for a horror movie.

Won’t be doing that again.

s/MovableType/WordPress/

In the end, I took the plunge and moved my blog from Movable Type to WordPress. Links to existing blog entries should still work, due to the wonders of mod_rewrite, and the RSS feed has full content. First impressions are very positive – WordPress runs a lot faster and is a lot easier to tweak.

OCaml assembly output

Having spent a lot of time recently looking at the assembly generated by the MS C++ compiler and Corman Lisp, I was about to start investigating what style of assembly the OCaml compiler generates for typical constructs. Fortunately for me, someone else has already got there first. It’s a pretty informative article, and goes a long way towards explaining why OCaml performs so well. Well, actually, the real reason is that Xavier is one clever cookie.

I read one of his early reports on the ZINC system when I was travelling around the world with iPAQ in tow, and the major performance gain at that point, if I remember correctly, concerned avoiding unnecessary construction of closures for curried functions. So, even though function calls in OCaml are curried (i.e. a “two argument function” is really a single argument function, which returns another single argument function), you don’t actually need to build up all the scaffolding for the intermediate function if you’re just going to immediately apply it to another value. This stops you building lots of intermediate closures on the heap. This was an innovation at the time, I imagine (in 1990).
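To make the currying point concrete, here is a minimal sketch. The names `add`, `add_five` and `seven` are made up for this example, and the comments describe the general idea rather than what any particular compiler version emits.

```ocaml
(* A "two-argument" function is really a function returning a function. *)
let add x y = x + y            (* type: int -> int -> int *)

(* Partial application: the intermediate function is genuinely needed here,
   because we hold on to it and call it later. *)
let add_five = add 5           (* type: int -> int *)

(* Full application: both arguments arrive together, so there is no need
   to materialise the intermediate closure on the heap. *)
let seven = add 3 4

let () = Printf.printf "%d %d\n" (add_five 10) seven
```

Only the `add_five` case needs a closure that outlives the call; the `add 3 4` case is the one the optimisation is about.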

The article also describes the boxing scheme used in OCaml, which uses the bottom bit to indicate whether a bit of data is an integer, or a pointer to a heap block. If all your heap blocks are word aligned, the bottom bit is redundant anyway so this is a neat efficient trick. [I know a few other unnamed people who should recognise such bit-twiddling tomfoolery too 😉 ]
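A rough sketch of that tagging trick, simulated on ordinary ints. The helpers `tag_int`, `untag_int` and `is_tagged_int` are hypothetical names for illustration, not part of the OCaml runtime; the convention shown (bottom bit set means integer) is the one OCaml uses, which is why its native ints are one bit short of a machine word.

```ocaml
(* Hypothetical helpers that mimic the tagging scheme on plain ints. *)
let tag_int n = (n lsl 1) lor 1       (* integer n is stored as 2n + 1 *)
let untag_int w = w asr 1             (* arithmetic shift recovers n, sign included *)
let is_tagged_int w = w land 1 = 1    (* bottom bit set: immediate integer *)

let () =
  let w = tag_int 21 in
  Printf.printf "word=%d is_int=%b value=%d\n" w (is_tagged_int w) (untag_int w);
  (* A word-aligned address always has a clear bottom bit, so it reads as a pointer. *)
  Printf.printf "aligned address is_int=%b\n" (is_tagged_int 0x1000)
```

The garbage collector can then tell integers from pointers by looking at one bit, without any per-value header.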

Last week at work, I started rewriting a small bit of code because I knew that there was a more elegant (and therefore more likely to be correct and maintainable) way to express what it was doing. Unfortunately, when I started editing the code I realised that I was thinking in OCaml! The “elegant way” required OCaml-style variants and pattern matching, but I was coding in C++. Eep! The closest C++ equivalent is almost a joke in contrast to the OCaml version.
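For anyone who hasn’t met variants and pattern matching, this is the flavour of thing I mean. The `shape` type and `area` function are invented for this sketch and have nothing to do with the actual work code.

```ocaml
(* A hypothetical variant type, purely for illustration. *)
type shape =
  | Circle of float              (* radius *)
  | Rect of float * float        (* width, height *)
  | Point

(* One function, one case per constructor; the compiler warns if a case is missing. *)
let area s =
  match s with
  | Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h
  | Point -> 0.0

let () = Printf.printf "%.1f\n" (area (Rect (2.0, 3.0)))
```

In C++ the usual substitutes are a class hierarchy with virtual functions or a tagged union with a switch, and neither gets the exhaustiveness check for free.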

Finally, to end this OCaml praise session, the internals of the compiler are elegant and clean. The various sections of the compiler are split out using OCaml’s powerful module facilities, and the functional style of programming (i.e. minimal use of state) makes understanding the code a lot easier. By contrast, the internals of gcc are a hideous mess. Actually no, the internals of gcc *are* a hideous mess, period.

Mu, I’m somewhat paralysed with indecision regarding where I want to go with computer tools. I have so many ideas and things I want to try out, and I’ve seen so many great ideas consigned to the historical bit bucket. And also, despite the impression which this programming-only-blog might convey, there’s a million and one other non-computer things which I want to spend my time on. I think at some point I need to lower my ambition and focus on improving one particular thing, rather than riding along on a cascade of new ideas. But it’s annoying, because every day I see tools which I consider to be primitive and backwards compared to what is possible. I want to “do an Alan Kay” and burn the disks – look around, take what is good, and throw the rest away.

When I’m 64

I’ve been taking lots of digital photographs recently, and I started to worry whether I’d still be able to view them in thirty or so years. I have several old documents in PageMaker 4 format which I know I’ll never be able to read again. So, will this happen with my photos? Are my jpegs future-proofed?

I’m not worried about the physical media becoming obsolete. Our ability to store data has constantly increased. Every bit of data I have is kept on my hard drive. When I change machine, I copy it wholesale onto the new machine. I never “archive off” old material onto tape/CD/DVD to free up space, because my hard drive is always larger than my storage needs. I’m fairly confident that in 30 years’ time, I will still be able to access the raw sequence of bits which make up each photograph.

Will I be able to view these bits as photographs though? In the year 2033, I could probably emulate today’s hardware/software and still run exactly the same utility to view my photos. It’s a bit of a heavyweight solution though. I don’t really want to snapshot the current version of the Linux kernel, XFree86, Gnome and GThumb just so I can view photos sometime in the future.

It seems more sensible to stash away a description of the jpeg file format – that way, even if no one else wants to view jpegs, I can still code up a viewer because I know what the sequence of bits means.

But how should I do that? Storing the source code of a C++ or Java JPEG viewer isn’t going to be much use, because In The Future it’ll be pretty hard to figure out what the semantics of C++ or Java were in the year 2003. I’d have to stash away a copy of the Java Language Spec too, otherwise I’d just be left with a pile of meaningless squiggly brackets. We learned that when people had to tackle a myriad of COBOL dialects for Y2K problems.

Is there any “timeless” programming language which I could use? Something which I’ll still know the semantics for in 30 years? Hmm, I guess you could express a JPEG decoder using the lambda calculus, but that’s a bit extreme. The semantics of the lambda calculus will still be understood in 30 years, but you’d have a hard time figuring out what a large “lambda calculus JPEG decoder” actually did.

I think, if I’m going to stash away a description of the JPEG algorithm, it’s probably going to be a good old fashioned English-language description. That’s probably good enough for 30 years. I don’t think the semantics of the English language and mathematical notation are going to change much in that time. It’s not perfect – just look at the ambiguities which most “English language” specifications contain. But, the language .. the medium in which the description is expressed .. is probably stable for a good few decades. Maybe this Haskell paper is a good alternative to the published standards document.

I wish the story ended at this point. But it doesn’t! How should I store the description of the algorithm? I don’t expect I’ll be able to read PDF documents in the year 2033. Their semantics are way complex, more so than the JPEG format. You’d have to record the semantics of TrueType fonts, or whatever, in order to display them in the future. PostScript is no better. The PostScript reference manual which gives an English-language description of the format/language is over an inch thick.

ASCII or UTF-8 is going to be a good bet. I think we’ll still be able to read that in the future. I probably want to include some mathematical formulae, so maybe I need some mathematical markup too. It begins to sound like XML.

So that’s the next 30 years sorted out. What if I wanted my data to last for 3000 years? Can you store information which transcends changes in language, notation and cultures?

LL2

I’m working my way through the videos from the Lightweight Languages 2002 (LL2) conference at MIT. Yay, for digital videos of conferences! Boo, for doing it in RealVideo format. This isn’t just because I can’t archive it locally, but also because there isn’t terribly much action to watch. Most of the time, the camera focuses on the presenter’s slides, which appear blurry and pixelated after the RealVideo codec has squished them. I look forward to the day when SMIL or some other integrated multimedia solution allows us to mix audio/text/html/pdf/video in a synchronized stream. It would be a great boost for online learning.

Anyhow, I’ve just watched Matthew Flatt’s presentation in which he plays with the distinction between programming languages and operating systems. He makes the point that safe languages (such as ML, where the type system guarantees that you won’t suffer pointer errors) remove at least some of the reasons for having processes occupying separate address spaces. Unlike in C++, an ML program will never scribble over the end of a heap block. So, if your citizens are well enough behaved, they can all live in the one house together.
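A tiny sketch of what “never scribble over the end of a heap block” looks like in practice, using OCaml as the ML in question. The helper `try_write` is a made-up name; note that for arrays the protection comes from a runtime bounds check (which can be turned off with the compiler’s `-unsafe` flag), not from the type system alone.

```ocaml
(* Hypothetical helper: attempt a write and report whether it was allowed. *)
let try_write arr i v =
  match arr.(i) <- v with
  | () -> true
  | exception Invalid_argument _ -> false   (* out-of-bounds write refused *)

let () =
  let block = Array.make 4 0 in
  Printf.printf "write at index 3 allowed: %b\n" (try_write block 3 99);
  Printf.printf "write at index 4 allowed: %b\n" (try_write block 4 99)
```

The equivalent write in C++ would compile fine and quietly trample whatever happened to live next door.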

Hmm, must leave to get to band practice now …