Ocaml assembly output

Having spent a lot of time recently looking at the assembly generated by the MS C++ compiler and Corman Lisp, I was about to start investigating what style of assembly the ocaml compiler generates for typical constructs. Fortunately for me, someone else has already got there first. It’s a pretty informative article, and goes a long way towards explaining why ocaml performs so well. Well, actually, the real reason is that Xavier is one clever cookie.

I read one of his early reports on the ZINC system when I was travelling around the world with an iPAQ in tow, and the major performance gain at that point, if I remember correctly, came from avoiding unnecessary construction of closures for curried functions. So, even though function calls in ocaml are curried (ie. a “two argument function” is really a single argument function, which returns another single argument function) you don’t actually need to build up all the scaffolding for the intermediate function if you’re just going to immediately apply it to another value. This stops you building lots of intermediate closures on the heap. This was an innovation at the time, I imagine (in 1990).
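As a sketch (my own toy example, not one from the ZINC report): every multi-argument function below is formally curried, but a ZINC-style compiler only needs to build a closure when you genuinely stop at a partial application.

```ocaml
(* 'add' has type int -> int -> int: formally, a function returning a function *)
let add x y = x + y

(* Full application: both arguments are available at the call site, so a
   ZINC-style compiler can pass them together, with no intermediate closure *)
let three = add 1 2

(* Partial application: here the intermediate closure really is needed,
   because 'incr' escapes with only one argument supplied *)
let incr = add 1
let four = incr 3
```

The clever part is that the source language pretends everything is the first case, while the compiler quietly makes the common second case cheap.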

The article also describes the boxing scheme used in ocaml, which uses the bottom bit of a word to indicate whether a piece of data is an integer, or a pointer to a heap block. If all your heap blocks are word-aligned, the bottom bit of a pointer is always zero anyway, so this is a neat, efficient trick. [I know a few other unnamed people who should recognise such bit-twiddling tomfoolery too 😉 ]
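You can actually observe the tag bit from within ocaml itself, via the unsafe Obj module – a rough sketch for poking around, not something for real code:

```ocaml
(* An int n is represented as the machine word 2n+1 (bottom bit set);
   pointers to word-aligned heap blocks always have a zero bottom bit,
   so a single bit test distinguishes the two cases at runtime. *)
let tagged_int = Obj.is_int (Obj.repr 42)        (* an immediate integer *)
let heap_block = Obj.is_int (Obj.repr (Some 42)) (* a heap-allocated block *)
```

`tagged_int` comes out true and `heap_block` false, which is exactly the distinction the garbage collector relies on when it scans the heap.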

Last week at work, I started rewriting a small bit of code because I knew that there was a more elegant (and therefore more likely to be correct and maintainable) way to express what it was doing. Unfortunately, when I started editing the code I realised that I was thinking in ocaml! The “elegant way” required ocaml-style variants and pattern matching, but I was coding in C++. Eep! The closest C++ equivalent is almost a joke in contrast to the ocaml version.
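For illustration (a made-up example, not the code from work): in ocaml, the whole tagged-value-plus-dispatch pattern is just a variant type and a match, where C++ would need a tagged union or a class hierarchy with virtual functions.

```ocaml
(* A variant type: each constructor carries its own payload *)
type shape =
  | Circle of float
  | Rect of float * float

(* Pattern matching dispatches on the constructor and binds the payload;
   the compiler warns you if a case is missing *)
let area = function
  | Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h
```

Six lines, exhaustiveness-checked by the compiler. The C++ version needs either a discriminated union with a hand-maintained tag enum, or a class per case plus a virtual method.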

Finally, to end this ocaml praise session, the internals of the compiler are elegant and clean. The various sections of the compiler are split out using ocaml’s powerful module facilities, and the functional style of programming (ie. minimal use of state) makes understanding the code a lot easier. By contrast, the internals of gcc are a hideous mess. Actually no, the internals of gcc *are* a hideous mess, period.

Mu, I’m somewhat paralysed with indecision regarding where I want to go with computer tools. I have so many ideas and things I want to try out, and I’ve seen so many great ideas consigned to the historical bit bucket. And also, despite the impression which this programming-only-blog might convey, there’s a million and one other non-computer things which I want to spend my time on. I think at some point I need to lower my ambition and focus on improving one particular thing, rather than riding along on a cascade of new ideas. But it’s annoying, because every day I see tools which I consider to be primitive and backwards compared to what is possible. I want to “do an Alan Kay” and burn the disks – look around, take what is good, and throw the rest away.


When I’m 64

I’ve been taking lots of digital photographs recently, and I started to worry whether I’d still be able to view them in thirty or so years. I have several old documents in PageMaker 4 format which I know I’ll never be able to read again. So, will this happen with my photos? Are my jpegs future-proofed?

I’m not worried about the physical media becoming obsolete. Our ability to store data has constantly increased. Every bit of data I have is kept on my hard drive. When I change machine, I copy it wholesale onto the new machine. I never “archive off” old material onto tape/CD/DVD to free up space, because my hard drive is always larger than my storage needs. I’m fairly confident that in 30 years time, I will still be able to access the raw sequence of bits which make up each photograph.

Will I be able to view these bits as photographs though? In the year 2033, I could probably emulate today’s hardware/software and still run exactly the same utility to view my photos. It’s a bit of a heavyweight solution though. I don’t really want to snapshot the current version of the linux kernel, XFree86, Gnome and GThumb just so I can view photos sometime in the future.

It seems more sensible to stash away a description of the jpeg file format – that way, even if no one else wants to view jpegs, I can still code up a viewer because I know what the sequence of bits means.

But how should I do that? Storing the source code of a C++ or Java JPEG viewer isn’t going to be much use, because In The Future it’ll be pretty hard to figure out what the semantics of C++ or Java were in the year 2003. I’d have to stash away a copy of the Java Language Spec too, otherwise I’d just be left with a pile of meaningless squiggly brackets. We learned that lesson when people had to tackle a myriad of COBOL dialects for Y2K problems.

Is there any “timeless” programming language which I could use? Something which I’ll still know the semantics for in 30 years? Hmm, I guess you could express a JPEG decoder using the lambda calculus, but that’s a bit extreme. The semantics of the lambda calculus will still be understood in 30 years, but you’d have a hard time figuring out what a large “lambda calculus JPEG decoder” actually did.

I think, if I’m going to stash away a description of the JPEG algorithm, it’s probably going to be a good old-fashioned English-language description. That’s probably good enough for 30 years. I don’t think the semantics of the English language and mathematical notation are going to change much in that time. It’s not perfect – just look at the ambiguities which most “english language” specifications contain. But the language – the medium in which the description is expressed – is probably stable for a good few decades. Maybe this Haskell paper is a good alternative to the published standards document.

I wish the story ended at this point. But it doesn’t! How should I store the description of the algorithm? I don’t expect I’ll be able to read PDF documents in the year 2033. Their semantics are far more complex than the JPEG format’s. You’d have to record the semantics of TrueType fonts, or whatever, in order to display them in the future. PostScript is no better: the PostScript reference manual, which gives an English-language description of the format/language, is over an inch thick.

ASCII or UTF-8 is going to be a good bet. I think we’ll still be able to read that in the future. I probably want to include some mathematical formulae, so maybe I need some mathematical markup too. It begins to sound like XML.

So that’s the next 30 years sorted out. But what if I wanted my data to last for 3000 years? Can you store information which transcends changes in language, notation and cultures?



I’m working my way through the videos from the Lightweight Languages 2002 (LL2) conference at MIT. Yay for digital videos of conferences! Boo for doing it in RealVideo format. This isn’t just because I can’t archive it locally, but also because there isn’t terribly much action to watch. Most of the time, the camera focuses on the presenter’s slides, which appear blurry and pixelated after the RealVideo codec has squished them. I look forward to the day when SMIL or some other integrated multimedia solution allows us to mix audio/text/html/pdf/video in a synchronized stream. It would be a great boost for online learning.

Anyhow, I’ve just watched Matthew Flatt’s presentation, in which he plays with the distinction between programming languages and operating systems. He makes the point that safe languages (such as ML, where the type system guarantees that you won’t suffer pointer errors) remove at least some of the reasons for having processes occupy separate address spaces. Unlike a C++ program, an ML program will never scribble over the end of a heap block. So, if your citizens are well enough behaved, they can all live in the one house together.
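To make that concrete, here’s a minimal sketch (my own example, not Flatt’s) of what the safety guarantee buys you: an out-of-bounds access in ocaml raises an exception you can catch, rather than silently corrupting a neighbour’s memory.

```ocaml
(* An out-of-bounds read is a checked runtime error, not memory corruption *)
let a = Array.make 3 0

let result =
  try a.(10) with
  | Invalid_argument _ -> -1   (* the runtime caught the bad access *)
```

A C++ program doing the same thing might happily read (or write!) whatever happens to live past the end of the array – which is precisely why untrusted C++ code needs its own address space and safe code needn’t.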

Hmm, must leave to get to band practice now …


Hardware, for once

A few years ago, I dabbled in hardware by building a programmer and test circuit using the PIC microcontroller. At the time, I wished that I had a digital camera so that I could put up a webpage. Well, four years have passed and I finally have a digital camera, so here it is.

I think building these circuits exorcised my hardware daemons. I feel no further need to build “little computers” (which is what Susan’s mum called them). Normal service will be resumed shortly.



Hello, world! After chatting with Anthony last night, I decided that I would switch blog software to one which supports TrackBack. Some blog-hosting sites make it easy to crosslink between blogs on the same site, but the internet is a big place, so I want to use a system which allows references between blogs regardless of where they live.

Up until now, I have been using BlogMax but I have now started using Movable Type. Time to convert my old entries to the new site ..