When I’m 64

I’ve been taking lots of digital photographs recently, and I started to worry whether I’d still be able to view them in thirty or so years. I have several old documents in PageMaker 4 format which I know I’ll never be able to read again. So, will this happen with my photos? Are my JPEGs future-proofed?

I’m not worried about the physical media becoming obsolete. Our ability to store data has constantly increased. Every bit of data I have is kept on my hard drive. When I change machines, I copy it wholesale onto the new one. I never “archive off” old material onto tape/CD/DVD to free up space, because my hard drive is always larger than my storage needs. I’m fairly confident that in 30 years’ time, I will still be able to access the raw sequence of bits which make up each photograph.

Will I be able to view these bits as photographs though? In the year 2033, I could probably emulate today’s hardware/software and still run exactly the same utility to view my photos. It’s a bit of a heavyweight solution though. I don’t really want to snapshot the current version of the Linux kernel, XFree86, Gnome and GThumb just so I can view photos sometime in the future.

It seems more sensible to stash away a description of the JPEG file format. That way, even if no one else wants to view JPEGs, I can still code up a viewer because I know what the sequence of bits means.
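To give a flavour of what “knowing what the bits mean” involves, here is a minimal sketch in Python that walks the marker segments at the start of a JPEG file. It relies on two facts about the format: a file begins with the two bytes FF D8, and most segments after that consist of FF, a marker byte, and a two-byte big-endian length which counts itself. The function name `list_jpeg_segments` is my own invention, and the sketch deliberately ignores fill bytes and standalone markers, so treat it as an illustration rather than a robust parser.

```python
import struct


def list_jpeg_segments(data: bytes):
    """Return (marker, payload_length) pairs up to and including start-of-scan.

    Simplified sketch: assumes every segment before the scan data carries a
    length field, and does not handle padding FF bytes between segments.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker FF D8")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        segments.append((marker, length - 2))
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        pos += 2 + length
    return segments
```

This only covers the container layout. The decoding itself (Huffman tables, the DCT) is the part a stashed description would really have to explain.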

But how should I do that? Storing the source code of a C++ or Java JPEG viewer isn’t going to be much use, because In The Future it’ll be pretty hard to figure out what the semantics of C++ or Java were in the year 2003. I’d have to stash away a copy of the Java Language Spec too, otherwise I’d just be left with a pile of meaningless squiggly brackets. We learned that when people had to tackle a myriad of COBOL dialects for Y2K problems.

Is there any “timeless” programming language which I could use? Something which I’ll still know the semantics for in 30 years? Hmm, I guess you could express a JPEG decoder using the lambda calculus, but that’s a bit extreme. The semantics of the lambda calculus will still be understood in 30 years, but you’d have a hard time figuring out what a large “lambda calculus JPEG decoder” actually did.

I think, if I’m going to stash away a description of the JPEG algorithm, it’s probably going to be a good old-fashioned English-language description. That’s probably good enough for 30 years. I don’t think the semantics of the English language and mathematical notation are going to change much in that time. It’s not perfect: just look at the ambiguities which most “English language” specifications contain. But, the language .. the medium in which the description is expressed .. is probably stable for a good few decades. Maybe this Haskell paper is a good alternative to the published standards document.

I wish the story ended at this point. But it doesn’t! How should I store the description of the algorithm? I don’t expect I’ll be able to read PDF documents in the year 2033. Their semantics are way complex, more so than the JPEG format. You’d have to record the semantics of TrueType fonts, or whatever, in order to display them in the future. PostScript is no better. The PostScript reference manual, which gives an English-language description of the format/language, is over an inch thick.

ASCII or UTF-8 is going to be a good bet. I think we’ll still be able to read that in the future. I probably want to include some mathematical formulae, so maybe I need some mathematical markup too. It begins to sound like XML.

So that’s the next 30 years sorted out. What if I wanted my data to last for 3000 years? Can you store information which transcends changes in language, notation and cultures?

LL2

I’m working my way through the videos from the Lightweight Languages 2002 (LL2) conference at MIT. Yay, for digital videos of conferences! Boo, for doing it in RealVideo format. This isn’t just because I can’t archive it locally, but also because there isn’t terribly much action to watch. Most of the time, the camera focuses on the presenter’s slides, which appear blurry and pixelated after the RealVideo codec has squished them. I look forward to the day when SMIL or some other integrated multimedia solution allows us to mix audio/text/html/pdf/video in a synchronized stream. It would be a great boost for online learning.

Anyhow, I’ve just watched Matthew Flatt’s presentation in which he plays with the distinction between programming languages and operating systems. He makes the point that safe languages (such as ML, where the type system guarantees that you won’t suffer pointer errors) remove at least some of the reasons for having processes occupying separate address spaces. Unlike in C++, an ML program will never scribble over the end of a heap block. So, if your citizens are well enough behaved, they can all live in the one house together.
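As a rough illustration of the “well-behaved citizens” idea, here is a small sketch in Python, which is not one of the languages from the talk. Python enforces this particular guarantee with a run-time bounds check, and the helper name `write_past_end` is made up for the example. The point is only that a write beyond the end of one block is refused instead of landing in whatever happens to live next door.

```python
def write_past_end(block, index, value):
    """Try to store value at block[index]; return True if the runtime allowed it."""
    try:
        block[index] = value
        return True
    except IndexError:
        # The runtime rejected the out-of-range write; nothing was modified.
        return False


heap_block = [0] * 4   # stands in for one program's heap block
neighbour = [7] * 4    # stands in for another program's data

in_bounds_ok = write_past_end(heap_block, 3, 1)   # last valid slot
overrun_ok = write_past_end(heap_block, 4, 1)     # one past the end
```

In C++ the equivalent overrun is undefined behaviour and may quietly corrupt adjacent memory, which is one reason separate address spaces are needed there.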

Hmm, must leave to get to band practice now …

Hardware, for once

A few years ago, I dabbled in hardware by building a programmer and test circuit using the PIC microcontroller. At the time, I wished that I had a digital camera so that I could put up a webpage. Well, four years have passed and I finally have a digital camera, so here it is.

I think building these circuits exorcised my hardware daemons. I feel no further need to build “little computers” (which was what Susan’s mum would call it). Normal service will be resumed shortly.

*ping*

Hello, world! After chatting with Anthony last night, I decided that I would switch blog software to one which supports TrackBack. Sites like www.livejournal.com make it easy to crosslink between blogs on the one site, but the internet is a big place so I want to use a system which allows references between blogs regardless of where they live.

Up until now, I have been using BlogMax but I have now started using Movable Type. Time to convert my old entries to the new site ..