When I’m 64

I’ve been taking lots of digital photographs recently, and I started to worry whether I’d still be able to view them in thirty or so years. I have several old documents in PageMaker 4 format which I know I’ll never be able to read again. So, will this happen with my photos? Are my JPEGs future-proofed?

I’m not worried about the physical media becoming obsolete. Our ability to store data has constantly increased. Every bit of data I have is kept on my hard drive. When I change machines, I copy it wholesale onto the new machine. I never “archive off” old material onto tape/CD/DVD to free up space, because my hard drive is always larger than my storage needs. I’m fairly confident that in 30 years’ time, I will still be able to access the raw sequence of bits which make up each photograph.

Will I be able to view these bits as photographs though? In the year 2033, I could probably emulate today’s hardware/software and still run exactly the same utility to view my photos. It’s a bit of a heavyweight solution though. I don’t really want to snapshot the current version of the Linux kernel, XFree86, GNOME and gThumb just so I can view photos sometime in the future.

It seems more sensible to stash away a description of the JPEG file format. That way, even if no one else wants to view JPEGs, I can still code up a viewer because I know what the sequence of bits means.
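To give a feel for what such a description has to pin down, here is a minimal sketch in Python that walks the segment headers at the start of a JPEG file. It relies only on the basic container layout (a start-of-image marker `FF D8`, then segments made of an `FF xx` marker and a two-byte big-endian length that counts itself). The function name `list_jpeg_segments` is my own invention, and the sketch ignores corner cases such as fill bytes before a marker; it says nothing about the hard part, which is decoding the compressed image data.

```python
import struct


def list_jpeg_segments(data: bytes):
    """Return (marker, payload_length) pairs for the header segments of a JPEG.

    Hypothetical helper for illustration only: it stops at the
    start-of-scan marker and does not decode any image data.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The length field is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker, length - 2))
        if marker == 0xDA:  # start of scan: compressed data follows
            break
        pos += 2 + length
    return segments
```

Even this tiny fragment depends on knowing byte order, marker values and what the length field includes, which is exactly the kind of detail a stashed-away description would need to spell out.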

But how should I do that? Storing the source code of a C++ or Java JPEG viewer isn’t going to be much use, because In The Future it’ll be pretty hard to figure out what the semantics of C++ or Java were in the year 2003. I’d have to stash away a copy of the Java Language Spec too, otherwise I’d just be left with a pile of meaningless squiggly brackets. We learned that when people had to tackle a myriad of COBOL dialects for Y2K problems.

Is there any “timeless” programming language which I could use? Something which I’ll still know the semantics for in 30 years? Hmm, I guess you could express a JPEG decoder using the lambda calculus, but that’s a bit extreme. The semantics of the lambda calculus will still be understood in 30 years, but you’d have a hard time figuring out what a large “lambda calculus JPEG decoder” actually did.

I think, if I’m going to stash away a description of the JPEG algorithm, it’s probably going to be a good old-fashioned English-language description. That’s probably good enough for 30 years. I don’t think the semantics of the English language and mathematical notation are going to change much in that time. It’s not perfect: just look at the ambiguities which most “English-language” specifications contain. But the language, the medium in which the description is expressed, is probably stable for a good few decades. Maybe this Haskell paper is a good alternative to the published standards document.

I wish the story ended at this point. But it doesn’t! How should I store the description of the algorithm? I don’t expect I’ll be able to read PDF documents in the year 2033. Their semantics are way complex, more so than the JPEG format. You’d have to record the semantics of TrueType fonts, or whatever, in order to display them in the future. PostScript is no better. The PostScript reference manual, which gives an English-language description of the format/language, is over an inch thick.

ASCII or UTF-8 is going to be a good bet. I think we’ll still be able to read that in the future. I probably want to include some mathematical formulae, so maybe I need some mathematical markup too. It begins to sound like XML.

So that’s the next 30 years sorted out. What if I wanted my data to last for 3000 years? Can you store information which transcends changes in language, notation and cultures?

New Job!

I’ll shortly be leaving Voxar, after five years there. In September I’ll be starting at Ergnosis. They’re a Bristol-based company, but I’ll still be living in Edinburgh and will be telecommuting most of the time. I’m very excited to be joining them, although I’m a bit nervous to see how I deal with home-working over a long period of time. This means that I’ll finally be spending my paid-work hours on making better development tools! Joy! 🙂

This happy event is connected to the blogging world in two ways. Firstly, I initially discovered the company via James Robertson’s Smalltalk blog. Secondly, my own blog streamlined the interview process. It let the guys at Ergnosis see what kind of stuff I’m interested in, and get an idea of my views on life and software tools. When I started writing this blog, my intention was to capture some notes on the stuff I’m thinking about. I was also aware that it could act as a sort of “professional biography”. I had no idea that, three months after starting writing it, I’d be switching to a new job!

Scribbling in the margins

Reference books, like “Java in a Nutshell”, are so yesterday! We should be downloading secure digital content which augments our development environment. Most “nutshell”-like books are just annotated reference manuals: some comments and examples in addition to the raw reference material. Well, I’d like to have my reference material hyperlinked and embedded in my development environment so it’s available in context, and therefore I’d also like the additional commentary to be embedded in my development environment too. Sun might supply the raw API documentation, and then third parties like O’Reilly could supply useful annotations. But let’s not have it on paper …

Now I want electronic margins to scribble in. Why is API documentation usually treated as read-only? I’m much more efficient when I’m using a highly personalized development environment. But most development environments won’t let me scribble on the API docs. It’s like the system assumes that the docs are the pinnacle of perfection and therefore don’t need to be altered. I can imagine that the margin notes from an experienced programmer would be worth their weight in gold.

Exploring the blogosphere

James Robertson blogs about the way in which his RSS reader has changed how he consumes information. I’ve found exactly the same thing over the past few months. I can’t remember who it was that described the blog world as “like usenet, without the boring people”.

I find that the best way to find new interesting blogs is to assume that “X thinks Y’s blog is good” is transitive. So, I end up collecting new RSS feeds because they’ve been linked from one of my current feeds, or because they’re on the blogroll of a feed. I don’t always stick with them in the long term though. So, it’d be nice if an RSS reader could package up this behaviour somehow. Maybe you’d have a “Suggest five new feeds for me” feature? But an automated system would require metadata on each blog page to direct you to the RSS feed for that blog. This would let a program chase links between blogs and get to the RSS easily. I don’t know if that meta-information exists today.
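One convention that fits this description is feed autodiscovery: a blog page can carry a `<link rel="alternate" type="application/rss+xml" href="...">` element in its HTML head which points at the feed. Assuming a page follows that convention, a link-chasing program could find the feed with something like the Python sketch below. The names `FeedLinkFinder` and `find_feeds` are made up for illustration, and plenty of pages will not carry the tag at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise a feed in a <link> element.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative links against the page's own URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Return the feed URLs advertised by one HTML page (hypothetical helper)."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

A “suggest five new feeds” feature could then run `find_feeds` on every page linked from my current feeds and rank the results by how often they turn up, though that ranking step is left out here.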