Parsing and stuff

I’ve been busy doing lots of different stuff recently (going to my first ever wedding, recording my first ever CD). But in the computer world, I got a few emails last week from people asking about C++ parsing, so I updated my Parsing C++ page to bring it up to date. Whilst updating it this morning, I realised that I make a strong distinction between “real problems” which are interesting and worthwhile to solve, as opposed to other kinds of problems which are just annoying and timeconsuming. C++ parsing falls in the latter category, which is surprising since I’d personally use a C++ parsing toolkit quite a lot. But it should be a non-problem. Languages should be designed so that they’re easy for humans and computers to parse. End of story. A language which fails this test is plain bad. Writing glue code between API’s is another annoying problem. I’d rather spend my time on Real problems. But what do I mean by “real” problems in programming? I’m not entirely sure myself. Problems which are not specific to a particular programming language, or to a particular hardware architecture, I guess. But at the same time, I’m interested in using computers as tools to achieve some purpose – learn about electromagnetism for example. I think I’ve just got to a point in my life where I’m looking back over what I’ve done in the past 10 or so year, and planning how to spend the next 10 years, and wanting to make sure I actively choose what to do rather than passively ending up doing something.

I enjoyed Kim’s summary of the Composable Memory Transactions paper. I like these kind of mini-summaries of papers, since my list of “papers to read” grows faster than I can keep up with. To most people, “concurrency” means threads and mutexes. This is a horribly primitive way of approaching concurrency – notoriously difficult for us puny humans to get right, and little better than a house of cards. Higher level models such as CSP are much better for building safe, stable programs. I don’t think I’d really explictly understood the importance of this kind of composability before, but clearly it’s a vital ingredient in building larger and larger systems.

I’ve also discovered the dirtsimple blog, which has provided lots of interesting reading.

Cuckoo pointers

A cuckoo pointer is a particularly nasty kind of bug to track down in a C/C++ program. I’ve only encountered two of them in the eleven or so years I’ve been using these languages, but when they occur it can be very difficult to understand what is going on. A cuckoo pointer starts life as an old fashioned dangling pointer – a reference to a object on the heap which has been deallocated. It becomes a cuckoo pointer if you are unlucky enough to allocate an object of the same type, at the same address as the old one. Got that? You’ve got some buggy code which hasn’t realised that the old object is now dead, but now suddenly down plops a new object right in its place.

Normally dangling pointers aren’t too tricky to track down. If you write to memory using one, you’ll almost certainly spectacularly trash a bit of memory and your program will die. But in the cuckoo pointer scenario, the write to memory will probably kinda make sense. If the heap object was a string, the cuckoo pointer will be doing string-like operations to the memory, and your program will probably continue running for a while.

You might argue that the chances of a new object of the same type being allocated at exactly the same address are pretty slim. That’d be true, up to a point, but if your heap implementation is using a block-allocator strategy, the chances shoot right up. A block allocator means that allocations of blocks of 8 bytes will come from a special pool of contiguous 8-byte blocks. There’s a seperate pool for each size of blocks – 2, 4, 8, 16, 32 – up to some maximum size. This is a fast way of allocating memory, because the regular structure means that only minimal bookkeeping is required. But under this scheme, the chances of an new object landing at the same address as a recently-freed object of the same type are much increased.

Why are cuckoo pointers so hard to track down? It is because your application will continue executing for a while after the initial error (the dangling pointer). The application will potentially crash a good while after that error, although at least the crash is likely to be in an object of the same type. It’s not too unlike a data race, where two threads are writing to the same bit of memory. Here you have two objects – one real, one not – existing in the same bit of memory.

How do you track down cuckoo pointers? Well, the first step is to realise that they exist! Then you can switch on the heap’s “do not reuse free’d memory blocks” option and see if that changes the behaviour of the program at all.

Words

I like discovering the story behind words. A few months, I realised that “mer” was greek for “parts” and so “polymer” and “monomer” just meant “many parts” and “one part”. Words which come from other languages, and any “native” words, must themselves have ultimately just been made up by someone at some point. The problem is that the birth of most words is not recorded. You might be able to guess why the word was chosen (like, “anteater”) but I think it’s unusual to be able to pinpoint exactly when a word first entered the language.

I’ve been reading a recent biography of Michael Faraday, who did a lot of important early work in electromagentism, and from this book I read the following historic nugget. By 1831, a lot had been discovered about the nature of electromagnetism, but the language used to describe the phenomena hadn’t caught up. People were often using analogies to water – they talked of “electrical fluid – but this analogy could be confusing. Faraday started a correspondance with Revd William Whewell at Cambridge Uni, which I’ll paraphrase:

FARADAY: “I think we need some names for the terminals of a battery. I’ve came up with: exode, zetode, zetexode but I also think westode and eastode are pretty catchy too”

WHEWELL: “Hmm, that’s a bit of a mouthful. How about anode and cathode? Pretty solid greek background for those words”

FARADAY: “Cheers Will, but the guys down at the pub weren’t very convinced by anode and cathode. They laughed at my poor use of greek”

[ FARADAY goes away and writes a paper using the terms DEXIODE and SKAIODE instead ]

WHEWELL: “Let me give you a quick crash course in greek, and you can tell your mates where to stick their criticisms”

FARADAY: “Y’know Will, I think you might just be right after all”

And that’s how the words “anode” and “cathode” first entered the language of electricity. That’s why we have “cathode ray tubes”. Not content with that, Faraday and Whewell went on to add the words “ion”, “dielectric” and also “diamagnetic” and “paramagetic” to the language, all terms which are still used today when describing electricity and magnetism.

The Future

I always liked Technetcast and now I’m listenting to it’s spiritual descendent, IT Conversations. It’s a collection of presentations and lectures from various conferences around the world, all in mp3 format. So I copy them onto my mp3 player, and I get to listen to Steve Wozniak telling his life story while I’m at the gym, or Stephen Wolfram talking about physics as I cycle somewhere.

This is the kind of thing that The Future always promised when I was younger. I can have video chats with my family over the internet. I can listen to audio lectures anywhere I want on a tiny portable device. I can check my email using GPRS halfway down a mountain biking trail. I can watch a degree level course on electromagnetism from MIT whenever I want, from the comfort of my own flat.

Okay, so we don’t have teleporters yet. And hard AI didn’t do so well. But, all things considered, the Future is doing pretty well.

Extreme Extreme

A while ago, I had the idea of always giving your customers a specially instrumented version of your product. The application would know what lines of code had been executed during your testing phase, and whenever the user ran a bit of code which hadn’t been tested pre-release, a dialog box would popup saying “Hey, you’re running code which we didn’t bother to test!”. How would you feel about doing that for your application?

Anyway, I stumbled across Guantanamo today which takes that one step further. It runs your unit tests, find any lines of code which haven’t been executed during the tests *and deletes them* from your source code. If they’re not being test, they’re not going allowed to exist. Heh, how cool is that?