Archive for the ‘Programming’ Category

Polymorphic function; a tale of two forM_’s

Tuesday, February 5th, 2008

Tonight I spent over an hour learning that there are two functions called “forM_” in haskell.

The first, which I was already familiar with, lives in Control.Monad. Its purpose is to take a list of values, apply some function to them all to produce a list of actions, and then run these actions one after the other.

The other one, which I did not know about, lives in Data.Foldable. It does exactly the same job, but it works on any “Foldable” data type; ie. those which can be collapsed/folded in some way to a single summary value. Lists are foldable. Maps are also foldable (across their elements, not their keys).

And this was the cause of my perplexification this evening; I had been storing my data in a list, and using forM_ (the monadic one, the only which I knew about!) to process it. At some point, I decided to change to storing my data in a map instead, and ran the compiler to see the errors which would point me to the bits of code I needed to fix up.

Except the compiler happily compiled everything without errors. *confusion*

Much headscratching followed, until eventually I added “-ddump-tc” to get ghc to dump out the results of its typechecking pass. This lead me to the root cause: I had been picking up forM_ from Data.Foldable all along, not Control.Monad. Duh! Everything had been working fine up until then, since Foldable.forM_ and Monad.forM_ operate identically on lists. But the former also works on Data.Maps whereas the latter does not.

I’m sure there’s a moral to this story. I’m just not sure what it is yet.

MethodFinder for Ruby

Tuesday, May 16th, 2006

I was preparing a talk about Smalltalk for the boys and girls at Amazon when it occured to me that it’s easy to implement Smalltalk’s MethodFinder in Ruby. So, without further ado, here’s a Ruby MethodFinder. Take the red pill, dynamic Neo!

Excerpt from the article:

I use lots of different programming languages, and they all seem to have different names for the same concepts. For example, string concatenation is “+” in ruby and java, “.” in perl, “,” in smalltalk and “^” in ocaml. After a while, this starts to drive you mad. I know that I want a method which takes “foo” and “bar” and returns “foobar” but I can’t remember which incantation I need to utter today.

Smalltalk has this neat thing called the MethodFinder. It lets you find methods by providing an example. If you’ve got the string “foo”, you can ask the MethodFinder to find all the method that, when called with argument “bar” return “foobar”. This is a very useful tool. No more scrabbling around with documentation to find the name of a method which you know exists. Stay in the red-pill world, and ask the code.

Now, ruby is basically smalltalk (without lots of the k3wl bits). So we can easily build a method finder in ruby too!

Read the full article here.

Callstacks of the future

Wednesday, April 12th, 2006

I read a blog entry recently (but can’t find the reference now, sorry) which changed my view about callstacks. Now, these are pretty fundamental things and I’ve been aware of them for nearly twenty years. So this was a pretty remarkable change.

In my head, a callstack was a record of history. When I drop into the debugger, I look at the callstack to see “where I’ve been”.

That’s wrong. The callstack isn’t a record of history. It’s a record of the future. It tells you where your computation is going to go next. The “return addresses” on the stack are where computation will continue from in the future.

Now, it’s easy to see why my misconception has continued for so long. There’s usually not much difference between the two views. Most of the time in a C-like language, you get called from some point in a function and then (later) you resume again from just after that point. So is it really worth worrying about such a tiny different?

Yes it is, because it gets your mind ready to understand concepts such as tail-call optimization and continuations much more easily. In the “future” view of callstacks, tailcall optimization becomes obvious. If we don’t need an activation record in the future, we don’t need it on the stack. Similarly, continuations make much more sense.

It’s a pretty small change in metaphor. I haven’t magically learned anything new because of it. But several bits of my knowledge now fit together in a much more satisfying way because of this switch. Whoever wrote that original blog article, thank you! :-)

UPDATE: Rar, finally found the original blog article. Dave Herman was the man with the wisdom. :-)

High Integrity Compilation

Wednesday, May 4th, 2005

The book High Integrity Compilation is keeping my brain engaged as I munch my lunchtime sandwiches. The basic plot concerns writing a bug-free compiler, and the cast consists of the semantics of the high-level language being compiled and the semantics of the target assembly language. Happily, it’s really quite easy going. My previous encounters with formal semantics have generally left me feeling out of my depth. In contrast, this book deals with source/target languages which are just complex enough to show interesting properties but simple enough to pick up immediately.

I’ll try to give an example of why this book makes things easier. The “meaning” of a computer language is usually formally defined by a mapping from “programs” to some kind of mathematical object, like a set. This is useful because we can then leverage all of our mathematics knowledge to learn stuff about behaviour of the program. Now, most books on semantics start with a paragraph like “we’d like to use boring old primary-school sets as our mathematical objects, but they’re not powerful enough”. And before you know it, they’ve starting using domain theory and fixed-points as if they were as common as making a cup of tea. Around this point, I usually head over to slashdot and never return. However, the “High Integrity Compilation” book happily doesn’t disappear down this road. It stays safely up in the land of simple maths and easy state machines. The languages under consideration are simple imperative ones, and so there’s not the same focus on recursive functions that you get with ML-like languages and so you’re not immediately hit by a wall of complexity on page four of the book.

This book also dutifully walks through useful examples, working up from simple three-liners into multi-page walkthroughs. There’s a refreshing lack of the words “obviously”, “clearly” and “trivially” and instead plenty of concrete worked examples.

Of course, this means that the resulting compiler isn’t going to really change the world. The source and target languages are chosen for pedagogical reasons rather than real-world usefulness. But this means that, for someone like me learning on their own, you can quickly get a good grounding in semantics without having to deal with a whole load of picky little uninteresting details.

Crashes of one sort or another

Thursday, December 16th, 2004

In a world where technical subjects are often made needlessly complex through the use of jargon, it’s always refreshing to come across examples of clear thinking and good communication. This time, it’s the serious subject of the report into the causes of the Concorde crash. After the inital horror of the crash itself, there was clearly a need to understand and respond to the causes of the crash (those who ignore history are doomed to repeat it). Apart from being a remarkable insight into the workings of Concorde, what is clear from reading through the report is that it is based around evidence rather than conjecture. Whereas a media report can simply state “it is thought likely that a piece of metal caused a tyre explosion” and be sufficient for its purpose of informing the masses, a formal crash report must stand up to higher levels of rigour. If there was a piece of metal on the runway, there must be a plane out there missing a piece of metal. You need to find that plane and carefully examine it, If you think a piece of metal could cause Concorde’s tyre to explode, you need to get an identical tyre and try running it over an identical bit of metal under realistic conditions to see what can happen. Simply saying “this could have happened” is not enough - you need to demonstrate that the theory is consistent with known events, and you need to carefully put together a jigsaw puzzle, justifying each and every piece.

Probably the most important ingredient for a retrospective investigation like this is the evidence trail, leading backwards into time like the roots of a tree. The design and manufacture of Concorde is well documented, as are the maintenance schedules and details. The operation of the airport on that day is well documented, and even the chance video footage of the accident itself provides information which can be matched up with telemetry and blackbox recordings.

Reading the report reinforced the feeling I had when I read the Ariane 5 report and the Challenger report: We can learn from our mistakes only if we preemptively leave enough crumbs of information along the way.

We’re human, and we screw up pretty regularly. If we accept that we’re going to make mistakes, we can at least plan for this eventuality. By making good choices today, we can minimize the impact and cost of our future mistakes and ensure that we can learn from them.

I don’t design aeroplanes for a living, so it’s time to connect this, as ever, back to the world of software. All our applications contain bugs of one sort or another. Sometimes they crash, sometimes they give wrong answers, sometimes they just lock up and stop responding. They always have done and they always will do. If we accept that this is true, what can we usefully do in response?

Let’s look at crashes first. Crashes are bad primarily because they usually imply some degree of data-loss or state-loss. Perhaps you loose a few pages of the book you were writing. Or perhaps you’d spent ages getting your “preferences” just right, and the app crashed without saving them. Or maybe the application crashes every time you try to print, and you really need to print!

Unlike with aeroplanes, it’s easy to restart the application after a crash. But it’s the data loss which is painful to users. So the First Law of Applications ought to be “never loose a users data, or through inaction allow a user to loose data”. It’s not acceptable to rely on the user to manually save their work every few minutes. It’s not acceptable to trust that the user will quickly do a “Save as..” after starting work on a new document instead of working away for hours in the default Untitled document. It should be the application’s responsibility to ensure that the user’s work is safe in the event of a crash. So, it should be regularly saving the users work, settings and undo/redo queue (and any other state) into a secret file somewhere. In the event of a crash, the application should take care of restarting itself and pick up where it left off.

Continuing with the aviation analogy, a plane doesn’t just have two possible states - “everything fine” and “crashed”. Planes are complex beasts, with engines, hydraulic systems, electrical systems. It is almost certain that something will start to go wrong with some part of the plane eventually. So, the designers also build in fault-detection systems and maintenance schedules. There are systems for detecting unusually high temperatures, excessively low tyre pressures, problems with the electrical system and so on. Furthermore, there are system for checking that those systems are working correctly (if your smoke alarm isn’t working, you don’t have a smoke alarm).

Most applications also contain fault-detection systems. An ASSERT() statement is a little self-check inside a program. If states some property which should always be true (eg. a person’s weight must be a positive value). If the assertion fails, something is badly wrong. A good application will have assertions sprinkled liberally all over the codebase - a little army of sanity-checks, watching out for something going wrong.

However, for some crazy reason, most people only enable these self-checks during inhouse development. When they ship the application, they disable these checks. This might be reasonable behaviour if the software industry had a track record of detecting every single bug before shipping, but that is obvious not the case. Even the best tested application in the whole world will throw up bugs once it’s out in the field, perhaps running on an old Win98 machine, set to Icelandic language, with a slightly unusual mix of recent and out-of-date system DLLs, and a copy of some overactive anti-virus software.

If there is a bug, and you have assertions enabled, you are quite likely to detect the problem early and be able to gather useful information about the application’s current state. With assertions disabled, the same bug will probably allow the application to stumble on for a few more seconds, gradually getting more and more out of control until it crashes in a tangled heap, far away from the original problem area.

Aeroplane designers don’t take out their fault-detection systems once the plane enters service. Neither should application developers. The performance hit is negligable in the face of the increased long-term reliability gains.

Most warning systems in an aeroplane exist because some remedial action can be taken. There’s no “wing has fallen off” warning, because there really isn’t anything you can do to recover from that. What remedial action could a software system take, instead of just stopping when a fault is detected? Actually, if crashing doesn’t loose any user data, then crashing is not too serious. Restarting the app probably gets us back to a known good state. One idea, called microreboots (part of crash-only software) is to reduce the granularity of the “restart”. For example, if a crash occurrs in the GUI part of the app, it may be possible to restart only that section of the program, leaving the underlying data unaffected. Or if a crash occurs on printing, it would be acceptable to abort the printing operation and restart only that chunk of the app.

This is easier in some language than others. In C++, the unregulated access to memory means that the “printing” part of the application can easily scribble over the memory belonging to another subpart of the application. In strongly typed languages (whether dynamic or statically typed) this is not a problem Furthermore, a NULL pointer access in C++ (which is a very common cause of crash) is not recoverable. It does not get turned into an exception, like in Java, and cannot really be dealt with usefully by the application. Given these constraints, it seems to me that a C++ program can only use microboots by splitting the application across multiple processes and using “processes” as the reboot granularity. Furthermore, the application needs to deal gracefully with the fact that subparts may fail and not carry out a request.

Even if you don’t go all the way towards microreboots, you can still use the ideas to flush out bugs from your application. In the classic model/view design pattern, all of the state resides in the model. So, you should be able to pull down the whole view system and restart it whenever you choose. You could, quite possible, do this every minute on a timer. Any bugs caused by the view irresponsibly caching state may well be flushed out by this technique. If you can’t do this in your application, why not?

Let’s revisit the aviation analogy one more time. Recommendation R7 in the Ariane 5 crash report says “Provide more data to the telemetry upon failure of any component, so that recovering equipment will be less essential”. That’s a posh way of saying it’s easier to analyze the blackbox recordings than it is to figure out what happened by piecing together debris.

When an application crashes, the first things the developers want to know is “what were you doing when it crashed?”. If you can reliably remember what you were doing in the leadup to a crash, you’ve got a rare talent. Most people can give a general description of what they were trying to achieve at the time, but only rarely do people remember the exact sequence of events leading up to the crash.

So why rely on people’s memory? The solution is to have the application itself gather the information itself. When an app dies, it actually has plenty of time to collect together a crashdump (or minidump) which details exactly what the program was doing at the point it crashed - what code was running, what the contents of the memory was. It can also explain to the user what has happened, invite them to email the crash report back to the developers, restart the application and resume from where it left off. If a developer receives a crashdump they can usually see immediately what the application was doing at the time it crashed.

Furthermore, the application can have a “blackbox recorder” which keeps track of recent activity. It can record a high-level summary of activity, like “12:10 Opened file “foo.txt”, 12:11 Edit/Select All menu item chosen” etc. If the application subsequently crashes, it can add this summary of recent activity into the crash report. This more-or-less removes the need for users to explain what they were doing.

I like to think of this as proactive debugging. Normally, debugging occurs after-the-fact - something has gone wrong and you need to figure out why. If you adopt the “proactive debugging” mindset, then think “at some point, I’m going to have to track down a bug in this app, so what kind of scaffholding and information would make that task easier?”. And then you add that in ahead of time. As the project develops, and you learn more about what kinds of bugs are common, you can tune your monitoring systems to pick up these problems as early as possible.

I don’t think there’s much chance that developers will stop writing buggy code any time soon, so we may as well concentrate effort on building a better net to catch the bugs once they’re there.