One Renegade Cell

I’ve just finished One Renegade Cell. It is probably the best pop-sci book I have read. The book provides an overview of how our understanding of cancer has developed over the last few decades. It reminded me of James Watson’s “The Double Helix”, in that “science” is portrayed in a very honest manner – filled with dead-ends, misunderstandings, chance discoveries, persistence, hard work and dumb luck. I found it quite easy to read, since each new concept is introduced only when needed, and explained in context.

Also, the book contains some classic lines, such as “the colon provides an embarrassment of riches” and “our understanding of metastasis is still fragmentary”.

As a computer scientist, I couldn’t help thinking of the problem in terms of reverse-engineering binary programs. I’m evidently not the first to think in these terms. Reverse-engineering a well-written program is difficult enough, but understanding cancer is more akin to reverse-engineering some multi-threaded spaghetti code.

The book drills down from a high-level epidemiological view of cancer, right down to the level of bases and proteins. The book finishes off down at this low level, and this left me feeling that I had seen the static structure (DNA and enzymes) and dynamic structure (protein synthesis and chemical message pathways) of human cells. Well, not just those of some anonymous ideal “human”, but my cells too. But now I’m intrigued as to what makes “me” different from a bundle of cellular clockwork, dumbly following the laws of chemistry/physics. There’s no need for consciousness in the clockwork world of cell biology.

I often look at flies flying around in their weird “go straight, then change direction after a random time” manner and wonder if they are essentially just chemical finite-state machines. And perhaps someone might look at me and imagine that I too am just a chemical finite-state machine! I might not have an easy job convincing them otherwise. But even though there’s plenty of basic chemistry going on inside me, that’s not the whole story, because there is a “me” here looking out.

Ah, so I think I’ve talked myself into buying a second book from the “Science Masters” series – How Brains Think!

In other news, Susan and I got married earlier this month. Woo! 🙂

Automate the automation

I am pragmatically lazy. As such, I love setting up command aliases to suit whatever task I’m currently working on. Prime candidates are “cd /long/path/to/a/commonly/used/directory”. These usually get single-letter aliases, so I can flip between places in double-quick time. Emacs (or gnuclient) is never more than an “e” away. The old favourite “cd ..” gets shortened to “..”, and to complete the family I also have “...” and “....” to go up faster. I found myself using “find . -name” lots, so that is now “f.”. I can connect to commonly used remote hosts quickly courtesy of some one-letter aliases and ssh-agent.
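For concreteness, the sorts of definitions described above might look something like this in a .bashrc (the exact names and targets are my guesses for illustration, not my actual file):

```shell
# Directory-hopping family: go up one, two or three levels quickly.
alias ..='cd ..'
alias ...='cd ../..'
alias ....='cd ../../..'

# Single-letter shortcuts (illustrative guesses).
alias e='gnuclient'                 # or emacsclient, depending on setup
alias f.='find . -name'             # usage: f. '*.txt'
alias w='cd /long/path/to/a/commonly/used/directory'
```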

All of this is fine and good, but I find that there’s a certain inertia that I have to overcome before I finally relent and add a new alias to my .bashrc file. I type a long command and think, “hmm, I should really add an alias for that”. But I often don’t get round to it.

The solution is, of course, to automate it (meta-automation!). At the point when I think “I should add an alias”, I can now just run my newly created “aliasadd” function which grabs the command I just typed and asks me for a name. It then adds it to my shell startup files and also to the currently running shell. I can now add a new alias in the time it used to take me to think “Gee, maybe I should add an alias for that last command”.

(NB: my .bashrc invokes a separate .bash-aliases file (ie. by doing “. .bash-aliases”). This helps to organize my startup scripts. Therefore, the code below appends to a .bash-aliases file, whereas you might prefer it to target .bashrc.)
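Since the function itself isn’t shown above, here is a minimal sketch of what such an “aliasadd” could look like. This is my reconstruction, not the original code: it assumes bash with history enabled, and it lets the target file be overridden via a hypothetical BASH_ALIASES_FILE variable (defaulting to ~/.bash-aliases).

```shell
# aliasadd: turn the command you just typed into a permanent alias.
# Sketch only, assuming bash and a sourced ~/.bash-aliases file.
aliasadd () {
    local cmd name=$1
    local file=${BASH_ALIASES_FILE:-$HOME/.bash-aliases}
    # fc -ln -1 prints the most recent history entry without a number.
    cmd=$(fc -ln -1 | sed 's/^[[:space:]]*//')
    # Ask for a name unless one was given as an argument.
    if [ -z "$name" ]; then
        read -r -p "Alias name for \"$cmd\": " name
    fi
    [ -z "$name" ] && { echo "aliasadd: no name given" >&2; return 1; }
    # Persist it for future shells...
    printf "alias %s='%s'\n" "$name" "$cmd" >> "$file"
    # ...and define it in the currently running shell too.
    alias "$name"="$cmd"
}
```

(Caveat: quoting is naive here – a command containing single quotes would need escaping, e.g. via printf %q.)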

Scottish Programming Languages Seminar

I went to the Scottish Programming Languages Seminar the other week. I wasn’t very sure beforehand a) whether I’d be welcome, as a non-academic, or b) whether it’d all be way above my head. An email to one of the organizers revealed that I would be more than welcome to attend, and looking at the programme revealed that I could at least understand the titles of all the talks.

It turned out to be an enjoyable day. These get-together seminars are evidently the oil which keeps the academic world moving. They are sociable occasions for those who know other attendees, and lots of information and tidbits flow around between participants. I made a pretty good go of striking up conversations with strangers – made easier by the fact that a) they were all academics, and b) they were attending a programming languages seminar, which makes for easy conversation openers. Some people were surprised (but happy) that someone who works in industry would want to come along, and it was fun to listen to the talks and discussions from my (possibly) more pragmatic and less theory-driven point of view.

Richard Connor proposed trying to design a somewhat rigorous experiment to see whether there is a productivity difference between static and dynamic languages. While I approve of the broad idea, I think that in making the experiment practical (constraining everything except what you are measuring) you’d throw the baby out with the bathwater. When you use a dynamic language, it’s not because you have a masochistic enjoyment of finding statically-findable bugs by hand. It’s because you enjoy a much more flexible overall programming experience – a different toolset, and better support for “exploratory programming” as you learn about the problem domain. So, by merely contrasting language variants without following through on the implications of those language choices (ie. having everyone use Notepad for source editing, as opposed to Eclipse for the statically-typed case and Squeak for the dynamically-typed case) you end up measuring something which isn’t very interesting. But, having said that, Richard made the point several times that if you don’t simplify things down you can’t measure anything useful. Then again, perhaps you need to go meta and ask exactly what is being measured, and whether it is possible or worthwhile to measure it.

Anne Benoit’s talk was about improving the performance of parallel programs running on a distributed heterogeneous network. It would appear that people write these programs in two levels. The lower level consists of (conceptually sequential) black boxes of functionality, such as a blur filter in a video-processing application. The higher level is a declarative description of how these boxes should be combined – eg. in a pipeline, or as fan-out or fan-in patterns (almost like Design Patterns?). To deploy your application, you map your high-level design onto the physical machines which are available to you. You could have all your black boxes on one physical machine, communicating via shared memory. Or you could have each black box on a different physical machine, communicating over the network. A good mapping can be calculated programmatically, but it’s not necessarily a one-shot operation. If your application is running on a shared cluster, machine load and node-to-node communication latency might vary throughout the run. At some point, it may be beneficial to reconfigure your application. For example, if latency is rising, it may be worthwhile moving your application onto fewer CPUs, even though that would decrease the number of cycles available to you. So, by using frameworks like the Network Weather Service you can monitor your application and make these kinds of decisions.
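As a rough illustration of that two-level split (my own toy sketch, not the actual framework from the talk): the low level is ordinary sequential functions, and the high level is a combinator that declares how they compose.

```python
# Toy version of the two-level style: "black boxes" are plain functions,
# and the high level declares how to combine them.

def pipeline(*stages):
    """Feed the output of each black box into the next, in order."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

def fan_out(*stages):
    """Send the same input to several black boxes (conceptually in parallel)."""
    return lambda x: [stage(x) for stage in stages]

# Pretend video-processing boxes: numbers stand in for frames.
decode = lambda frame: frame + 1
blur = lambda frame: frame * 2
app = pipeline(decode, blur)
```

The point of the split is that nothing in `decode` or `blur` mentions deployment: the mapping of stages onto machines can be chosen (or re-chosen) separately from the declarative structure.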

Conor McBride (Mr Dependent Types) talked about abstracting out a pattern which he’d found himself writing several times in source code. Monads are a good starting point here. Monads are the distillation of a commonly occurring pattern whereby the results of computations are combined by some fixed policy. If you realise that some part of your program fits this mould, then you start getting lots of nice stuff for free. Think of them as design patterns with formal algebraic properties. If a mathematician wakes up one day and realises “hey, the integers (under addition) are an abelian group” then he can get out his list of “properties which all abelian groups have” and get all that for free. Anyhow, Conor’s pattern (an Idiom) isn’t quite a monad – it’s slightly weaker. Which, on one hand, means that there is less pretty structure to get for free. But on the other hand, it means that the entry requirements are easier to meet and so at least you get some pretty structure for free.
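Here’s a toy way to see the monad-vs-idiom distinction – my own illustration in Python, not Conor’s formulation – using None to stand for a computation that failed:

```python
# Monad vs idiom, illustrated with None as a "failed computation".

def bind(m, f):
    """Monadic bind: the *result* of one computation can decide what
    computation runs next (f receives the value and returns a new one)."""
    return None if m is None else f(m)

def ap(f, *args):
    """Idiom-style combination: several results are combined by a single
    fixed policy; no intermediate result can choose the next computation."""
    if any(a is None for a in args):
        return None
    return f(*args)
```

With `bind` you can write data-dependent chains; `ap` only applies a pure function across possibly-failed results. That’s exactly the sense in which an Idiom is weaker than a monad, and also why more things qualify as one.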

The formal side of this talk was a little bit beyond me. I have to concentrate when people discuss Monads formally. So when the talk moved on to contrast Monads, Idioms and Arrows (an even more general form) it was a level of abstraction beyond what I can cope with just now. Still, I wasn’t completely lost and it’s motivated me to learn instead of scaring me off.

I’ll skip the feature-oriented programming talk. It was interesting enough, but it wasn’t an area particularly close to my heart.

The final talk by Sven-Bodo Scholz was very good. The purpose of static type systems is to identify “good” and “bad” programs, for some definition of good/bad. Good programs can be typed, whereas bad programs fail to typecheck. That’s the ideal anyway. In practice, there are always a few good programs which fail to typecheck. There are also a few bad programs which pass the type checker (divide by zero, out-of-bounds array access) and require runtime checks. Innovations in type systems minimize the number of miscategorized programs, but type systems rapidly become undecidable if you push them too far. Sven-Bodo works on an array-based language, and for him the problem of catching out-of-bounds array accesses is much more pressing than for your average programmer – he deals with arrays all the time, of varying shape and length.

The proposed solution is an interesting hybrid of static and dynamic typechecks. If the compiler can infer enough information to construct a type like “int array with 3 rows and 4 columns” it will do so. If it can only manage “int array with unknown shape” it’ll make do with that. At the end of the day, the compiler will statically check everything it can, and leave the remaining checks until runtime.

So you have this interesting situation whereby a compiler doesn’t just say “yes, it typechecked” or “no, it didn’t typecheck”. Instead, the compiler acts more like a student who’s been told to do some proofs as homework. It says “well, I managed to prove all these properties in the first half perfectly, but I could only prove some vaguer, less concrete properties in the second half”. And then you can either go back and tweak your source code to provide more information, or you can go ahead and run your program, knowing that some properties have been checked statically (ie. they are true for all possible runs of the program) and some will be checked dynamically (ie. the program will terminate if a property is discovered to be false).
What’s interesting is the flexibility for static types to be more concrete or more vague, and the relationship between these (“int array of unknown shape” is a supertype of “int array with three rows and two columns”), and how to deal with partial information.
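Here’s a toy model of that hybrid idea – my own sketch, not how the actual compiler works – in which a shape is either a concrete tuple or unknown, and the checker proves what it can and defers the rest:

```python
# Toy hybrid static/dynamic checker: shapes are concrete tuples like
# (3, 4), or None for "int array with unknown shape".

def is_subtype(specific, general):
    """An array of unknown shape is a supertype of any concrete shape."""
    return general is None or specific == general

def check_access(shape, index):
    """Classify an array access at "compile time":
    - 'ok'            : provably in bounds on every run
    - 'error'         : provably out of bounds, reject the program
    - 'runtime-check' : shape unknown, defer the check until runtime
    """
    if shape is None:
        return "runtime-check"
    if all(0 <= i < n for i, n in zip(index, shape)):
        return "ok"
    return "error"
```

The three-way answer is the interesting bit: instead of a yes/no typecheck, each access ends up either proved, refuted, or handed over to a runtime check.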

Anyhow, I’m glad I went along. I’m interested in this stuff, but I have very few people to discuss it with (since they mostly live in academia). It was nice to know that I wasn’t totally out of my depth too.

That can’t be happening

I’m writing a neat program which uses the flickr web services API. At first, I tried the Python bindings but then I got frustrated with them. Python, whilst a pretty decent dynamically-typed language, feels like Smalltalk without any of the good tools. So, since I’m writing this program for my own use, I ditched Python and started using Squeak instead. Woo, Squeak is so much more fun!

However, I got stuck the other day when I tried using some of the flickr methods, like setTags(), which require you to use HTTP POST and provide a username/password to prove that you are allowed to change tags. I would send off the request, and flickr would send me back an XML response saying “user not logged in”. I checked and double-checked the request I was sending, but it still didn’t work. I found tcpmon, a lightweight tool which makes it easy to trace the requests. Everything looked okay, but I still got an error back. In the end, I felt like I was banging my head against a brick wall. I couldn’t debug beyond the API boundary to find out what flickr thought was happening. I decided to give up for the day, and saved the Squeak image to preserve the state.

Today, I fired up the image, reran the same code I tried yesterday. It worked first time. The setTags call succeeded.

Hang on. This is exactly the same world-state which I left yesterday, courtesy of Smalltalk’s use of images. Nothing has changed. But suddenly it works.

I’ve no idea why it didn’t work before, or why it suddenly started working. Maybe flickr changed something on their site? Regardless, it’s been an interesting lesson in how to debug problems when calling web services, and an example of the potential for frustration when an API doesn’t work like you’re expecting, and there’s not enough information to understand why.

Employ wiki grass extensions

After the longest interview process in the world, my new job looks something like this. I am very excited. 🙂

I’d glanced at the TiddlyWiki website a few months ago without noticing that it is a complete wiki within a single HTML page. A single-user wiki, admittedly. But an impressive HTML + CSS + Javascript tour de force. Download a copy to your local disk, and after saying “Yes, this javascript is allowed to save to disk”, you can edit away. I’m impressed and horrified in equal measure.

I have been doing some more map-work in the background, using the baroque, heavyweight but very featureful GRASS package. I should have something tangible up on the web soon. Furthermore, I am writing a GRASS tutorial so that other open-map folks can learn GRASS without enduring quite as much pain.

Firefox extensions rock, and after recently upgrading I’ve settled on the following must-have extensions: the Web Developer toolbar, Live HTTP Headers and Aardvark for general web development, plus User Agent Switcher, TargetAlert and a better download manager for general day-to-day stuff.