Edinburgh Software Companies

I finally made my oft-promised list of Edinburgh software companies. The consensus so far is “wow, Edinburgh is smaller than I thought”. Exhaustively enumerating the companies has rather reduced the scope for “gosh, maybe there’s a really cool company I’ve not heard about”. Still, I’m almost certainly missing a few companies at the moment … feedback welcome. Please tell people about this resource – it’ll help make the list exhaustive.

Predictably, soon there’ll be a (free) map of Edinburgh showing you where everyone is located. Unfortunately, the “where is each postcode” information is copyrighted by the Royal Mail. Sigh.

Maps Revisited

I finally figured out the shearing problem with my map of Edinburgh. The GPS receiver outputs longitude/latitude coordinates, and the viewing software was defaulting to a plain lon/lat projection to display them onscreen. However, if you buy a streetmap of Edinburgh it’ll probably use the “British Grid” projection, which is quite different. There are just so many ways to project the surface of a sphere/ellipsoid onto a 2d screen. Using the raw lon/lat coordinates as your x/y pixel positions isn’t very helpful. While lines of longitude (the ones which run N/S) are all the same length, lines of latitude get shorter as you head further north. So by the time you get to Edinburgh (roughly 56 degrees north), the circle of latitude is cos(56 degrees), or roughly half, the length of the equator (thank you Frink). Near Edinburgh, 1/1000th of a degree of latitude corresponds to about 111m north/south, but 1/1000th of a degree of longitude corresponds to a mere 62m east/west. Hence my first attempt at mapping was stretched horizontally.
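(For the curious, here’s a back-of-the-envelope Python sketch of that scale factor. It assumes a spherical earth, which is plenty good enough for a sanity check.)

```python
import math

def metres_per_thousandth_degree(latitude_deg):
    """Rough distance covered by 1/1000th of a degree at a given latitude,
    assuming a spherical earth of radius ~6371km (fine for a sanity check)."""
    earth_radius_m = 6371000.0
    per_degree_lat = math.pi * earth_radius_m / 180.0                       # ~111km per degree, everywhere
    per_degree_lon = per_degree_lat * math.cos(math.radians(latitude_deg))  # shrinks towards the poles
    return per_degree_lat / 1000.0, per_degree_lon / 1000.0

north_south, east_west = metres_per_thousandth_degree(56.0)       # Edinburgh, roughly
print("1/1000 deg latitude  ~ %.0fm north/south" % north_south)   # ~111m
print("1/1000 deg longitude ~ %.0fm east/west" % east_west)       # ~62m
```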

So, just switch the viewing projection to Transverse Mercator, which has the desired “up and across both use the same scale” property, and suddenly the map looks comfortingly like every other map of Edinburgh I’ve ever seen.

However, this necessitated a change of software. JUMP doesn’t support alternative viewing projections, and neither does QGIS. So for now I’m using an evaluation copy of GlobalMapper.
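Another option would have been to reproject the data itself rather than just the view. Here’s a minimal sketch using the pyproj library – my choice for illustration, not something GlobalMapper needs – converting WGS84 lon/lat into the British National Grid, which is itself a Transverse Mercator projection:

```python
from pyproj import Transformer

# WGS84 lon/lat (EPSG:4326) -> British National Grid (EPSG:27700),
# a Transverse Mercator projection: up and across use the same scale.
to_bng = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)

lon, lat = -3.1883, 55.9533            # roughly the centre of Edinburgh
easting, northing = to_bng.transform(lon, lat)
print("easting=%.0fm, northing=%.0fm" % (easting, northing))
```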

Now to add some depth. A few years ago, the space shuttle flew around a bit measuring the height of the earth’s surface at a resolution of about 100m. And the Americans, being much cooler than the UK government, make all the data freely available. So you can just download a large GeoTIFF file from their website for your neighbourhood (Scotland, in my case). A GeoTIFF file is just a 16-bit greyscale image with some metadata to tell you what part of the earth it corresponds to, its projection and stuff like that.
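Reading the heights back out is straightforward too. Here’s a rough sketch using the rasterio library (my choice of reader – any GeoTIFF library would do); the filename is made up, and the coordinates are only approximately Arthur’s Seat:

```python
import rasterio

# Hypothetical filename; a real SRTM tile for Scotland will be named differently.
with rasterio.open("srtm_scotland.tif") as dem:
    heights = dem.read(1)                      # the single 16-bit band of elevations, in metres
    print(dem.crs, dem.bounds, heights.shape)  # the metadata that makes it a *Geo*TIFF

    # Sample the elevation under (roughly) Arthur's Seat. Coordinates must be in the
    # file's own CRS, which for SRTM is plain lon/lat.
    lon, lat = -3.1620, 55.9441
    row, col = dem.index(lon, lat)
    print("height:", heights[row, col], "m")
```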

The results can be seen in the following images: Edinburgh, where Arthur’s Seat, Calton Hill and the Castle Rock can be easily seen; Scotland, showing all the nearby hills; and a profile of the slope as you head up Leith Walk (cyclists will recognise the nasty steep bit at the end).

The height data isn’t terribly precise. For a start, the shuttle will have measured the height of the tallest buildings rather than the ground level! However, in practical terms it’s accurate enough to feed into a “cyclists routefinder” application, whereby you could request “find me a route from A to B avoiding steep hills”. Very important use.
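In case it isn’t obvious how the routefinder would use the heights: the sketch below bolts a made-up “climbing penalty” onto plain Dijkstra, so that a flatter-but-longer route can beat the direct way up the hill. The road fragment and all the numbers are entirely invented.

```python
import heapq

def hill_avoiding_route(graph, start, goal, hill_penalty=10.0):
    """Dijkstra over a road graph where each edge is (neighbour, length_m, climb_m).
    Climbing is made artificially expensive, so flatter routes can beat shorter ones.
    The penalty factor is made up -- a real routefinder would want to tune it."""
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        for neighbour, length_m, climb_m in graph.get(node, []):
            new_cost = cost + length_m + hill_penalty * max(climb_m, 0.0)
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return None

# Entirely invented road fragment: the direct climb vs a longer but flatter detour.
roads = {
    "foot_of_the_walk": [("top_of_the_walk", 1500, 70), ("easter_road", 900, 5)],
    "easter_road":      [("top_of_the_walk", 1100, 5)],
    "top_of_the_walk":  [],
}
print(hill_avoiding_route(roads, "foot_of_the_walk", "top_of_the_walk"))
```

With these made-up numbers the detour wins (cost 2100 vs 2200 for the direct climb), which is exactly the behaviour a “avoid steep hills” option would want.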

I’m now working on producing semantic data about the roads. For example, “Princes St joins the Mound at coordinates x,y but cars cannot turn eastwards onto Princes St”. I’m also inquiring about the price of aerial photographs and satellite data, since that’d be a quick and easy way of making a map (although less fun than the GPS route).
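I haven’t settled on a representation yet, but something along these lines is what I have in mind – junctions, road segments, and turn restrictions stored as banned (junction, incoming, outgoing) triples. All the identifiers and coordinates below are invented:

```python
# Purely illustrative -- the ids, names and coordinates are all invented.
junctions = {
    "mound/princes_st": {"lon": -3.1953, "lat": 55.9517},
}

segments = {
    "the_mound":       {"name": "The Mound", "oneway": False},
    "princes_st_east": {"name": "Princes St (heading east)", "oneway": False},
}

# "cars cannot turn eastwards onto Princes St" becomes a banned
# (junction, incoming segment, outgoing segment) triple.
banned_turns = {
    ("mound/princes_st", "the_mound", "princes_st_east"),
}

def turn_allowed(junction, incoming, outgoing):
    return (junction, incoming, outgoing) not in banned_turns
```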

Oh, it turns out there’s going to be an article about free-as-in-speech map making in New Scientist in a couple of days’ time.

Apache + MySQL + PHP + not

I write this blog so that I can look back in years to come and see what I spent time thinking about. Computers have this nasty habit of sucking up time without you noticing. Recently I’ve been doing relatively low-brow stuff. But, for my own benefit, I want to record it so that I don’t redo this again from scratch next year.

I’ve recently been considering hosting my own website. Until now, I’ve paid a hosting company to do it for me, but a few things made me reconsider this. Firstly, I’ve been hosting my own email for a while now, and it has been more reliable than any commercial email provider. I can set up more sophisticated spam filtering this way, and diagnose faults without having to spend hours discussing it with a support department. The server runs on a low-power fanless mini-ITX board in my hall cupboard, and the only downtime so far has been due to me tripping off the power to the entire flat once or twice (even then it auto-reboots … I could plug it into a trickle-charged 12v motorbike battery and have a UPS).

So, hosting my own website would give me much more flexibility. I get hit by an awful lot of blog spam (to the extent that I’ve switched off comments for now). Hosting locally would give me direct access to the database which underpins my blog, which would make it easier to tidy things up. Also, I’d like to have direct access to the webserver logs, which is something my current provider doesn’t give me. I’ve got a reasonably fast internet connection to my home which is idle most of the day, so it seems a bit daft to pay data-transfer costs to a commercial web-hosting company when I’m already paying for unused data-transfer to my home. Finally, I already have a “server” in my cupboard and it could easily take the (light) load of running my website too.

I looked into running the webserver on a User-Mode Linux (UML) machine. It’s effectively like a Linux-specific VMware. There were two reasons for this. Firstly, running the webserver on its own machine increases security a bit. If someone used an exploit against the webserver and gained root, I certainly wouldn’t want them to then have access to my email or whatever other services are running on that machine. That’s why I have a separate server in the first place. UML makes it easy to have a separate machine for each service you wish to expose, effectively sandboxing them, without buying more hardware. Secondly, running as a UML instance makes backup really easy. UML is really easy to run. You have an executable called “linux-2.6.9” and a second file which is the image for the root disk. You run the executable, and you see a new copy of Linux booting within your existing one, mounting the root disk image and leaving you at a login prompt. It doesn’t require you to tweak your existing kernel at all – brilliant. So, to back up that virtual machine you tell it to briefly pause (or shut down), take a copy of the kernel file and root disk file, and you’re done. My root disk for a Debian 3.0 system running Apache, MySQL and PHP compressed down to about 90MB. I chose Debian because on a server, unlike on a developer machine where I choose Gentoo, I have no need for bleeding-edge software or libraries.
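The backup itself could be a few lines of script. Here’s roughly what I mean, as a Python sketch – it assumes the UML instance was started with a umid (here “webserver”, which is made up) and that uml_mconsole’s stop/go commands pause and resume it, which is my understanding of the management console:

```python
import shutil
import subprocess

UMID = "webserver"                  # hypothetical umid the UML instance was started with
FILES = ["linux-2.6.9", "root_fs"]  # the kernel binary and the root disk image

def backup_uml():
    # Pause the virtual machine so the root filesystem image is consistent...
    subprocess.run(["uml_mconsole", UMID, "stop"], check=True)
    try:
        for f in FILES:
            shutil.copy2(f, f + ".backup")
    finally:
        # ...and let it carry on, even if the copy failed.
        subprocess.run(["uml_mconsole", UMID, "go"], check=True)

if __name__ == "__main__":
    backup_uml()
```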

Setting up Apache was easy, even though it’s been years since I last did this. Since I already needed a MySQL database for my blog, I added mod_log_sql to put all the access logs into a MySQL database. This was really overkill. I could see the module being very useful if you had a complicated multiple-VirtualHosts setup, but I was just doing it because I could … and because I don’t really like Webalizer much. I like the idea of being able to phrase arbitrary queries and do some data-mining. Plus, it gave me a chance to refresh my SQL knowledge from University.
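For example, here’s roughly the kind of query I had in mind – a “top broken links” report that Webalizer doesn’t give you. The table and column names are my understanding of mod_log_sql’s defaults, so check your own config before trusting them, and the credentials are obviously made up:

```python
import MySQLdb  # the classic Python MySQL bindings; any SQL client would do

# Hypothetical credentials; table/column names assume mod_log_sql's default schema.
conn = MySQLdb.connect(host="localhost", user="logreader", passwd="secret", db="apachelogs")
cur = conn.cursor()
cur.execute("""
    SELECT request_uri, COUNT(*) AS hits
    FROM access_log
    WHERE status = 404
    GROUP BY request_uri
    ORDER BY hits DESC
    LIMIT 20
""")
for uri, hits in cur.fetchall():
    print(hits, uri)
```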

There’s something very cute about the way you back up MySQL databases. Most applications, such as word processors, persist their data by writing a snapshot of their current state to disk. A MySQL dump is instead a sequence of commands which, when played back, will rebuild the database. So the start of the dump file will be a “CREATE TABLE …” followed by a series of “INSERT INTO …” lines. This is quite elegant. Why invent an entirely new serialization format when you already have a language which is expressive enough to do everything you need?
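Driving the dump and the replay is a one-liner each; here’s a little sketch (the database name and credentials are made up):

```python
import subprocess

# Hypothetical database name and credentials.
DB, USER, PASSWORD = "wordpress", "backup", "secret"

def dump_database(outfile):
    """Write the database out as replayable SQL: CREATE TABLE ... then INSERT INTO ..."""
    with open(outfile, "w") as f:
        subprocess.run(
            ["mysqldump", "--user=" + USER, "--password=" + PASSWORD, DB],
            stdout=f, check=True,
        )

def restore_database(infile):
    """Playing the commands back rebuilds the database from scratch."""
    with open(infile) as f:
        subprocess.run(
            ["mysql", "--user=" + USER, "--password=" + PASSWORD, DB],
            stdin=f, check=True,
        )

dump_database("blog-backup.sql")
```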

Although I don’t deal with databases in my day-job, it’s quite an interesting field in some ways. It’s well accepted that separating data-storage from the rest of your application logic is a wise plan. But SQL-backed applications have a further advantage that, say, an XML-backed application doesn’t have. By making such a clean separation in your application, you can leave the whole data-storage problem to someone else. There are lots of really clever people who’ve figured out the best way to store and query big relational datasets – laying them out, and moving them between disk, main memory and cache in a pretty optimal way. As long as you can fit your data into the right shape, you can then magically take advantage of decades of cleverness. That’s a pretty impressive level of reuse.

On to the last part of the Linux/Apache/MySQL/PHP cluster: PHP. I spent some time looking through the source code for WordPress, my blog software. Blog software ought to be pretty simple. It’s just a glue layer which sucks data out of a database, munges it into HTML and sends it to a browser. But to my eyes, WordPress is pretty dire (and probably most PHP apps are too). The code is pretty state-happy, with lots of imperative updating which wouldn’t be needed in a language with better building blocks. It’s a domain where people who think Perl is a fine language (and I mean that in a derogatory way) would be happy. But would I want these people to be writing secure e-commerce sites in this way?! I don’t want to think about that (because I know it’s already happening). I wasn’t impressed.

So, despite the fact that today I’m writing about setting up webservers, this brings me back to Philip Wadler’s Links project. The aim of this project is to take the Good Stuff from the world of research, and apply it to make a Better Way to produce web applications. When I started working with XML, I thought “Great, we have schemas which define the structure of the data … that means we can integrate that with our language’s static type system”. Hah, no such luck in the Real World … but projects like CDuce are showing the way. Similarly, if you write a web application you need to juggle with the inside-out-ness of the programming model – you can’t just call a function to get some user input, because your code is being driven at the top level by the HTTP request/response model and you always need to return up to the top level. Continuations offer a possible solution to this, as a richer means of controlling the flow of a program, as Avi Bryant’s Seaside framework demonstrates. Today, if you are writing a web application you need to worry constantly about what happens if the user hits the “back” button, or reloads a page, or clicks “submit” twice when they’re making a payment. Perhaps in the future, with better building blocks, these things will come “for free”, and we can wave a fond farewell to a whole class of common web-app bugs.
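To make the continuation idea concrete, here’s a toy sketch using Python generators. It has nothing to do with the real Seaside or Links APIs – it only shows the shape of the programming model: the flow reads as straight-line code that just “asks the user”, and a tiny driver suspends and resumes it between requests.

```python
def checkout():
    # Each yield is "show this page and wait for the user's answer".
    item = yield "Which item would you like?"
    quantity = yield f"How many '{item}'?"
    confirmed = yield f"Buy {quantity} x {item}? (yes/no)"
    if confirmed == "yes":
        return f"Order placed: {quantity} x {item}"
    return "Order cancelled"

# The "web framework": each user response resumes the suspended flow exactly
# where it left off, so there's no hand-rolled session state machine.
flow = checkout()
page = next(flow)                        # first request starts the flow
for response in ["teapot", "2", "yes"]:  # pretend these arrive as form submissions
    print("PAGE:", page, "-> USER:", response)
    try:
        page = flow.send(response)
    except StopIteration as done:
        print("PAGE:", done.value)
        break
```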

Web-based applications have lots of advantages (and disadvantages too). I personally really like the “your customers are always using the latest version of the software” aspect. But a lot of today’s web technologies are rooted too much in a Perl-hacker mindset. It may be that this is indeed a rewarding place to apply newer programming technologies. I still think the world will not be ready for the Links project for many years to come, but perhaps it will pave the way.

Oh, back to the original story. Having installed everything and got it all working, I flipped my DNS record so that www.nobugs.org went to my home box. But the next morning, I flipped it back. Why? At the end of the day, paying someone about 30UKP a year to host my site is pretty good value. I don’t really want to be worrying about my website response time every time I’m downloading big files over the same link. And if my website ever gets cracked, I’d still rather it was on someone else’s LAN and not mine. Although it might seem like a waste of time to spend hours setting all this up and not use it, I know that I’ve learned lots of useful information and lessons. C’est la vie.

Making a map

As mentioned before, I am interested in producing a free/non-copyrighted map of Edinburgh. There are several reasons for this, but the main motivation is ideological. Information about the city I live in ought to be free. It’s *our* city. Information about our city ought to be a public asset. The Ordnance Survey keeps a very tight hold on its data, and charges lots of money for it, despite being a department of our own government. This situation is unlikely to change, unless some crazy geeks bypass the whole establishment and produce their own (totally non-derived) map data. That’s the ideology.

The second reason is more pragmatic. If I want to find where someone lives, I can look at multimap. However, if I’m trying to write a route-finder computer program then I need the data about roads/junctions in a form which my program can process. Multimap doesn’t help me with this. So, a secondary benefit of producing a map myself is that I can annotate it with metadata (like streetname, one-way status, steepness of hill) in a form that a computer can understand. Additionally, any other location-related computerized data sources (such as postcode regions, location of wifi hotspots, or pollution measurements) can be meshed together with the map data.

There are three methods that can be used to produce a map. The classical way is to perform laborious ground surveys. That’s soo yesterday! The more modern way is to use satellite imagery or aerial photographs as a starting point, and trace roads/buildings manually or with computer assistance. While high-res satellite imagery is available to the public, it’s expensive and so I discounted that option (for now). The third option, which I’m looking at just now, is to use a handheld GPS receiver and gather trails as I walk/cycle/motorbike around the city.

I wasn’t sure how well GPS would work in the city. There are a number of GPS satellites orbiting around the earth, each broadcasting a time signal. If your handset can see enough of these satellites, it can figure out its longitude and latitude to some degree of accuracy. In the open, accuracy is typically to within 10m, but in a city you often don’t have a good view of the sky and accuracy suffers. An accuracy of 10m doesn’t sound great, but consider that most roads are probably 10m wide so it’s not too bad.

So, I borrowed my brother’s eTrex GPS receiver and carried it around as I travelled round the city. The GPS handset shows you a graphical view of where you’ve been, and this was enough to confirm that GPS probably did work well enough in the city.

The next step was to get the data onto my PC for processing. GPSbabel took care of downloading the data from the handset into GPX, which appears to be the preferred interchange format for GPS data. I then converted this into the shapefile format, a vector-data format commonly accepted by GIS systems. GIS systems are usually hulking great beasts of software, designed to slurp in terabytes of satellite imagery, vector roadmaps, elevation data and the like, and allow you to query it efficiently. However, many GIS systems are obscure and have a steep learning curve. After looking through lots of options, I settled on the JUMP project as the most hopeful candidate. It happily imported my raw shapefile/GPS data, and I was able to generate a simple map layer from the data and annotate the roads with attributes like “streetname”.
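For what it’s worth, GPX is simple enough that you don’t even need GIS software to get at the trackpoints. Here’s a minimal Python sketch (the filename is made up; it matches on the local tag name so it doesn’t care whether the file uses the GPX 1.0 or 1.1 namespace):

```python
import xml.etree.ElementTree as ET

def read_trackpoints(gpx_file):
    """Pull (lat, lon) pairs out of a GPX file's track segments.
    Matching on the local tag name copes with both GPX 1.0 and 1.1 namespaces."""
    points = []
    for _, elem in ET.iterparse(gpx_file):
        if elem.tag.endswith("}trkpt") or elem.tag == "trkpt":
            points.append((float(elem.get("lat")), float(elem.get("lon"))))
    return points

trail = read_trackpoints("leith_walk.gpx")   # hypothetical filename
print(len(trail), "points, first:", trail[0] if trail else None)
```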

And so …. *drumroll* … here are the beginnings of what will hopefully turn into my free Edinburgh streetmap.

Now, there are still some issues to be resolved here. The map data has been sheared at some point on its journey into the JUMP system. If you are familiar with Edinburgh, you’ll know that the roads which join Princes St should all join at right-angles, which isn’t the case in the above map. I imagine that there’s some disagreement about coordinate systems somewhere. The cause will doubtless be blindingly obvious after I’ve figured out what is going wrong, but this is all part of the learning curve.

So, this represents a pretty successful spike solution. I’ve done a pretty minimal amount of work to establish that the GPS method works, and that the software exists to allow me to make a pretty reasonable map. Now, I might actually start gathering data a bit more seriously, and see about organising a bit of infrastructure to allow other similarly-minded people to contribute GPX trails of their own. I’ll also see about integrating SRTM elevation data (which was gathered on a space shuttle mission) to provide height data – although it’s only on a 100m grid, and the presence of tall buildings will cause problems.

Parsing and stuff

I’ve been busy doing lots of different stuff recently (going to my first ever wedding, recording my first ever CD). But in the computer world, I got a few emails last week from people asking about C++ parsing, so I updated my Parsing C++ page to bring it up to date. Whilst updating it this morning, I realised that I make a strong distinction between “real problems” which are interesting and worthwhile to solve, as opposed to other kinds of problems which are just annoying and time-consuming. C++ parsing falls in the latter category, which is surprising since I’d personally use a C++ parsing toolkit quite a lot. But it should be a non-problem. Languages should be designed so that they’re easy for humans and computers to parse. End of story. A language which fails this test is plain bad. Writing glue code between APIs is another annoying problem. I’d rather spend my time on Real problems. But what do I mean by “real” problems in programming? I’m not entirely sure myself. Problems which are not specific to a particular programming language, or to a particular hardware architecture, I guess. But at the same time, I’m interested in using computers as tools to achieve some purpose – to learn about electromagnetism, for example. I think I’ve just got to a point in my life where I’m looking back over what I’ve done in the past 10 or so years, and planning how to spend the next 10 years, and wanting to make sure I actively choose what to do rather than passively ending up doing something.

I enjoyed Kim’s summary of the Composable Memory Transactions paper. I like this kind of mini-summary of papers, since my list of “papers to read” grows faster than I can keep up with. To most people, “concurrency” means threads and mutexes. This is a horribly primitive way of approaching concurrency – notoriously difficult for us puny humans to get right, and little better than a house of cards. Higher-level models such as CSP are much better for building safe, stable programs. I don’t think I’d really explicitly understood the importance of this kind of composability before, but clearly it’s a vital ingredient in building larger and larger systems.
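A tiny illustration of the difference in style – nothing to do with the paper’s actual STM interface, just Python’s queue.Queue standing in for a CSP-style channel. The two threads never share mutable state; they only pass messages, so there are no locks to acquire in the wrong order.

```python
import threading
import queue

# A queue.Queue standing in for a CSP-style channel between two threads.
channel = queue.Queue()

def producer():
    for i in range(5):
        channel.put(i * i)
    channel.put(None)          # sentinel: no more work

def consumer():
    while True:
        item = channel.get()
        if item is None:
            break
        print("got", item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```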

I’ve also discovered the dirtsimple blog, which has provided lots of interesting reading.