Author: Andrew

Hacking with HAppS

I’m always on the lookout for more fun ways of programming – I hate doing dull boilerplate work. So I’ve been trying out HAppS, a webserver framework for Haskell. It’s interestingly different from most web frameworks. I want to capture my early observations before I forget them. To give some context, my “toy app” I’m building is an “IMDB for Computer Scientists” type website which gives structured information about people (eg. Guy Steele), what papers they’ve published (eg. Growing a Language), what events they’ve participated in (eg. a talk about “Growing a Language” at OOPSLA, available via Google video).

So the biggest thing about HAppS it that it doesn’t have to use a database to persist state. It takes the Prevayler approach, which means all your data lives in RAM as normal in-memory data structures. When the data is about to be modified, a delta is first persisted to a fast log on disk (giving durability) and then the in-memory version is updated. Periodically, checkpoints are taken.

This is a huge win because it means you avoid the whole tedious ‘impedance mismatch’ involved with persisting to a database – O/R mappings, mismatches between DB types and language types. It means that you can create new datatypes easily, and get on with the real business of adding functionality rather than futzing with databases.

Of course, there are downsides to this approach which I will discuss next. But first, the big upside: I can develop features quicker. Life being what it is, I’m almost certainly making lots of mistakes whilst building my app. Without a DB layer to burn time on, I can find out that I’m (inevitably) building the Wrong Thing sooner and make a course correction.

Downside #1: you normally rely on a database to get fast indexed lookups. With HAppS, you do this by constructing an in-memory hash table to act as an index. Actually, that’d be quite boring to do by hand so HApps uses metaprogramming (template haskell) to build the index data structures for you. They are moderately smart, with lazy update strategies.

Downside #2: scalability and availability. A big benefit of the usual “stateless web frontend, stateful database” split is that you can trivially scale the web fleet (no state) and fairly easily scale the database (replication/partitioning). With the HAppS approach it sounds like you can only have a single webserver/state-store machine – which means no horizontal scalability and bad availability. Actually, that’s not the whole story – the HAppS team are working on a system whereby the state gets shared across multiple machines (permitting scalable reads) and sharded (each machine is the master for some subset of the state).

That should be a killer downside, right? The argument is looks like this: “all successful webapps are popular, and popular apps must scale”. However, I think that to become a successful webapp, you have to firstly be a webapp. My “projects” directory is littered with half-finished projects where I got bored before I finished it. Quite frankly, if using HAppS means that I can avoid a lot of tedious work and actually finish a project then I’ll happily embrace the ghost of future scaling pain. Also, not every successful website gets as much traffic as Flickr/Twitter – my toy app is an “IMDB for Computer Scientists” and there just aren’t that many computer scientists in the world. Not every website needs to be super-scalable from day one. I’m not being naive here – my dayjob is scalability central – but it’s a question of finding the most appropriate hammer for the particular nail you’re trying to hit.

Having said all that, let me qualify it a bit: I would ideally like to create a perfectly scalable webapp from day one if there was no overhead in doing so. Most of the industry is focused on scalability-via-database, and attempt to minimize tedious work in various ways (eg. metaprogramming and reflection in ActiveRecord). However, these invariably ends up being painfully ‘leaky’ abstractions. I’ve seen this many times, and it’s given me the motivation to look for other approaches which can avoid the persistence pain. So perhaps HAppS is the answer, especially if my question is “can I write a reasonable webapp quickly” rather than “can I write a superscalable website slowly?”.

Disadvantage #3: Data migrations. This is really a red herring. I don’t really believe that data migrations are an advantage of having your data in a database. If you are using an O/R mapping, you surely want to migrate old objects to new object, not old relations to new relations. The approach which HAppS takes is to version your datatypes (mostly done via metaprogramming) and you write haskell code to migrate from old versions.

Disadvantage #4: Language dependence. If you use a database, you can access your data from any language. You can produce reports, do adhoc SQL queries and such like. These are indeed useful; I do this a lot. However, language-independence comes at a cost (impedance mismatch between database/language data types, business rules usually not applied at the database layer). Often, language independent access can be better provided by service layers which wrap the database layer, providing insulation from the schema changes and allowing fine grained access control and throttling. I’ve never been able to query Flickr’s database directly, but I’ve accessed their data via their API many times. HAppS provides more metaprogramming support for exposing data types via XML.

So that’s why I’m investing the time into learning about HAppS. It’s good to challenge your assumptions about how to do things. In the next post, I’ll write a bit about how HAppS works under the hood.

Tags happs, haskell

General

Duct Tape In Space

Post author By Andrew
Post date 12/29/2008

ObRandom: A rather detailed view of how duct tape was used during the Apollo moon landings.

General

Structure and Interpretation of Computer Programs

Post author By Andrew
Post date 12/14/2008
2 Comments on Structure and Interpretation of Computer Programs

Over the last year, I’ve got a lot more picky about how I spend my ‘free’ time – I think all new fathers encounter this! I used to read reddit and follow all the latest k3wl happenings in the programming world. Occasionally, I’d read something that was really insightful but mostly it was just here-today-gone-tomorrow fluff .. “my way is more awesomer that yrs” kinda stuff. So I decide to try and ignore the fluff by thinking “will this be interesting in a year from now?” before I start reading. And pretty soon afterwards, I decided to commit time to watching the videos of the SICP lecture series.

SICP (see amazon.com) was the basis of the introductory CS course at MIT for many years. That makes it sounds like it’ll be a noddy course for noobs. It really absolutely is not. It covers a massive range of material – stuff which I’ve been learning gradually since leaving uni – in a really pretty accessible way. I’m interested in programming languages in general, and I now wish I’d watched these years ago. Perhaps it wouldn’t have made so much sense back then, but it would’ve equiped me with a useful (albeit non-mainstream) way of looking at the world.

The course explores “programming languages” as a concept, not “programming as an activity”. It’s a lesson in how to make tools, not just how to use those tools. So, you won’t learn how to build a big object oriented application. But you will learn different ways to implement support for “objects” in a language – eg. different ways to do method dispatch. If you know about C++ or Java, this kind of thing is baked into the language. SICP will show you ways to to mix the cookie dough, rather than just describing the cookies. And it’ll also tell you how to make a mixing bowl out of cookie dough too!

The course uses the scheme language. The choice of language isn’t shockingly important – it’s just a vehicle for making the concepts discussed in the course concrete. The real concepts – abstraction, combination and naming – are the fundamental ideas and transcend languages. Scheme is just a convenient way of talking about these concepts. The scheme language (its pretty simple) is explained as you go along anyway.

The videos were made in 1986, so they provide an interesting snapshot in time. The vast majority of the content is still 100% relevant today. The lectures on logic programming and streams (lazy evaluation) are interesting because in 1986 it wasn’t so clear how these ideas would turn out – eg. Miranda is mentioned, but Haskell was still to come. For example, the lecturer motivates the introduction of mutable state by showing how threading state between pure functions breaks encapsulation – whereas today you could use monads to make this more manageable.

With the benefit of 20 years of hindsight, there’s a few things that come across as being klunky. There’s a lot of interesting discussion about the relationship between car/cdr/cons and how you could realize those concepts. However, when we get on to more implementation heavy stuff, car/cadr/etc are used purely as the way to pluck data out of lists. Coming from an ML/haskell background, I was crying out for pattern matching. Ironically, pattern matching is covered in lecture 4, but it’s not added into the standard vocabulary. Instead the lecturer keeps saying “to find the foo, we need to take the car of the cadr of the list”. It made me miss pattern matching a lot!

The video series is also a reflection of the MIT research interests of the time. If you’ve only ever used one language before, it’s difficult to understand how that language affects the way you express and understand things. In the early days, it wasn’t clear what the ‘best way to program computers’ was. It probably still isn’t clear today. But in the 70’s and 80’s MIT was busy churning out language after language to try to better solve the “difficult AI problems” of the day. People were trying lots of different ideas – procedural style, rule-based systems, logic systems, objects, lazy evaluation etc- and consequently there was a rich ecosystem of programming languages which acted as vehicles to try these ideas out. This is the culture in which scheme was developed, and the choice of topics covered in SICP reflects this period.

So what are the good bits of the lecture series? Higher order function are introduced early as a basic building block and thereafter taken for granted. I think that somewhat sets the tone as far as this ‘introductory’ course goes! The benefits of decoupling interfaces/contracts from concrete implementation is covered repeatedly via examples. And the lectures are peppered with “shocking” examples – eg. implementing car/cdr/cons just using lambda – which serve to illustrate the current topic but also tie together the deeper overall themes of the course.

I enjoyed the wide range of topics covered, and way that main themes continually recur throughout the lectures (abstraction and combination). I feel that this was a well-constructed course, carefully designed to lead students on a journey by showing them many facets of programming, whilst continually explaining how they interrelate. Additionally, I think that it all feels very genuine – it’s a reflection of what Sussman and Abelson were working on at the time. They often stress that the examples they use aren’t just toy examples. The metacircular evaluator isn’t there just for teaching purposes – it’s how these guys experimented with possible language features (say, call-by-name vs call-by-value).

The questions at the end of the lectures are often the best bit – they’re often “uhh, I didn’t understand this bit”, but the answers often shed new light on an area so I’m glad that the questions were asked!

Any downsides? The final lectures on interpretation and compilation gets somewhat bogged down in prolonged hand-cranked examples. In an introductory course, it’s good to illustrate how to make abstract language concepts realizable in real hardware. But even allowing for that, I think they overstretched somewhat.

I was sad when I finished watching the last video. These videos may be twenty years old, but I enjoyed watching them and learned a lot. Mr Sussman and Mr Abelson: thank you for sharing your knowledge and understanding so effectively! 🙂

General

Decentralized monitoring

Post author By Andrew
Post date 12/6/2008

Occasionally, one of the three computers in my flat will crash or hang in some way. When this happens, it’d be nice to be notified. For example, if the MythTV host crashes it’ll stop recording any TV programs until I notice and kick it. So I could just install one of the many network monitoring packages out there. But as far as I can see, they all want me to choose one ‘master’ host and have the other machines report their statistics into the master. That sounds awfully oldskool. Which of my three machines should I pick as ‘master’? They’re all equally likely to go down, and when the master goes down, I won’t get any notifications.

What I really want is a “neighbourhood watch” scheme, where each host watches the others. Or, in posher language, I want a decentralized monitoring solution. Here’s how it’d work. Each host would announce ‘I’m still running’ or ‘My load is 50%’ every few seconds, and the other hosts will listen and record the message to disk. If any one machine bursts into flames, the other two should still have a complete record (including the fact that the failed host stopped talking at 15:30).

Ideally, hosts should be able to add themselves to the party without much hassle. Announcing the messages via IP broadcast would fit the bill, but it’d only work on one LAN segment. Also, the hosts might end up being fairly busy if they have to handle status updates from all other hosts all the time.

Sounds like an ideal application of gossip protocols. When a node first joins the monitoring party, it needs to be told about at least one other participating node. The nodes can subsequently exchange occasional ‘gossip’ messages with a few of their neighbours, telling them about the state of the world. The gossip message can also contain information about other nearby participants. In this way, information gradually spreads out around the network. Network partitions are a minor pain, but so long as there’s at least one other host still around to notice any explosions, they’ll still be noted.

This seems to fit the bill nicely: decentralized, unlikely to lose data, and robust to hosts appearing and disappearing.

So that’s the monitoring and data recording side dealt with (ha, no lines of code written!). But what about the alerting? If one host goes down, I don’t really want N other hosts all to start spamming me with duplicate notifications. I guess neighbourhood watch scheme suffer this problem – everyone in the street hears a window smash and the police get twenty phone calls! But can a decentralized mob temporarily appoint one of their number as a leader in order to send out a single notification? Yes, that’s exactly what distributed ‘election’ algorithms are designed to do. If all the hosts can talk to each other, they’ll manage to agree on a single leader. If there’s been a network partition, at worst we’ll get one leader in each partition and so get a few notifications (ie. the right hand says ‘I can’t see the left hand!’ whilst the left hand says ‘I can’t see the right hand!’).

There’s maybe a bit of tension here. You could write a gossip protocol where a node only keeps track of a few neighbours at any given time. But, for an election, you’ll need to know about everyone in the network. And even in a gossip system where nodes do eventually learn about the whole network, there’s going to be a startup period where the new node does know about everyone. Hmm, but I guess the election process might be able to handle this if it can gather together all the partial knowledge about who’s in the election. Ah, more detailed reading required!

Surely this system already exists? Well, the closest I could find was GEMS which doesn’t appear to handle the single-notification problem. All very interesting stuff though!

Tags gossip, network

General

Why do they call it: Referentially transparent (II)

Post author By Andrew
Post date 11/12/2008
5 Comments on Why do they call it: Referentially transparent (II)

In the previous post, I explored the idea that some sentences change meaning if you switch between different names for the same object. Some other sentences don’t change meaning – these ones are referentially transparent, which in non-posh-speak means “you don’t notice if I switch to a different name”.

I’ve now asked some questions on haskell-cafe, and consulted various gurus at my work so I’m a little less confused that I was yesterday. 🙂

How does this concept apply to programming languages?

In haskell, you could say that the “names” were expressions like “2+2” and “sqrt(9)” and the underlying ‘thing’ which was being named are the values which those expressions evaluate to. So the integer “two” has many names – “1+1” and “sqrt(4)” and many others.

That gives us a nice connection back to the Quine definition of “referentially transparent”. Essentially, when we claim that Haskell is RT, its just a shorthand way of saying “you can replace an expression with any other expression that evaluates to the same result, and it won’t affect the overall meaning of the program”.

However, I feel that I’ve just handwaved over a few interesting points.

It’s reasonable to think of “sqrt(4)” as being a name for “2” in haskell because, well, functions always return the same value in haskell. But in Java or C, what kind of name could “rand()” be? The function evaluates to various values and it does this because it uses some kind of additional runtime context/state. So the basic idea that you can always have “names” and underlying “things” is a bit shaky – and we can’t really use Quine’s definition without them. What does it mean to say that C++ is “refentially opaque” if the very idea of names/references is shaky?

The same problem applies to the english-language examples from yesterday. I claimed that “Scotlands capital” was another name for edinburgh-the-city. That’s certainly true today. But it might not be true in 100 years. And perhaps, someone who lives in Glasgow might even disagree that its true today! So names which are co-referent today might not always be co-referent.

There’s a famous-ish example that’s similar too. In the sentence “The King of France is bald”, what kind of thing is being named by ‘The King of France’, given that France is a republic. That’s the kind of thing which keeps philosophers awake at night.

And if you’re into books, you’ll know that the author “Iain M Banks” writes sci-fi but “Iain Banks” writes mainstream fiction. There’s only one human being, but he writes under two “personas”. This makes me think that you have to be pretty careful about the THINGS that you think you are NAMING. In this example, “Iain M Banks” and “Iain Banks” are coreferent if you are discuss the human being, but they are not coreferent if you are discussing his “writing personas”.

There was one more point that I handwaved over. Why do I believe that Haskell really is referentially transparent? I might not trust Simon Peyton Jones when he claims that it is! But to prove some property of a language, you need a formal semantics for the language. And haskell98 doesn’t have that (unlike Standard ML). So there’s no much hope of doing a bottom-up inductive proof over the structure of the language – not just yet anyway. That’s a pity.

I’ll finish off with a couple of links about what RT means and RT and haskell.

Now I’m going to return to finish watching the SICP video which triggered this whole day-long distraction. 🙂

Tags etymology, haskell, theory