Categories
General

Hacking with HAppS

I’m always on the lookout for more fun ways of programming – I hate doing dull boilerplate work. So I’ve been trying out HAppS, a webserver framework for Haskell. It’s interestingly different from most web frameworks. I want to capture my early observations before I forget them. To give some context, my “toy app” I’m building is an “IMDB for Computer Scientists” type website which gives structured information about people (eg. Guy Steele), what papers they’ve published (eg. Growing a Language), what events they’ve participated in (eg. a talk about “Growing a Language” at OOPSLA, available via Google video).

So the biggest thing about HAppS it that it doesn’t have to use a database to persist state. It takes the Prevayler approach, which means all your data lives in RAM as normal in-memory data structures. When the data is about to be modified, a delta is first persisted to a fast log on disk (giving durability) and then the in-memory version is updated. Periodically, checkpoints are taken.

This is a huge win because it means you avoid the whole tedious ‘impedance mismatch’ involved with persisting to a database – O/R mappings, mismatches between DB types and language types. It means that you can create new datatypes easily, and get on with the real business of adding functionality rather than futzing with databases.

Of course, there are downsides to this approach which I will discuss next. But first, the big upside: I can develop features quicker. Life being what it is, I’m almost certainly making lots of mistakes whilst building my app. Without a DB layer to burn time on, I can find out that I’m (inevitably) building the Wrong Thing sooner and make a course correction.

Downside #1: you normally rely on a database to get fast indexed lookups. With HAppS, you do this by constructing an in-memory hash table to act as an index. Actually, that’d be quite boring to do by hand so HApps uses metaprogramming (template haskell) to build the index data structures for you. They are moderately smart, with lazy update strategies.

Downside #2: scalability and availability. A big benefit of the usual “stateless web frontend, stateful database” split is that you can trivially scale the web fleet (no state) and fairly easily scale the database (replication/partitioning). With the HAppS approach it sounds like you can only have a single webserver/state-store machine – which means no horizontal scalability and bad availability. Actually, that’s not the whole story – the HAppS team are working on a system whereby the state gets shared across multiple machines (permitting scalable reads) and sharded (each machine is the master for some subset of the state).

That should be a killer downside, right? The argument is looks like this: “all successful webapps are popular, and popular apps must scale”. However, I think that to become a successful webapp, you have to firstly be a webapp. My “projects” directory is littered with half-finished projects where I got bored before I finished it. Quite frankly, if using HAppS means that I can avoid a lot of tedious work and actually finish a project then I’ll happily embrace the ghost of future scaling pain. Also, not every successful website gets as much traffic as Flickr/Twitter – my toy app is an “IMDB for Computer Scientists” and there just aren’t that many computer scientists in the world. Not every website needs to be super-scalable from day one. I’m not being naive here – my dayjob is scalability central – but it’s a question of finding the most appropriate hammer for the particular nail you’re trying to hit.

Having said all that, let me qualify it a bit: I would ideally like to create a perfectly scalable webapp from day one if there was no overhead in doing so. Most of the industry is focused on scalability-via-database, and attempt to minimize tedious work in various ways (eg. metaprogramming and reflection in ActiveRecord). However, these invariably ends up being painfully ‘leaky’ abstractions. I’ve seen this many times, and it’s given me the motivation to look for other approaches which can avoid the persistence pain. So perhaps HAppS is the answer, especially if my question is “can I write a reasonable webapp quickly” rather than “can I write a superscalable website slowly?”.

Disadvantage #3: Data migrations. This is really a red herring. I don’t really believe that data migrations are an advantage of having your data in a database. If you are using an O/R mapping, you surely want to migrate old objects to new object, not old relations to new relations. The approach which HAppS takes is to version your datatypes (mostly done via metaprogramming) and you write haskell code to migrate from old versions.

Disadvantage #4: Language dependence. If you use a database, you can access your data from any language. You can produce reports, do adhoc SQL queries and such like. These are indeed useful; I do this a lot. However, language-independence comes at a cost (impedance mismatch between database/language data types, business rules usually not applied at the database layer). Often, language independent access can be better provided by service layers which wrap the database layer, providing insulation from the schema changes and allowing fine grained access control and throttling. I’ve never been able to query Flickr’s database directly, but I’ve accessed their data via their API many times. HAppS provides more metaprogramming support for exposing data types via XML.

So that’s why I’m investing the time into learning about HAppS. It’s good to challenge your assumptions about how to do things. In the next post, I’ll write a bit about how HAppS works under the hood.

3 replies on “Hacking with HAppS”

Thank you for sharing your observations. Looking forward to reading the next post!

‘A big benefit of the usual “stateless web frontend, stateful database” split is that you can easily trivially the web fleet (no state)’

trivially what the web fleet?

gwern: Oops, I meant “trivially scale the web fleet”. Buy more webservers, stick HAProxy on front of them to spread the requests between the boxes.

Comments are closed.