Technical debt (or, mortgages in Haskell)

I recently got fed up trying to understand my mortgage using excel. After twenty minutes guddling with cells and individual values, I felt the need to create higher-level abstractions such as “mortgage” and “payment strategy”. I also wanted to create a list of possible repayment strategies and easily compare them to see how it affects the loan duration and total interest payed. This is possible in excel, but no fun.

So, fast-forward to the end of an evening’s hacking with Haskell. I now have hmortgage, a EDSL for expressing payment strategies and code which will expand out a mortgage into monthly steps, like this:

We are looking at loan of £1000.00 at 5.0% over 10y, which has required monthly payment of £10.58
Baseline:
        Total interest: £272.97 Total payments: £1272.97 Duration=10y 1m
Overpayment scenario "2 pm, 200 initial":
        Total interest: £132.09 Total payments: £1132.09 Duration=6y 3m
        Compared to baseline: interest=£-140.88, payments=£-140.88, duration=-3y 10m
For month 1, balance: £1000.00 -> £791.58       (interest: £4.16, payment: £212.58)
For month 2, balance: £791.58 -> £782.29        (interest: £3.29, payment: £12.58)
For month 3, balance: £782.29 -> £772.96        (interest: £3.25, payment: £12.58)
For month 4, balance: £772.96 -> £763.60        (interest: £3.22, payment: £12.58)

ie. if you overpay by £2 each month, and pay an initial lump sum of £200, you’ll save about £140 overall and will repay the mortgage nearly 4 years early.

There’s a few points of haskelly interest in this code, mostly inspired by stuff I read a few years ago – behaviors in FRP, and SPJ’s “composing contracts” paper.

Combinators for payment strategies

I have a few primitive payment strategies, which can be combined into more complex strategies:

  • monthlyPaymentsOf (100 Pounds)
  • lumpSumOf (100 Pounds)
  • lumpSumOf (100 Pounds) `after` (1 Year)
  • monthlyPaymentsOf (100 Pound) +. (lumpSumOf (100 Pounds) `after` (1 Year))

Shallow embedding of DSL

The dsl is a shallow embedding; it represents the monthly payment plan as a function from month-number to the payment amount, ie. Integer -> Currency. There’s a problem with this approach – the only thing you can do with a function is apply it to some arguments. This is fine for finding the payment for a particular month, but I would also like to derive a textual description of the payment plan – which isn’t possible with functions.

From stuff I’ve read previously, I think my two options are:

  1. Lisp-like: Represent the payment schedule as data (ie. like an AST) and provide an eval function. This allows introspection into structure of the payment schedule. Code is data, data is code.
  2. Arrow-like: The payment strategy could be a tuple of the function and a textual description. When strategies are combined, the combinator would merge the textual descriptions as well as producing new combined functions. I’m not totally convinced that the english language is ‘compositional’ in this way though – it might end up with really clumsy phrasing.

Crazy Lennart-inspired postfix operators

Initially, the only way I had to create a ‘Currency’ value was via the ‘pounds’ function. In haskell, the function precedes the argument, hence it looks like “pounds 20”. The source code would read nicer if I could write this as “20 pounds” like we do in english. I didn’t think this was possible in Haskell.

Then I remembered seeing Lennart Augustsson’s crazy embedding of BASIC into Haskell. In particular, he had code which looked like this:

runBasic $ do
  10 PRINT "HELLO"
  20 END

How the heck does that parse? It’s using ‘do’ notation, so “20 END” must have a type in the Monad class. But, as I understood things, “x y” means “apply the (function) value x to value y”. And “20” doesn’t look much like a function to me.

Digging into the source, I found this:

-- 10 END
instance Num (END -> Expr a) where
    fromInteger i c = ...

Hmm, interesting. This is saying that (some) function type can be treated as if it is “number like” and provides a mechanism for converting integer literals in source code to that type. I hadn’t fully appreciated this, but the Haskell Report says that numeric literals aren’t quite as literal as I expected – the literal integer value gets passed through ‘fromInteger’ and can therefore be made into any Numeric type.

So this code really says “Hey ghc, if you come across a “42” in the source code, you can turn that into a function if you need to”. In the BASIC example, the next thing on line 20 is “END”, a constructor for the type also called END. So, ghc will be looking to turn “42” into something that can be used as a function taking an argument of type END, and so it’ll call this instance of fromInteger.

Hurrah, I can use the same ‘trick’ to make my currencies look nicer:

data MONEY = Pounds | Pence

instance Num (MONEY -> Currency) where
  fromInteger i Pounds = C (i * 100)
  fromInteger i Pence = C i

Now I can say “42 Pounds” or “23 Pence”. The “42” will become a function with type MONEY -> Currency. The “MONEY” type is really just a tag – used to choose the parse but that’s it. The Pounds/Pence tags force the appropriate overloading of fromInteger to be chosen, and this will construct a Currency value (represented as number of pence, and using a simple wrapper constructor called C).

Is this better, or just “clever”? I’m not sure yet. It’s certainly easier to read. But I feel I’ve taken a step away from “pure haskell” into a slightly weird world. Still, if I were writing in lisp, I wouldn’t think twice about doing this kind of thing.

The actual app

Shocker, I’ve produced an app which is actually useful to me in “teh real world”. I have a big TODO list of stuff which will fit nicely into the app – time-varying interest rates, inflation predictions and NPV calculations. None of which, of course, I will ever actually get around to adding. But it’s still useful in its present state, so a win!

Here’s what the “summary” view says – it omits the montly breakdown and instead reports the overall savings possible via the different payment strategies:

We are looking at loan of £1000.00 at 5.0% over 10y, which has required monthly payment of £10.58
Baseline:
        Total interest: £272.97 Total payments: £1272.97 Duration=10y 1m
Overpayment scenario "2 pm, 200 initial":
        Total interest: £132.09 Total payments: £1132.09 Duration=6y 3m
        Compared to baseline: interest=£-140.88, payments=£-140.88, duration=-3y 10m
Overpayment scenario "2 pm only":
        Total interest: £216.50 Total payments: £1216.50 Duration=8y 1m
        Compared to baseline: interest=£-56.47, payments=£-56.47, duration=-2y
Overpayment scenario "200 initial":
        Total interest: £163.52 Total payments: £1163.52 Duration=7y 8m
        Compared to baseline: interest=£-109.45, payments=£-109.45, duration=-2y 5m
Overpayment scenario "400 initial":
        Total interest: £87.73 Total payments: £1087.73 Duration=5y 6m
        Compared to baseline: interest=£-185.24, payments=£-185.24, duration=-4y 7m
Overpayment scenario "200 after 2y":
        Total interest: £191.42 Total payments: £1191.42 Duration=7y 10m
        Compared to baseline: interest=£-81.55, payments=£-81.55, duration=-2y 3m
Overpayment scenario "400 after 2y":
        Total interest: £137.90 Total payments: £1137.90 Duration=5y 10m
        Compared to baseline: interest=£-135.07, payments=£-135.07, duration=-4y 3m

Eep, it’s 01:30 .. how did that happen? Stoopid jetlag …

HAppS-state mistake

I’m grappling with HAppS-State at the moment, and thought it useful to capture some work-in-progress notes. My toy webapp allows you to view and edit information about people, places and things. The webapp state just consists of several identifier->entity maps.

HAppS-state requires that you write your state query/update functions as normal MonadState or MonadReader computations. But you also must process each of these functions using the mkMethods template haskell function. This generates some “behind the scenes” machinery to turn your vanilla state-updating monads into something which additionally maintains a write-ahead disk log to make the change durable. If your update function was called “modifyPersonName”, the call to mkMethod generates a datatype/constructor called ModifyPersonName which, when used like “update ModifyPersonName newName” has the richer durable behaviour.

I have lots of different entities, and they all have lots of different attributes. It quickly gets boring writing seperate “modifyEntityX” functions for each attribute. Haskell’s rather lousy record syntax doesn’t help out much either.

Fortunately, there’s a nice library called data-accessor which provides a more pleasant way to handle haskell record types. The idea is that you build up a getter/setter pair for each record member. These are first class values, and are consequently much more flexible than the builtin haskell record update syntax.

This seemed to be the answer to my problem – I can make a generic “modifyPersonAttribute” function which takes one of these accessors as an argument in order to select the field to update.

Unfortunately, this doesn’t work. I get a type error effectively stating that happs-state requires that all of the arguments to update/query function must themselves be Serializable.

This confused me. I can see that the application state type (and all of its constituent subparts) need to be serializable. But I was surprised that all the arguments for state-updating functions needed to be Serializable.

Then I realized what my false assumption was. I had assumed that happs was persisting the result of running the update operation to the logfile, similar to what mysql does for redo logs. In other words, I thought the logfile consisted of things like “the new value for row 42 is ‘foo'”.

However, a quick look at the contents of the _local directory (where happs stores its state) shows that this isn’t the case. Happs stores a description of the computation itself – ie. the name of the update operation and the (serialized) arguments it took.

This has got me somewhat stuck. Firstly, my generic ‘modifyPersonAttribute’ doesn’t work because the “accessor” values are not serializable. I’m now wondering if perhaps I can bypass data-accessors and instead write some template haskell to generate the happs machinery for all my entity types and all their attribute values.

But more importantly, this means that you need to be super-careful not to change the behavior of your state-modifying functions if there are any uncheckpointed changes in the logfile. Let’s say you have a createPerson function which takes a name and stores the name straight into the application state. But some days later, you decide that you want to make names have an initial capital letter before storing them. You change the code and restart the application – but unless you were careful to checkpoint the application state, the log will be replayed and you’ll end up with a different application state from before (some existing people will have the initial-caps logic applied to their name, not just new people).

Hacking with HAppS – what type, handler?

(These posts are prettified versions of my notes that I made whilst stumbling around HAppS for the first time. I hope they’ll be of use to future HAppS explorers!)

Note 1 .. In which we figure out what ServerPartT and WebT really are.

When we start HAppS running with ..

   simpleHTTP config list_of_handlers :: IO ()

… what type do the handlers have? Let’s explore some possible universes!

In a simple world, a handler could have type

    Request -> Maybe Response

This allows handlers to be selective about which Requests they handle, which is useful. But it also means that handlers must be pure functions so there’d be no way of maintaining any ‘application state’ between requests. Not very useful!

Let’s try to remedy that. If the handlers had type:

   Request -> State S Response

.. then we’d be able to retain and modify some application state (of type S) between handler invocations. But that’s all we’d be able to do. We still couldn’t write logs to disk, read file contents or talk to databases.

To allow handlers to do that, handlers would need to have type ..

  Request -> IO Response

(As an aside, note that whilst simpleHTTP has type “IO ()” it’s up to simpleHTTP whether or not it exposes the full power of the IO monad to the handlers. However, as we’ve just seen, not doing so would pretty much cripple the handlers)

So is this really what HAppS gives you? Let’s see. In the real world HAppS handlers have type “ServerPartT IO a” which is defined to be …

newtype ServerPartT m a = ServerPartT { unServerPartT :: Request -> WebT m a }

In other words, if you have a value of type “Request -> WebT m a”, you can tag it with the ServerPartT constructor to say “this ain’t just any function of that type, it’s part of a web server, dogammit!”.

But, apart from the tag, our handlers really just have type “Request -> WebT m a”. So what’s a WebT?

newtype WebT m a = WebT { unWebT :: m (Result a) }

So, ignoring the tag, it looks kinda like our handlers can produce any monadic computation that, when run, yields a “Result a” (the “Result a” type lets us say “I handled this request, here’s my answer” or “I didn’t handle this request”)

Hmm, there’s something not right there. If that was the case, it would be possible to use, for example, both “Result -> IO (Result a)” and “Result -> State S (Result a)” as handlers. But how would simpleHTTP ‘run’ such monadic computations? To run a State computation you use “evalState”. To run an IO computation, you have to return it from main. There’s no single way to run all possible monads.

Did I miss something? Yes, sort of – I just blindly assumed ‘m’ was a monad because of the way it was used. But there’s no “Monad m =>” constraints anywhere. And, in fact, if we go back to the start ..

simpleHTTP :: ToMessage a => Conf -> [ServerPartT IO a] -> IO ()

.. we can see that, right from the top, simpleHTTP only deals with ServerPartT’s and WebT’s where the ‘m’ type parameter is IO.

So, after all that, the handlers in HAppS handlers can be anything of type: Request -> IO (Response a). The ‘a’ type needs to be something that HApps can figure out a content-type and be able to make an HTTP message from (the ToMessage type class). The “IO (Response a)” part gets dressed up as a “WebT” (think: computation that produces the response). And the whole thing things gets dressed up as a “ServerPartT” (think: request handler).

What does each bit do? The IO part gives you free reign to do side-effecting computations, which get sequentially executed by the runtime. The Response part allows you to say ‘I handled this’ or ‘didn’t handle it’ and the ‘a’ part can be your XML or HTML data type (so long as its an instance of ToMessage).

So that’s it.

Man, that was a lot of complexity for such a simple end result. I don’t understand why there’s so much generality in the types of ServerT and WebT – I’d love to know!

Hacking with HAppS

I’m always on the lookout for more fun ways of programming – I hate doing dull boilerplate work. So I’ve been trying out HAppS, a webserver framework for Haskell. It’s interestingly different from most web frameworks. I want to capture my early observations before I forget them. To give some context, my “toy app” I’m building is an “IMDB for Computer Scientists” type website which gives structured information about people (eg. Guy Steele), what papers they’ve published (eg. Growing a Language), what events they’ve participated in (eg. a talk about “Growing a Language” at OOPSLA, available via Google video).

So the biggest thing about HAppS it that it doesn’t have to use a database to persist state. It takes the Prevayler approach, which means all your data lives in RAM as normal in-memory data structures. When the data is about to be modified, a delta is first persisted to a fast log on disk (giving durability) and then the in-memory version is updated. Periodically, checkpoints are taken.

This is a huge win because it means you avoid the whole tedious ‘impedance mismatch’ involved with persisting to a database – O/R mappings, mismatches between DB types and language types. It means that you can create new datatypes easily, and get on with the real business of adding functionality rather than futzing with databases.

Of course, there are downsides to this approach which I will discuss next. But first, the big upside: I can develop features quicker. Life being what it is, I’m almost certainly making lots of mistakes whilst building my app. Without a DB layer to burn time on, I can find out that I’m (inevitably) building the Wrong Thing sooner and make a course correction.

Downside #1: you normally rely on a database to get fast indexed lookups. With HAppS, you do this by constructing an in-memory hash table to act as an index. Actually, that’d be quite boring to do by hand so HApps uses metaprogramming (template haskell) to build the index data structures for you. They are moderately smart, with lazy update strategies.

Downside #2: scalability and availability. A big benefit of the usual “stateless web frontend, stateful database” split is that you can trivially scale the web fleet (no state) and fairly easily scale the database (replication/partitioning). With the HAppS approach it sounds like you can only have a single webserver/state-store machine – which means no horizontal scalability and bad availability. Actually, that’s not the whole story – the HAppS team are working on a system whereby the state gets shared across multiple machines (permitting scalable reads) and sharded (each machine is the master for some subset of the state).

That should be a killer downside, right? The argument is looks like this: “all successful webapps are popular, and popular apps must scale”. However, I think that to become a successful webapp, you have to firstly be a webapp. My “projects” directory is littered with half-finished projects where I got bored before I finished it. Quite frankly, if using HAppS means that I can avoid a lot of tedious work and actually finish a project then I’ll happily embrace the ghost of future scaling pain. Also, not every successful website gets as much traffic as Flickr/Twitter – my toy app is an “IMDB for Computer Scientists” and there just aren’t that many computer scientists in the world. Not every website needs to be super-scalable from day one. I’m not being naive here – my dayjob is scalability central – but it’s a question of finding the most appropriate hammer for the particular nail you’re trying to hit.

Having said all that, let me qualify it a bit: I would ideally like to create a perfectly scalable webapp from day one if there was no overhead in doing so. Most of the industry is focused on scalability-via-database, and attempt to minimize tedious work in various ways (eg. metaprogramming and reflection in ActiveRecord). However, these invariably ends up being painfully ‘leaky’ abstractions. I’ve seen this many times, and it’s given me the motivation to look for other approaches which can avoid the persistence pain. So perhaps HAppS is the answer, especially if my question is “can I write a reasonable webapp quickly” rather than “can I write a superscalable website slowly?”.

Disadvantage #3: Data migrations. This is really a red herring. I don’t really believe that data migrations are an advantage of having your data in a database. If you are using an O/R mapping, you surely want to migrate old objects to new object, not old relations to new relations. The approach which HAppS takes is to version your datatypes (mostly done via metaprogramming) and you write haskell code to migrate from old versions.

Disadvantage #4: Language dependence. If you use a database, you can access your data from any language. You can produce reports, do adhoc SQL queries and such like. These are indeed useful; I do this a lot. However, language-independence comes at a cost (impedance mismatch between database/language data types, business rules usually not applied at the database layer). Often, language independent access can be better provided by service layers which wrap the database layer, providing insulation from the schema changes and allowing fine grained access control and throttling. I’ve never been able to query Flickr’s database directly, but I’ve accessed their data via their API many times. HAppS provides more metaprogramming support for exposing data types via XML.

So that’s why I’m investing the time into learning about HAppS. It’s good to challenge your assumptions about how to do things. In the next post, I’ll write a bit about how HAppS works under the hood.

Why do they call it: Referentially transparent (II)

In the previous post, I explored the idea that some sentences change meaning if you switch between different names for the same object. Some other sentences don’t change meaning – these ones are referentially transparent, which in non-posh-speak means “you don’t notice if I switch to a different name”.

I’ve now asked some questions on haskell-cafe, and consulted various gurus at my work so I’m a little less confused that I was yesterday. 🙂

How does this concept apply to programming languages?

In haskell, you could say that the “names” were expressions like “2+2” and “sqrt(9)” and the underlying ‘thing’ which was being named are the values which those expressions evaluate to. So the integer “two” has many names – “1+1” and “sqrt(4)” and many others.

That gives us a nice connection back to the Quine definition of “referentially transparent”. Essentially, when we claim that Haskell is RT, its just a shorthand way of saying “you can replace an expression with any other expression that evaluates to the same result, and it won’t affect the overall meaning of the program”.

However, I feel that I’ve just handwaved over a few interesting points.

It’s reasonable to think of “sqrt(4)” as being a name for “2” in haskell because, well, functions always return the same value in haskell. But in Java or C, what kind of name could “rand()” be? The function evaluates to various values and it does this because it uses some kind of additional runtime context/state. So the basic idea that you can always have “names” and underlying “things” is a bit shaky – and we can’t really use Quine’s definition without them. What does it mean to say that C++ is “refentially opaque” if the very idea of names/references is shaky?

The same problem applies to the english-language examples from yesterday. I claimed that “Scotlands capital” was another name for edinburgh-the-city. That’s certainly true today. But it might not be true in 100 years. And perhaps, someone who lives in Glasgow might even disagree that its true today! So names which are co-referent today might not always be co-referent.

There’s a famous-ish example that’s similar too. In the sentence “The King of France is bald”, what kind of thing is being named by ‘The King of France’, given that France is a republic. That’s the kind of thing which keeps philosophers awake at night.

And if you’re into books, you’ll know that the author “Iain M Banks” writes sci-fi but “Iain Banks” writes mainstream fiction. There’s only one human being, but he writes under two “personas”. This makes me think that you have to be pretty careful about the THINGS that you think you are NAMING. In this example, “Iain M Banks” and “Iain Banks” are coreferent if you are discuss the human being, but they are not coreferent if you are discussing his “writing personas”.

There was one more point that I handwaved over. Why do I believe that Haskell really is referentially transparent? I might not trust Simon Peyton Jones when he claims that it is! But to prove some property of a language, you need a formal semantics for the language. And haskell98 doesn’t have that (unlike Standard ML). So there’s no much hope of doing a bottom-up inductive proof over the structure of the language – not just yet anyway. That’s a pity.

I’ll finish off with a couple of links about what RT means and RT and haskell.

Now I’m going to return to finish watching the SICP video which triggered this whole day-long distraction. 🙂