Technical debt (or, mortgages in Haskell)

I recently got fed up trying to understand my mortgage using Excel. After twenty minutes guddling with cells and individual values, I felt the need to create higher-level abstractions such as “mortgage” and “payment strategy”. I also wanted to create a list of possible repayment strategies and easily compare them, to see how each one affects the loan duration and the total interest paid. This is possible in Excel, but no fun.

So, fast-forward to the end of an evening’s hacking with Haskell. I now have hmortgage, an EDSL for expressing payment strategies, plus code which expands a mortgage out into monthly steps, like this:

We are looking at loan of £1000.00 at 5.0% over 10y, which has required monthly payment of £10.58
Baseline:
        Total interest: £272.97 Total payments: £1272.97 Duration=10y 1m
Overpayment scenario "2 pm, 200 initial":
        Total interest: £132.09 Total payments: £1132.09 Duration=6y 3m
        Compared to baseline: interest=£-140.88, payments=£-140.88, duration=-3y 10m
For month 1, balance: £1000.00 -> £791.58       (interest: £4.16, payment: £212.58)
For month 2, balance: £791.58 -> £782.29        (interest: £3.29, payment: £12.58)
For month 3, balance: £782.29 -> £772.96        (interest: £3.25, payment: £12.58)
For month 4, balance: £772.96 -> £763.60        (interest: £3.22, payment: £12.58)

i.e. if you overpay by £2 each month and pay an initial lump sum of £200, you’ll save about £140 in interest and will repay the mortgage nearly 4 years early.
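The monthly breakdown above boils down to one step per month: add interest, subtract the payment. The sketch below is an assumption about how the numbers are computed, not the hmortgage source. It keeps money in whole pence, takes the rate in basis points (500 = 5.0%), charges one twelfth of the annual rate each month and rounds interest down to the penny. Under those assumptions it reproduces the four monthly rows shown above exactly. The names `Step`, `monthlyInterest`, `expand` and `totalInterest` are hypothetical.

```haskell
-- Sketch only: money is whole pence, the annual rate is in basis points.
data Step = Step
  { month    :: Integer  -- 1-based month number
  , opening  :: Integer  -- balance at the start of the month
  , interest :: Integer  -- interest added this month
  , payment  :: Integer  -- amount paid this month
  , closing  :: Integer  -- balance at the end of the month
  } deriving (Show, Eq)

-- Assumption: one twelfth of the annual rate, rounded down to the penny.
monthlyInterest :: Integer -> Integer -> Integer
monthlyInterest rateBp balance = balance * rateBp `div` (10000 * 12)

-- Expand a loan into monthly steps until the balance reaches zero.
-- The last payment is capped so the balance never goes negative.
expand :: Integer -> Integer -> (Integer -> Integer) -> [Step]
expand rateBp principal paymentFor = go 1 principal
  where
    go m b
      | b <= 0    = []
      | otherwise =
          let i  = monthlyInterest rateBp b
              p  = min (b + i) (paymentFor m)
              b' = b + i - p
          in Step m b i p b' : go (m + 1) b'

-- Summing the interest column gives the "Total interest" figure.
totalInterest :: [Step] -> Integer
totalInterest = sum . map interest
```

For the “2 pm, 200 initial” scenario, the payment in month m is the required £10.58 plus the £2 overpayment, plus £200 in month 1, so `take 4 (expand 500 100000 (\m -> 1258 + (if m == 1 then 20000 else 0)))` gives closing balances of 79158, 78229, 77296 and 76360 pence. If a strategy never pays more than the monthly interest the list is infinite, so only `take` from it in that case.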

There are a few points of haskelly interest in this code, mostly inspired by stuff I read a few years ago: behaviors in FRP, and SPJ’s “composing contracts” paper.

Combinators for payment strategies

I have a few primitive payment strategies, which can be combined into more complex strategies:

  • monthlyPaymentsOf (100 Pounds)
  • lumpSumOf (100 Pounds)
  • lumpSumOf (100 Pounds) `after` (1 Year)
  • monthlyPaymentsOf (100 Pounds) +. (lumpSumOf (100 Pounds) `after` (1 Year))

Shallow embedding of DSL

The DSL is a shallow embedding; it represents the monthly payment plan as a function from month number to payment amount, i.e. Integer -> Currency. There’s a problem with this approach: the only thing you can do with a function is apply it to some arguments. This is fine for finding the payment for a particular month, but I would also like to derive a textual description of the payment plan, which isn’t possible with functions.
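To make the shallow embedding concrete, here is a minimal sketch of how the combinators listed earlier could be written as plain functions of type `Integer -> Currency`. It is a guess at the shape, not the hmortgage source. The `Months` type, the `TIME`/`Year`/`Month` tags, the convention that a lump sum falls in month 1 and the rule that `after` shifts a strategy later by the given delay are all assumptions for illustration. It reuses the postfix-literal trick explained later in this post so that the expressions read as they do in the list.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Sketch only: amounts are whole pence, durations are whole months.
newtype Currency = C Integer deriving (Show, Eq)
newtype Months   = Months Integer deriving (Show, Eq)

-- Postfix-literal tags (the fromInteger trick from later in the post).
data MONEY = Pounds | Pence
data TIME  = Year | Month

-- Only fromInteger is defined; GHC warns about the other Num methods.
instance Num (MONEY -> Currency) where
  fromInteger i Pounds = C (i * 100)
  fromInteger i Pence  = C i

instance Num (TIME -> Months) where
  fromInteger i Year  = Months (i * 12)
  fromInteger i Month = Months i

-- Shallow embedding: a strategy is just "month number -> extra payment".
type Strategy = Integer -> Currency

-- The same payment every month.
monthlyPaymentsOf :: Currency -> Strategy
monthlyPaymentsOf c _ = c

-- A single payment in month 1 (assumed convention).
lumpSumOf :: Currency -> Strategy
lumpSumOf c m = if m == 1 then c else C 0

-- Shift a strategy later: month (d + 1) becomes its month 1.
after :: Strategy -> Months -> Strategy
after s (Months d) m = if m > d then s (m - d) else C 0

-- Pay both strategies' amounts each month.
infixl 6 +.
(+.) :: Strategy -> Strategy -> Strategy
(s +. t) m = C (pence (s m) + pence (t m))
  where pence (C p) = p
```

With these definitions, `monthlyPaymentsOf (2 Pounds) +. (lumpSumOf (100 Pounds) `after` (1 Year))` pays 200 pence in month 1 and 10200 pence in month 13. But all you can do with it is apply it to a month number, which is exactly the limitation described above.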

From stuff I’ve read previously, I think my two options are:

  1. Lisp-like: Represent the payment schedule as data (i.e. like an AST) and provide an eval function. This allows introspection into the structure of the payment schedule. Code is data, data is code.
  2. Arrow-like: The payment strategy could be a tuple of the function and a textual description. When strategies are combined, the combinator would merge the textual descriptions as well as producing new combined functions. I’m not totally convinced that the English language is ‘compositional’ in this way though; it might end up with really clumsy phrasing.
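Option 1 might look something like the following sketch. The constructor names mirror the combinators but are hypothetical, and amounts and delays are plain pence and months to keep it short. The same tree is interpreted twice: `eval` recovers the month-to-payment function, while `describe` walks the structure to produce text.

```haskell
-- Sketch only: amounts are pence, delays are months (both plain Integers).
data Strategy
  = MonthlyPaymentsOf Integer
  | LumpSumOf Integer
  | After Strategy Integer     -- delay a strategy by some months
  | Both Strategy Strategy     -- the (+.) combinator
  deriving (Show, Eq)

-- eval recovers the shallow embedding: month number -> extra payment.
eval :: Strategy -> Integer -> Integer
eval (MonthlyPaymentsOf p) _ = p
eval (LumpSumOf p) m         = if m == 1 then p else 0
eval (After s d) m           = if m > d then eval s (m - d) else 0
eval (Both s t) m            = eval s m + eval t m

-- describe walks the same tree to produce English text.
describe :: Strategy -> String
describe (MonthlyPaymentsOf p) = "pay " ++ money p ++ " every month"
describe (LumpSumOf p)         = "pay a lump sum of " ++ money p
describe (After s d)           = describe s ++ " after " ++ show d ++ " months"
describe (Both s t)            = describe s ++ ", and " ++ describe t

-- Format a non-negative number of pence as pounds, e.g. 200 -> "£2.00".
money :: Integer -> String
money p = "£" ++ show (p `div` 100) ++ "." ++ pad (p `mod` 100)
  where pad n = if n < 10 then '0' : show n else show n
```

Here `describe (Both (MonthlyPaymentsOf 200) (After (LumpSumOf 20000) 24))` yields "pay £2.00 every month, and pay a lump sum of £200.00 after 24 months", which is already a little mechanical. Unlike the tuple approach, though, `describe` can pattern-match on larger shapes (say `After (LumpSumOf p) d`) to choose better wording for common cases.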

Crazy Lennart-inspired postfix operators

Initially, the only way I had to create a ‘Currency’ value was via the ‘pounds’ function. In Haskell, the function precedes its argument, so it looks like “pounds 20”. The source code would read more nicely if I could write this as “20 pounds”, like we do in English. I didn’t think this was possible in Haskell.

Then I remembered seeing Lennart Augustsson’s crazy embedding of BASIC into Haskell. In particular, he had code which looked like this:

runBasic $ do
  10 PRINT "HELLO"
  20 END

How the heck does that work? It’s using ‘do’ notation, so “20 END” must be a value of some monadic type. But, as I understood things, “x y” means “apply the (function) value x to value y”. And “20” doesn’t look much like a function to me.

Digging into the source, I found this:

-- 10 END
instance Num (END -> Expr a) where
    fromInteger i c = ...

Hmm, interesting. This says that a particular function type can be treated as if it were “number-like”, and it provides a way to convert integer literals in the source code to that type. I hadn’t fully appreciated this, but the Haskell Report says that numeric literals aren’t quite as literal as I expected: an integer literal stands for ‘fromInteger’ applied to its value, so it can become a value of any type that has a Num instance.

So this code really says “Hey GHC, if you come across a ‘20’ in the source code, you can turn that into a function if you need to”. In the BASIC example, the next thing on line 20 is “END”, a constructor for the type also called END. So, during type checking, GHC needs “20” to be something that can be applied to an argument of type END, and so it picks this instance of fromInteger.

Hurrah, I can use the same ‘trick’ to make my currencies look nicer:

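{-# LANGUAGE FlexibleInstances #-}
newtype Currency = C Integer deriving Show  -- a number of pence
data MONEY = Pounds | Pence

-- Only fromInteger is defined; GHC warns about the other Num
-- methods, but the literal trick works.
instance Num (MONEY -> Currency) where
  fromInteger i Pounds = C (i * 100)
  fromInteger i Pence  = C i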

Now I can say “42 Pounds” or “23 Pence”. The “42” becomes a function of type MONEY -> Currency. The MONEY type is really just a tag: it selects this instance of fromInteger during type checking, and the Pounds/Pence constructor then picks the matching equation, which constructs a Currency value (represented as a number of pence, using a simple wrapper constructor called C).

Is this better, or just “clever”? I’m not sure yet. It’s certainly easier to read. But I feel I’ve taken a step away from “pure Haskell” into a slightly weird world. Still, if I were writing in Lisp, I wouldn’t think twice about doing this kind of thing.

The actual app

Shocker, I’ve produced an app which is actually useful to me in “teh real world”. I have a big TODO list of stuff which will fit nicely into the app – time-varying interest rates, inflation predictions and NPV calculations. None of which, of course, I will ever actually get around to adding. But it’s still useful in its present state, so a win!

Here’s what the “summary” view says. It omits the monthly breakdown and instead reports the overall savings possible via the different payment strategies:

We are looking at loan of £1000.00 at 5.0% over 10y, which has required monthly payment of £10.58
Baseline:
        Total interest: £272.97 Total payments: £1272.97 Duration=10y 1m
Overpayment scenario "2 pm, 200 initial":
        Total interest: £132.09 Total payments: £1132.09 Duration=6y 3m
        Compared to baseline: interest=£-140.88, payments=£-140.88, duration=-3y 10m
Overpayment scenario "2 pm only":
        Total interest: £216.50 Total payments: £1216.50 Duration=8y 1m
        Compared to baseline: interest=£-56.47, payments=£-56.47, duration=-2y
Overpayment scenario "200 initial":
        Total interest: £163.52 Total payments: £1163.52 Duration=7y 8m
        Compared to baseline: interest=£-109.45, payments=£-109.45, duration=-2y 5m
Overpayment scenario "400 initial":
        Total interest: £87.73 Total payments: £1087.73 Duration=5y 6m
        Compared to baseline: interest=£-185.24, payments=£-185.24, duration=-4y 7m
Overpayment scenario "200 after 2y":
        Total interest: £191.42 Total payments: £1191.42 Duration=7y 10m
        Compared to baseline: interest=£-81.55, payments=£-81.55, duration=-2y 3m
Overpayment scenario "400 after 2y":
        Total interest: £137.90 Total payments: £1137.90 Duration=5y 10m
        Compared to baseline: interest=£-135.07, payments=£-135.07, duration=-4y 3m

Eep, it’s 01:30 .. how did that happen? Stoopid jetlag …

HAppS-state mistake

I’m grappling with HAppS-State at the moment, and thought it useful to capture some work-in-progress notes. My toy webapp allows you to view and edit information about people, places and things. The webapp state just consists of several identifier->entity maps.

HAppS-State requires that you write your state query/update functions as normal MonadState or MonadReader computations. But you must also process each of these functions using the mkMethods Template Haskell function. This generates some “behind the scenes” machinery to turn your vanilla state-updating monads into something which additionally maintains a write-ahead disk log to make each change durable. If your update function was called “modifyPersonName”, the call to mkMethods generates a datatype/constructor called ModifyPersonName which, when used like “update (ModifyPersonName newName)”, has the richer durable behaviour.

I have lots of different entities, and they all have lots of different attributes. It quickly gets boring writing separate “modifyEntityX” functions for each attribute. Haskell’s rather lousy record syntax doesn’t help much either.

Fortunately, there’s a nice library called data-accessor which provides a more pleasant way to handle Haskell record types. The idea is that you build up a getter/setter pair for each record member. These are first-class values, and are consequently much more flexible than the built-in Haskell record update syntax.
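The getter/setter idea can be sketched without the library. The `Accessor` type below is a hand-rolled stand-in written for illustration, not data-accessor’s actual API, and `Person`, `nameA`, `ageA` and `modifyWith` are hypothetical. The point is that an accessor is an ordinary value, so a single generic function (here a pure version of the `modifyPersonAttribute` idea, with no HAppS machinery) can take it as an argument.

```haskell
-- Hand-rolled getter/setter pair (NOT the data-accessor library's API).
data Accessor r a = Accessor
  { getter :: r -> a
  , setter :: a -> r -> r
  }

-- Update the selected field with a function.
modifyWith :: Accessor r a -> (a -> a) -> r -> r
modifyWith acc f r = setter acc (f (getter acc r)) r

-- Hypothetical entity type.
data Person = Person { personName :: String, personAge :: Int }
  deriving (Show, Eq)

nameA :: Accessor Person String
nameA = Accessor personName (\n p -> p { personName = n })

ageA :: Accessor Person Int
ageA = Accessor personAge (\a p -> p { personAge = a })

-- One generic function replaces a modifyPersonX function per attribute.
modifyPersonAttribute :: Accessor Person a -> a -> Person -> Person
modifyPersonAttribute acc v = setter acc v
```

Note that an `Accessor` is just a record of two functions, and functions cannot be serialized.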

This seemed to be the answer to my problem – I can make a generic “modifyPersonAttribute” function which takes one of these accessors as an argument in order to select the field to update.

Unfortunately, this doesn’t work. I get a type error effectively stating that HAppS-State requires all of the arguments to an update/query function to be Serializable themselves.

This confused me. I can see that the application state type (and all of its constituent subparts) need to be serializable. But I was surprised that all the arguments for state-updating functions needed to be Serializable.

Then I realized what my false assumption was. I had assumed that HAppS was persisting the result of running the update operation to the logfile, similar to what MySQL does with its redo logs. In other words, I thought the logfile consisted of things like “the new value for row 42 is ‘foo’”.

However, a quick look at the contents of the _local directory (where HAppS stores its state) shows that this isn’t the case. HAppS stores a description of the computation itself, i.e. the name of the update operation and the (serialized) arguments it took.

This has got me somewhat stuck. Firstly, my generic ‘modifyPersonAttribute’ doesn’t work because the “accessor” values, being functions, are not serializable. I’m now wondering if perhaps I can bypass data-accessor and instead write some Template Haskell to generate the HAppS machinery for all my entity types and all their attributes.
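One possible way around the serialization problem is to replace the accessor argument with plain first-order data that describes the update, and interpret it inside the state function. This is a hedged sketch of that alternative, not something HAppS prescribes: `Person`, `PersonUpdate` and `applyUpdate` are hypothetical, and deriving `Show`/`Read` is only a stand-in for whatever serialization class HAppS-State actually requires.

```haskell
-- Hypothetical entity type.
data Person = Person { personName :: String, personAge :: Int }
  deriving (Show, Eq)

-- A first-order description of an update: plain data, so it can be
-- written to a log and read back (Show/Read stand in for serialization).
data PersonUpdate
  = SetName String
  | SetAge Int
  deriving (Show, Read, Eq)

-- Interpret the description inside the state-updating code.
applyUpdate :: PersonUpdate -> Person -> Person
applyUpdate (SetName n) p = p { personName = n }
applyUpdate (SetAge a)  p = p { personAge = a }
```

The cost is one constructor and one equation per attribute, which is exactly the kind of boilerplate Template Haskell could generate.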

But more importantly, this means that you need to be super-careful not to change the behavior of your state-modifying functions if there are any uncheckpointed changes in the logfile. Let’s say you have a createPerson function which takes a name and stores it straight into the application state. Some days later, you decide that names should be stored with an initial capital letter. You change the code and restart the application, but unless you were careful to checkpoint the application state first, the log will be replayed with the new code and you’ll end up with a different application state from before (people created since the last checkpoint will have the initial-caps logic applied to their names, not just new people).
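The replay hazard is easy to demonstrate without HAppS at all. In the toy model below (an illustration, not HAppS’s real log format), the log holds operation names and arguments, and the state is rebuilt by running whichever version of the code is current. `Event`, `createPersonV1`, `createPersonV2` and `replay` are hypothetical names.

```haskell
import Data.Char (toUpper)

-- What gets logged: the operation and its arguments, not the resulting state.
data Event = CreatePerson String
  deriving (Show, Read, Eq)

type State = [String]

-- Version 1 of the update: store the name as given.
createPersonV1 :: String -> State -> State
createPersonV1 n st = st ++ [n]

-- Version 2: capitalise the first letter before storing.
createPersonV2 :: String -> State -> State
createPersonV2 n st = st ++ [capitalise n]
  where
    capitalise (c:cs) = toUpper c : cs
    capitalise []     = []

-- Replaying a log re-runs the *current* code for every event.
replay :: (String -> State -> State) -> [Event] -> State
replay createPerson = foldl step []
  where step st (CreatePerson n) = createPerson n st
```

Replaying the same log `[CreatePerson "alice", CreatePerson "bob"]` gives `["alice","bob"]` with version 1 but `["Alice","Bob"]` with version 2. Checkpointing before deploying the new code avoids this, because replay then starts from a saved state rather than from the original events.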

Working on remote unix hosts

After much tweaking, I’ve found a good way of working with a set of geographically distributed unix hosts. This isn’t the most exciting topic in the world but, like your choice of keyboard, it affects you every minute you spend working at a computer. So it’s worth some attention.

Every day, I work on localhost and three other (distant) servers. I run gnome-terminal with four tabs, one per host, so I can switch between hosts with Alt-1 to Alt-4. Each tab has its own ‘profile’, which I use to set a slight background tint as a visual reminder of which host I’m working on, as well as to show the hostname in the tab title. Why gnome-terminal? Well, because it does Unicode right and has tabs; it’s dog slow at rendering though. I tried urxvt for a while, but went back to gnome-terminal.

In each tab, I run “ssh -tt HOST screen -DR” to login to the remote host and reconnect to my GNU screen session. This gives all of the win. Firstly, it makes it easy to start new shells on that host without the overhead of a new ssh login. When I say ‘overhead’ I mean both time to do ssh connection negotiation (these are distant hosts) and the niggling asymmetry of ControlMaster. Using screen inside gnome-terminal effectively gives me two dimensions of tabs (and two sets of keybindings) but it works well.

The second win of screen is that if my internet connection goes down or my localhost crashes, I don’t lose my state. Any long running jobs on the remote hosts are still running just fine when I log back in and reconnect to screen. I can also connect to the screen session from different places. For example, I can leave a long job running whilst I cycle home and check on it from home.

The last piece of the puzzle is emacs. I love emacs, and the recent multi-tty support is just awesome. However, it’s a bit of a pain having to keep my ‘actual’ emacs running on a real tty, when 99% of the time I ‘use’ it via emacsclient -t. So I recently started running emacs under detachtty, which allows you to run the main emacs process ‘headless’. (I also saw a patch to do the same thing directly in emacs.) So now I have a ‘headless’ emacs running on each host 24/7. Then, when I want to edit something, I use ‘emacsclient -t’ to temporarily connect my current terminal to it. And when I’m done, C-x C-c disconnects from emacs but doesn’t actually kill it. So my emacs now acts like a lightweight editor with zero startup time, but I get all the advantages of having a long-running emacs process. And I don’t have to worry about accidentally closing the window which the ‘real’ emacs is running in. Sweet.

It occurs to me that there’s a lot of duplication in the setup. Screen and detachtty and emacs have overlapping features in numerous ways. Emacs/screen can manage multiple shells, screen/detachtty do the ‘tty decoupling’ thing. But it’d take work to make emacs manage multiple shells as nicely as screen does. And screen-inside-gnometerminal is easier to manage than remote-screen-inside-local-screen. I think I’ve got to a pretty sweet spot with this setup.

Squawk (simple queues using awk)

If you are easily offended, look away now …

Reliable message queues (ActiveMQ in particular) are pretty handy things. They make it a lot easier to build reliable systems which are able to cope with network problems, hardware trouble and temporary weirdness. However, they always feel pretty heavyweight; suitable for “enterprise systems” but not quick shell scripts.

Well, let’s fix that. My aim is to publish and receive messages to an ActiveMQ broker from the unix shell with a minimum of overhead. I want to have a ‘consume’ script which reads messages from a queue and pipes them to a handler. If the handler script succeeds, the message is acknowledged and we win. If the handler script fails, the message is returned to the queue, and can be retried later (possibly by a different host).

STOMP is what makes this easy. It’s a ‘simple text-oriented message protocol’ which is supported directly by ActiveMQ. So we won’t need to mess around with weighty client libraries. A good start.

But we still need to write a ‘consume’ program which will speak STOMP and invoke the message handler script. There are existing STOMP bindings for perl and ruby, but I’m pitching for a pure unix solution.

In STOMP, each frame is terminated by a NUL byte, which made me wonder if it’d be possible to use awk, by setting its ‘record separator’ to NUL. The short answer is: yes, awk can do reliable messaging. Win!
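For anyone who hasn’t seen STOMP, a frame is a command line, some `header:value` lines, a blank line, a body and a terminating NUL, which is why a NUL record separator gives one frame per awk record. The Haskell sketch below is a stand-in for that awk logic, not the consume script itself: it builds a frame and splits a stream back into frames on NUL. It ignores header escaping and `content-length`, and it skips end-of-line characters between frames; those simplifications, and the names `frame`, `splitFrames`, `parseFrame` and `splitAtBlankLine`, are assumptions for illustration.

```haskell
-- Sketch only: STOMP framing, ignoring header escaping and content-length.

-- Build a frame: command line, header lines, blank line, body, NUL.
frame :: String -> [(String, String)] -> String -> String
frame command headers body =
  unlines (command : [k ++ ":" ++ v | (k, v) <- headers])
    ++ "\n" ++ body ++ "\0"

-- Split a stream into frames on NUL, as RS="\0" does in awk.
-- End-of-line characters between frames are skipped; an incomplete
-- trailing frame (no NUL yet) is ignored.
splitFrames :: String -> [String]
splitFrames s = case break (== '\0') (dropWhile (`elem` "\r\n") s) of
  (f, _ : rest) -> f : splitFrames rest
  (_, [])       -> []

-- Parse one frame (without its NUL) into command, headers and body.
parseFrame :: String -> (String, [(String, String)], String)
parseFrame f = (command, map header headerLines, body)
  where
    (top, body) = splitAtBlankLine f
    (command, headerLines) = case lines top of
      (c : hs) -> (c, hs)
      []       -> ("", [])
    header l = let (k, v) = break (== ':') l in (k, drop 1 v)

-- Split at the first empty line: headers before it, body after it.
splitAtBlankLine :: String -> (String, String)
splitAtBlankLine ('\n' : '\n' : rest) = ("", rest)
splitAtBlankLine (c : rest) = let (a, b) = splitAtBlankLine rest in (c : a, b)
splitAtBlankLine [] = ("", "")
```

For example, `frame "SEND" [("destination", "/queue/a")] "my message"` produces `"SEND\ndestination:/queue/a\n\nmy message\0"`, and `parseFrame` turns it back into the command, the single `destination` header and the body.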

We’ll need some network glue. Recent versions of GNU awk have built-in network support, but I’m going to use netcat because it’s more common than bleeding-edge awks.

I also want to keep ‘consume’ as a single file, but I don’t want to pull my hair out trying to escape everything properly. So, I’ll use a bash here document to write the awk script out to a temporary file before invoking awk. (Is there a nicer way to do this?)

There’s not much more to say, except here are the scripts: consume and produce.

To try it out, you’ll need to download ActiveMQ and start it up; just do ./bin/activemq and you’ll get a broker which has a stomp listener on port 61613.

To publish to a queue, run: echo 'my message' | ./produce localhost 61613 /queue/a

To consume, first write a message handler, such as:

#!/bin/bash
echo Handling a message at $(date).  Message follows:
cat
echo '(message ends)'
exit 0

and then, after saving it as myhandler and making it executable (chmod +x myhandler), run: ./consume localhost 61613 /queue/a ./myhandler

To simulate failure, change the handler to “exit 1”. The message will be returned to the queue. By default, the consumer will then immediately try again, so I added in a ‘sleep 1’ to slow things down a bit. ActiveMQ has many tweakable settings to control backoff, redelivery attempts and dead-letter queue behaviour.

I’m done.

If you want to learn more about awk, check out the awk book on my amazon.com bookshelf.

Y’know, come the apocalypse, the cockroaches’ programming language of choice is probably going to be awk.

Netscape; hindsight is foresight

I have been enjoying reading “Architects of the Web” (see it on my Amazon bookshelf), a collection of stories from the early days of Netscape, Yahoo and the like. Perhaps in an attempt to avoid the doom of repetition, I’ve been reading a lot of “software history” recently … Seattle Public Library has got plenty of cool books.

Chapter one follows the founding of Netscape, from the early days of NCSA Mosaic, through the fortuitous meeting of Marc Andreessen and Jim Clark, to the beginning of the browser wars. I remember this from the first time around, but I didn’t really understand all of what was going on.

The book progresses to follow the start of the browser wars, AOL beginning to bundle IE, Netscape launching the communicator suite …

And then the Netscape part of the book ends.

What? The end? But what about the browser wars? Microsoft getting sued by their own government? The AOL buyout? The Time Warner merger? Open sourcing of mozilla? The doldrums of tangled source code? And finally the rise of firefox?

As I flipped back to the opening “acknowledgements” page, I suddenly understood.

It was written in December 1996.

OMG. This book is a history of the web from the world of 1996. They had no idea what was coming next. Napster was nearly three years away. iTunes and the DRM wars would wait another few years beyond that. Skype, blogs, Flickr and web2.0 weren’t even on the radar yet.

But then again, what would happen if I wrote a ‘history of the web’ book today? Twelve years from now, someone might pick it up and say “Wow, these guys had no idea that X, Y and Z were just around the corner”.

I remember thinking, during the early days of Napster, “this is basically illegal and will get squished”. But it took me a while to understand that, although Napster itself was doomed, a genie had come out of the bottle and wasn’t ever going back in. Napster would end up dead, but so would the “old way of thinking”. It maybe took over a decade, but now stores are selling DRM-free digital music and making lots of money doing so. People voted with their feet, and it’s hard to stop a crowd.

So it occurs to me that in order to have a chance of seeing the new X, Y and Z before they creep over the horizon, you probably want to try letting go of some of your ‘immutable assumptions’ about the world, and seeing what would change if an assumption no longer held. Here are some which pop into my head: ‘you need to have a bank account to put your money into’, ‘computers are not disposable items’, ‘companies need to keep stuff secret from their competitors’. Coincidentally, I’m also reading a book about Einstein’s life (on my bookshelf), and he’s the poster child for the “what happens if we ignore this fundamental assumption” school of thought.

So I’m now wondering: which ‘truths’ will have their demise chronicled in the history books of the future?