Command-line buses

July 15th, 2010

Xmlstarlet, aka sed for xml, makes it easy to find out when the bus is going to arrive at a stop in Edinburgh without having to leave the command line:

curl -s 'http://www.mybustracker.co.uk/display.php?clientType=b&busStopCode=36253282' |
    xmlstarlet  sel -N x=http://www.w3.org/1999/xhtml -t -m //x:pre -c 'text()' -n

The output looks like this (bus number, destination, minutes until arrival):

10    WESTRN HARBOUR   14
10    WESTRN HARBOUR   43
11    ST ANDREW SQ     10
11    OCEAN TERMINAL   22
12    SEAFIELD         4
16    SILVERKNOWES     3

You can get the bus stop codes from the bus tracker website. Of course, you could just get the times from there too, but where’s the fun in that?

STM with time limit

April 14th, 2010

This is basically a rehash of a useful Simon Marlow post to haskell-cafe, but with a few tweaks. It’s a useful way of wrapping an STM action with a timeout. In my case, I want to wait until some mutable state satisfies a predicate, and I want a timeout to fire if it takes too long. I hate thinking in microseconds, so there’s a helper type to improve that. And where Simon used a nested Maybe, I created some explicitly named constructors.

import Control.Concurrent.STM
import Control.Concurrent
import Control.Monad

data TimeLimited a = Timeout | Result a deriving Show

data Timeout = TimeoutSecs Int
             | TimeoutMs Int

waitUntil :: TVar a -> (a -> Bool) -> Timeout -> IO (TimeLimited a)
waitUntil var pred timeout = do
  timer < - registerDelay $ case timeout of
                             TimeoutSecs n -> 1000000 * n
                             TimeoutMs n -> 1000 * n
  let
      check_timeout = do
        b < - readTVar timer
        if b then return Timeout else retry
      check_t = do
          m <- readTVar var
          when (not $ pred m) retry
          return $ Result m
  atomically $ check_timeout `orElse` check_t 

main = do
  tvar <- atomically $ newTVar 44
--  tvar <- atomically $ newTVar 41
  waitUntil tvar (>43) (TimeoutSecs 1) >>= print

Quickrun for incremental haskell/emacs development

April 6th, 2010

I often add quick test functions (like “testme”) inside the haskell module I’m developing. They’re short-lived and soon turn into tests. But I find this approach works better than putting the same lines into ghci because they ‘persist’ for longer.

Since I do this so often, and I use emacs, I automated it. With the following code, I now just hit C-c C-c and emacs loads my module into ghci and runs my test function. Stick a C-u in front, and it’ll let you choose a different function as the default to run.

(add-hook 'haskell-mode-hook 'my-haskell-hook)

(defun my-haskell-hook ()
  (local-set-key "\C-c\C-c" 'haskell-quickrun)
)

(defvar haskell-quickrun-command "main")

(defun haskell-quickrun (arg)
  (interactive "P")

  (unless (null arg)
	(setq haskell-quickrun-command (read-from-minibuffer "Function to run: ")))

  (inferior-haskell-load-file)
  (inferior-haskell-send-command (inferior-haskell-process) haskell-quickrun-command)
  (end-of-buffer-other-window 0)
)

Applicative arrows for XML &&& return to pure

April 2nd, 2010

Last time I had lots of success using Control.Applicative with Data.Binary parsers. Now I’m onto parsing XML. I’ve done this a few times in Haskell using HXT and its funky Arrows. However, this time I was keen to see if I could factor out Yet More Code in an applicative stylee … and indeed I can.

Let’s start with an example of a round-trip XML parse-then-print. We’ll need all the imports and the arrow language extension eventually, so let’s get them out of the way now.

{-# LANGUAGE Arrows #-}

import Control.Applicative
import Text.XML.HXT.Arrow
import Text.XML.HXT.XPath.Arrows
import Text.Printf

xml = "<menu>\
      \  <course name='starter'>Soup</course>\
      \  <course name='mains'>Pie</course>\
      \  <course name='dessert'>Cake</course>\
      \</menu>"

testRoundtrip = runX $ readString [(a_remove_whitespace,v_1)] xml >>> writeDocumentToString []

Now lets parse the xml properly. Firstly, we’ll have a datatype to represent the menu itself:

data Menu = Menu { starter :: String, mains :: String, dessert :: String } deriving (Show)

The parsing code itself starts off by converting the String into a Tree, skipping down over the root node, and then trying to parse out a menu:

testParse = runX $ readString [(a_remove_whitespace,v_1)] xml >>> getChildren >>> parseMenu

Since we want to do multiple things with the “menu” node, we’ll use the special arrow syntax to name things:

parseMenu :: ArrowXml a => a XmlTree Menu
parseMenu = proc menu -> do
              starter < - parseCourse "starter" -< menu
              mains   <- parseCourse "mains" -< menu
              dessert <- parseCourse "dessert" -< menu
              returnA -< Menu starter mains dessert

parseCourse name = getChildren >>> hasName "course" >>> hasAttrValue "name" (==name) >>> getChildren >>> getText

BTW, whilst the implementation of parseCourse is very combinatorically worthy, in practise I think I’d probably use the following version instead:

parseCourse name = getXPathTrees (printf "//course[@name='%s']/text()" name) >>> getText

Anyhow, looking back at the implementation of parseMenu, something bugs me. I had to give names to “starter”, “mains” and “dessert” despite only ever intending to use them to construct the final return value.

Sounds like a job for Control.Applicative!

However, sadly typing “:i Arrow” into ghci told me that my Arrow types weren’t Functors and therefore I couldn’t immediately perform the < $> and < *> trick to make a one-liner.

Boo!

But, a quick glance at Control.Applicative showed that the WrappedArrow type was a Functor. So, all I need to do is use the WrapArrow constructor to package up my arrows and, hey presto, applicative city here I come!

parseCourse name = WrapArrow $ getXPathTrees (printf "//course[@name='%s']/text()" name) >>> getText
parseMenu = unwrapArrow $ Menu < $>
             (parseCourse "starter") < *>
             (parseCourse "mains")   < *>
             (parseCourse "dessert")

Okay, so that’s all nice and concise. But it occurs to me that Control.Applicative is still all about *sequencing* computations. But why does parsing XML require sequencing? Surely it’s pure?

Indeed, this is where some of the flexibility of HXT kicks in. HXT supplies a variety of arrow types to fit your needs. If you want to be able to do some disk IO mid-parse, there’s an ArrowIO for you. If you want to maintain some user state during the parse, ArrowState is your friend. But you have the freedom to choose which one you want to use. If you are NOT performing IO and don’t need state, an ArrowList will do you nicely.

Furthermore, most the HXT combinators are uber-flexible. Their types are minimally constrained, allowing you to choose which arrow you want to work in (ie. an IO arrow, a State arrow or a pure List arrow). For example, the “getChildren” function works with any of those three. The HXT combinators which actually do IO (such as readDocument) understandably have a more specific type.

In the running Menu example, parseCourse is pretty flexible. It works with any arrow in the ArrowXml typeclass, which in turns only requires ArrowList behaviour (and a few others which aren’t relevant here).

Therefore, we can use parseCourse as a pure ArrowList, and do some ArrowList-plumbing to supply the three arguments to the Menu constructor. The end result will be a pure function between XmlTree’s and Menu’s. Furthermore, since we’re not using any effectful arrow combinators, we don’t need to nail down the sequencing.

The arr3 combinator converts a 3-arg function into an arrow whose input is a 3-tuple. Well, actually it’s a tuple that looks like (a,(b,c)). But that’s happily what the ‘plumbing’ combinators which we’ll use to plug in the three input arrows produce. So, without further ado:

parseMenuPure :: LA XmlTree Menu
parseMenuPure =  arr3 Menu <<< (parseCourse "starter") &&& (parseCourse "mains") &&& (parseCourse "dessert")

testParsePure = do
  [xml] < - runX $ readString [(a_remove_whitespace,v_1)] xml
  return $ runLA (getChildren >>> parseMenuPure) xml

So, I’ve come full circle. Arrow syntax looks like monad syntax, and therefore suggests an ordering (I always make this mistake). Control.Applicative provides a concise way to express some common ordering patterns. But not all arrow types have effects which need ordering. If your arrows are pure, you can plumb them directly.

Update: I later realised where my confusion came from. I still had an early version of “testParse” in my file which did the xml reading and the call to parseMenu in one pipeline. This caused type inference to decide that the type of parseMenu was IOSLA (IO + State + List) just like readString is. I somehow assumed this was due to the explicit ordering demanded by the applicative operators – wrong! Once I noticed and removed my original “testParse”, the type of parseMenu reverted back to its natural type of “ArrowXml a => a XmlTree Entry). And that’s the case whether I use the arr3 version, the applicative version or the arrow syntax version. Hmm, so this tells me that I need to play around with Control.Applicative for some non-Monad types … I have an overly specific notion of “sequencing” which is tricking me.

Parsing binaries, applicative stylee

March 28th, 2010

I’ve been writing a parser for a random file format. As usual, I spent more time playing around with simple functions than anything else. But along the way I learned more about Control.Applicative, and in particular how to write use it to write super-concise Data.Binary parsers.

My file format starts with header containing some “magic” bytes, then an app version (1 byte) and a data version (3 bytes) separated by NULLs. The parsed header will have type Header, as follows:

data Header = Header { dataVersion :: Word8, appVersion :: (Word8,Word8,Word8) } deriving Show

magic = (0x5::Word8, 0x6::Word8,0x7 :: Word8)

I’m using Data.Binary, and the parser is just this …

getHeader :: Get Header
getHeader = do
  fail "Magic word bad" `unlessM` ((== magic) <$> get )
  Header <$> (skip 1 *> get <* skip 1) <*> (get <* skip 3)

So there’s a lot of cool stuff going on here!

  1. To parse the “magic” bytes, I really wanted perl’s “die unless ok” idiom. I couldn’t find anything in the standard library, so I wrote my own “unlessM”. Used with backticks, we end up with code which looks like “die_action `unlessM` everything_is_fine”. The implementation, pointlessly concise is just:

    unlessM :: Monad m => m () -> m Bool -> m ()
    unlessM m b = b >>= (`unless` m)
    
  2. The ‘get’ action on the first line yields a triple of Word8’s (thank you type inference). The “(== magic)” section compares it with the expected value. To apply the function to the result of the action, we could use liftM however Control.Applicative provides the nicer <$> – reminiscent of normal function application using ($) but, here, in a monadic setting. By the time we’ve plugged those two together, we get a Bool-yielding action .. exactly what unlessM wants.
  3. After parsing the “magic bytes”, we need to skip a byte, grab the app version byte, skip a byte, grab the three data-version bytes, and skip a final 3 bytes .. and then return a Header value. All of this is done in a single line:

    Header <$> (skip 1 *> get <* skip 1) <*> (get <* skip 3)
    

    Reading from left to right, this says:

    • We’re going to return a value built using the Header constructor, a function which takes 2 arguments.
    • Header’s first argument comes from a skip-get-skip action. More on this next, but note that <$> is just function-applied-to-result-of-computation which we’ve seen already in line 1.
    • To grok the skip-get-skip action, first imagine that the *-operators mean simply “and then..”, ie. the three actions are sequenced. In practise, they have another trick up their sleeve. What happens to the values yielded by each of the three actions? That’s where the arrows come in. They point to the action whose value is “the chosen one”. So, “a <* b” is a composite action where “a” runs first, followed by “b”, with a’s value being yielded and b’s is ignored. This works quite nicely in this example. Why is this any different to monadic bind? Sure, each sub-parser is independent of it’s mates – the ‘get’ parser has no interest in what the ’skip’ parser returned – but >> would do that. The important difference is that the values yielded by the parser are only used once – right at the end. In this example, we don’t vary the parsing based on anything we read – although, in general, we might well want to do that. It’s horses for courses; if you want to combine actions but only need to hyperspace one value out, <* and *> are your friends. If you need to combine actions in more complicated ways, >>= is there waiting for you.
    • So far, we’ve glued together the Header function and (the action which produces) its first argument so far. So we need to bake /that/ together with the action for the second argument. That’s what the <*> operator from Control.Applicative gives you. Again, the intuition is that * means sequencing, but this time both the left and right arguments are used to determine the value yielded; specifically via function application.

But now I ask myself: is this readable code, or is it line noise?

Certainly, you have to understand what the four applicative operators do .. and ‘applicative’ sounds scary. However, those operators capture two patterns which occur all the time in parsers. Firstly, a parser yields some final result, based on the data it saw whilst running its component parsers in order (ie. <$> and <*>). Secondly, each part of the returned value corresponds to some subset of the component parsers, although we don’t always use all of the bits we parsed (<* and *>). A parser for a real world file format is going to do these things over and over, so it makes sense to factor the patterns out and give it a (fairly) evocative syntax.

Secondly, type inference is doing a lot of work here. Mostly, I just need to call ‘get’ and the compiler can infer whether I’m expecting to parse a byte or three bytes. It’s all driven off the top level Header type itself. I do worry that this introduces a very strong coupling between the data definitions and the behaviour of the parser. Let’s say I want to elide the third field of the appVersion. If I just change appVersion to be a 2-tuple instead of a 3-tuple, then the parser will still build and run – except it won’t parse the right file format anymore! I need to remember myself to go in and add a “skip 1″ into the parser. Sure, type inference is a double edged sword and I can remedy this simply by adding explicit type signatures in appropriate places.

Ho hum, now I need to stop procrastinating and actually parse the files …