{"id":364,"date":"2010-03-28T23:51:54","date_gmt":"2010-03-28T22:51:54","guid":{"rendered":"http:\/\/www.nobugs.org\/blog\/?p=364"},"modified":"2010-03-28T23:51:54","modified_gmt":"2010-03-28T22:51:54","slug":"parsing-binaries-applicative-stylee","status":"publish","type":"post","link":"https:\/\/www.nobugs.org\/blog\/archives\/2010\/03\/28\/parsing-binaries-applicative-stylee\/","title":{"rendered":"Parsing binaries, applicative stylee"},"content":{"rendered":"<p>I&#8217;ve been writing a parser for a random file format.  As usual, I spent more time playing around with simple functions than anything else.  But along the way I learned more about Control.Applicative, and in particular how to write use it to write super-concise Data.Binary parsers.<\/p>\n<p>My file format starts with header containing some &#8220;magic&#8221; bytes, then an app version (1 byte) and a data version (3 bytes) separated by NULLs.  The parsed header will have type Header, as follows:<\/p>\n<pre lang=\"haskell\">\r\ndata Header = Header { dataVersion :: Word8, appVersion :: (Word8,Word8,Word8) } deriving Show\r\n\r\nmagic = (0x5::Word8, 0x6::Word8,0x7 :: Word8)\r\n<\/pre>\n<p>I&#8217;m using Data.Binary, and the parser is just this &#8230;<\/p>\n<pre lang=\"haskell\">\r\ngetHeader :: Get Header\r\ngetHeader = do \r\n  fail \"Magic word bad\" `unlessM` ((== magic) &lt;$> get )\r\n  Header &lt;$> (skip 1 *> get &lt;* skip 1) &lt;*> (get &lt;* skip 3)\r\n<\/pre>\n<p>So there&#8217;s a lot of cool stuff going on here!<\/p>\n<ol>\n<li>\nTo parse the &#8220;magic&#8221; bytes, I really wanted perl&#8217;s &#8220;die unless ok&#8221; idiom.  I couldn&#8217;t find anything in the standard library, so I wrote my own &#8220;unlessM&#8221;.  Used with backticks, we end up with code which looks like &#8220;die_action `unlessM` everything_is_fine&#8221;.   The implementation, pointlessly concise is just:<\/p>\n<pre lang=\"haskell\">\r\nunlessM :: Monad m => m () -> m Bool -> m ()\r\nunlessM m b = b >>= (`unless` m)\r\n<\/pre>\n<\/li>\n<li>\nThe &#8216;get&#8217; action on the first line yields a triple of Word8&#8217;s (thank you type inference).  The &#8220;(== magic)&#8221; section compares it with the expected value.  To apply the function to the result of the action, we could use liftM however Control.Applicative provides the nicer &lt;$> &#8211; reminiscent of normal function application using ($) but, here, in a monadic setting.  By the time we&#8217;ve plugged those two together, we get a Bool-yielding action .. exactly what unlessM wants.\n<\/li>\n<li>\nAfter parsing the &#8220;magic bytes&#8221;, we need to skip a byte, grab the app version byte, skip a byte, grab the three data-version bytes, and skip a final 3 bytes .. and then return a Header value.  All of this is done in a single line:<\/p>\n<pre>\r\nHeader &lt;$> (skip 1 *> get &lt;* skip 1) &lt;*> (get &lt;* skip 3)\r\n<\/pre>\n<p>Reading from left to right, this says:<\/p>\n<ul>\n<li>We&#8217;re going to return a value built using the Header constructor, a function which takes 2 arguments.<\/li>\n<li>Header&#8217;s first argument comes from a skip-get-skip action.  More on this next, but note that &lt;$> is just function-applied-to-result-of-computation which we&#8217;ve seen already in line 1.<\/li>\n<li>To grok the skip-get-skip action, first imagine that the *-operators mean simply &#8220;and then..&#8221;, ie. the three actions are sequenced.  In practise, they have another trick up their sleeve.  What happens to the values yielded by each of the three actions?  That&#8217;s where the arrows come in.  They point to the action whose value is &#8220;the chosen one&#8221;.  So, &#8220;a &lt;* b&#8221; is a composite action where &#8220;a&#8221; runs first, followed by &#8220;b&#8221;, with a&#8217;s value being yielded and b&#8217;s is ignored.  This works quite nicely in this example.  Why is this any different to monadic bind? Sure, each sub-parser is independent of it&#8217;s mates &#8211; the &#8216;get&#8217; parser has no interest in what the &#8216;skip&#8217; parser returned &#8211; but >> would do that.  The important difference is that the values yielded by the parser are only used once &#8211; right at the end.  In this example, we don&#8217;t vary the parsing based on anything we read &#8211; although, in general, we might well want to do that.  It&#8217;s horses for courses; if you want to combine actions but only need to hyperspace one value out, &lt;* and *> are your friends.  If you need to combine actions in more complicated ways, >>= is there waiting for you.\n<\/li>\n<li>\nSo far, we&#8217;ve glued together the Header function and (the action which produces) its first argument so far.  So we need to bake \/that\/ together with the action for the second argument.  That&#8217;s what the &lt;*> operator from Control.Applicative gives you.  Again, the intuition is that * means sequencing, but this time both the left and right arguments are used to determine the value yielded; specifically via function application.\n<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>But now I ask myself: is this readable code, or is it line noise?<\/p>\n<p>Certainly, you have to understand what the four applicative operators do .. and &#8216;applicative&#8217; sounds scary.  However, those operators capture two patterns which occur all the time in parsers.  Firstly, a parser yields some final result, based on the data it saw whilst running its component parsers in order (ie. &lt;$> and &lt;*>).  Secondly, each part of the returned value corresponds to some subset of the component parsers, although we don&#8217;t always use all of the bits we parsed (&lt;* and *>).  A parser for a real world file format is going to do these things over and over, so it makes sense to factor the patterns out and give it a (fairly) evocative syntax.<\/p>\n<p>Secondly, type inference is doing a lot of work here.  Mostly, I just need to call &#8216;get&#8217; and the compiler can infer whether I&#8217;m expecting to parse a byte or three bytes.  It&#8217;s all driven off the top level Header type itself.  I do worry that this introduces a very strong coupling between the data definitions and the behaviour of the parser.  Let&#8217;s say I want to elide the third field of the appVersion.  If I just change appVersion to be a 2-tuple instead of a 3-tuple, then the parser will still build and run &#8211; except it won&#8217;t parse the right file format anymore!  I need to remember myself to go in and add a &#8220;skip 1&#8221; into the parser.  Sure, type inference is a double edged sword and I can remedy this simply by adding explicit type signatures in appropriate places.<\/p>\n<p>Ho hum, now I need to stop procrastinating and actually parse the files &#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been writing a parser for a random file format. As usual, I spent more time playing around with simple functions than anything else. But along the way I learned more about Control.Applicative, and in particular how to write use it to write super-concise Data.Binary parsers. My file format starts with header containing some &#8220;magic&#8221; [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-364","post","type-post","status-publish","format-standard","hentry","category-general"],"_links":{"self":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/364","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/comments?post=364"}],"version-history":[{"count":20,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/364\/revisions"}],"predecessor-version":[{"id":384,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/364\/revisions\/384"}],"wp:attachment":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/media?parent=364"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/categories?post=364"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/tags?post=364"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}