{"id":57,"date":"2003-10-31T23:30:29","date_gmt":"2003-10-31T22:30:29","guid":{"rendered":"http:\/\/www.nobugs.org\/blog\/archives\/2003\/10\/31\/yet-another-compiler-compiler\/"},"modified":"2003-10-31T23:30:29","modified_gmt":"2003-10-31T22:30:29","slug":"yet-another-compiler-compiler","status":"publish","type":"post","link":"https:\/\/www.nobugs.org\/blog\/archives\/2003\/10\/31\/yet-another-compiler-compiler\/","title":{"rendered":"Yet another Compiler Compiler"},"content":{"rendered":"<p>If you are ever foolish enough to wake one day and think &#8220;I&#8217;m going to write a compiler for the FooBar language&#8221;, you will soon find yourself well acquainted with the &#8220;FooBar Language Specification&#8221; document.  In theory, this document tells you exactly what a program can look like, and how all the bits work.  When you start writing the compiler, you&#8217;ll spend a lot of time checking little details in this document.<\/p>\n<p>Boring, huh?  You have the Holy Grail of Software Engineering (a complete and accurate specification!) but yet you&#8217;ll still need to read it, chew over it for a while, and then spend a good few months whacking keys like a <a href=\"http:\/\/www.nobugs.org\/mt-static\/images\/sidebar-singe.jpg\">little code monkey<\/a>.  If you watch someone write a compiler, they&#8217;ll read a bit of the specification, make some changes to the compiler, then repeat ad nauseum.  What thought processes are going on in their head to convert this specification document into a compiler?  Can&#8217;t we get the computer to do all the hard work, so we can spend our time doing <a href=\"http:\/\/www.kartstartscotland.co.uk\/\">more fun things<\/a>?<\/p>\n<p>Let&#8217;s look at our raw materials.  The definition of a language is a quite a hard thing to specify.  It has to be very precise, so that compiler writers have a clear understanding of how the language works.  Compiler writers have a hard job, especially dealing with all the corner cases and edge conditions.  One of these days, someone using their compiler is going to try to write a program which uses a hellish mixture of overloading, inheritance, template specialization and inline assembly &#8212; and they&#8217;ll (quite reasonably) expect that the compiler is going to handle it and keep on smiling.<\/p>\n<p>A precise specification is not necessarily a good specification though.  Just think of the instructions you get with flat-pack furniture.  They do tell you exactly what to do &#8211; &#8220;Insert bolt (1) through part (2), using nut (3)&#8221; &#8211; but they don&#8217;t convey the <i>spirit<\/i> of the procedure.  It reminds of the <a href=\"http:\/\/bonigv.tripod.com\/chapters\/chapter6.htm\">discussion<\/a> in Zen and the Art of Motorcyle Maintenance where the author describes an analytical description of a motorbike (&#8220;a motorbike consists of a power assembly and a running assembly.  The power assembly consists of .. yada, yada&#8221;).  Such a description manages to tell you what a motorbike is, without ever conveying what a motorbike <i>is<\/i>.  Hmm, just go read the book &#8211; it had a big effect on my life.<\/p>\n<p>Uhh, let&#8217;s get back on course.  Languages like &#8220;C&#8221; were created <a href=\"http:\/\/cm.bell-labs.com\/cm\/cs\/who\/dmr\/chist.html\">for pragmatic reasons<\/a> &#8211; to get the job done (writing unix).  Later, when the language got more popular it became important to create a formal specification, so that you could guarantee that each &#8220;C compiler&#8221; on the market will behave in the same way.  So, while there is a language specification document for C, it is somewhat of a retrofit.  Other languages, such as ML, have progressed along with their specification from early days.<\/p>\n<p>If you read through the specification document for <a href=\"http:\/\/java.sun.com\/docs\/books\/jls\/second_edition\/html\/jTOC.doc.html\">Java<\/a> or <a href=\"http:\/\/anubis.dkuug.dk\/JTC1\/SC22\/WG14\/www\/standards\">C<\/a> or <a href=\"http:\/\/anubis.dkuug.dk\/jtc1\/sc22\/wg21\/\">C++<\/a> you&#8217;ll quickly notice that it&#8217;s written in english.  Certainly, it&#8217;s a technobabble version of what you or I speak, more reminiscent of the ZATAOMM quote above than anything you&#8217;ll hear in the pub.  But, it&#8217;s english nonetheless.  And, as anyone with a longterm partner will know, english is an ambiguous language.  This is just asking for trouble.  Compiler writers loose sleep over phrases like &#8220;should not&#8221; and &#8220;must not&#8221;.  Even worse, when you try to use english in a very precise way, you risk making it a <a href=\"http:\/\/www.netfunny.com\/rhf\/jokes\/97\/Mar\/pilots.html\">complete impediment to understanding<\/a>.<\/p>\n<p>Besides, our original plan was to get a computer to write a compiler for us.  Computers aren&#8217;t very good at understanding english, so we&#8217;re unlikely to have much success with languages where the specification is written in engrish.<\/p>\n<p>Fortunately, some clever language designers have thought to use something better to describe their language.  For example, the creators of Standard ML use a formal notation called &#8220;natural semantics&#8221; to describe the meaning of ML programs.  Cunningly, you have to buy their book to see it (or live near to a Uni library), but this notation enables them to elegantly and concisely describe the whole of the language in a single slim booklet.<\/p>\n<p>This particular notation (&#8220;natural semantics&#8221;) is really just another language &#8211; one suited to describing the behaviour of programs.  We rarely try to use english to discuss calculus or algebra (unless there&#8217;s no blackboard nearby) because mathematical notation is much more precise and concise.  It&#8217;s the same situation with discussing programming languages.  Sure, you have to invest some time in learning the language, but once you&#8217;ve done that you can communicate effectively and precisely about the meaning of programming languages.  Once you&#8217;ve learned to read natural semantics, you can spent your winter evenings reading your way through &#8220;natural semantics for Java&#8221;, &#8220;natural semantics for C++&#8221;, and so on &#8230;<\/p>\n<p>Natural semantics isn&#8217;t the only such &#8220;language&#8221; for describing the meaning of programming languages.  There&#8217;s also operational semantics, denotational semantics, action semantics and probably lots more.  So you could have &#8220;denotational semantic for Java&#8221; and &#8220;operational semantics for Java&#8221; and they&#8217;d both tell you what Java <i>is<\/i>.<\/p>\n<p>Given that you can describe the behaviour of Java using any of these notations, you might wonder why you&#8217;d pick one over the other.  The difference is that one notation is particularly good if you&#8217;re building a compiler, while another notation might be particularly good if you&#8217;re trying to prove properties of a program (like, &#8220;does it do what I hope it does?!&#8221;).<\/p>\n<p>A &#8220;denotational semantics for C&#8221; describes the behaviour of C by mapping each part of the C language onto a mathematical object called a domain.  Don&#8217;t worry about the details &#8211; just note that once you&#8217;ve got such a mapping set up, you&#8217;ve got a huge toolbox of mathematical techniques available to probe your language with.  For example, you probably have an intuitive notion that the code &#8220;i++; i++;&#8221; is pretty much identical to &#8220;i += 2;&#8221;.  Denotational semantics is the ideal tool for putting these intuitive notions on a more formal footing.  Unfortunately, having all these mathematical objects and theorems floating around isn&#8217;t getting you much closer to having a compiler for the language.<\/p>\n<p>In constrast, an &#8220;operation semantics for C&#8221; would describe the behaviour of C programs on some sort of hypothetical computer &#8211; probably a fairly simple one.  For each construct in the C language, it would describe the transition from the initial state of the machine to the resulting state of the machine.  This is a pretty reasonable way of defining a language, especially given that these are programming languages and mostly we&#8217;re interested in writing compilers for them.  It&#8217;s important to choose the &#8220;hypothetical computer&#8221; carefully.  You probably don&#8217;t want to choose a Real World computer, since that would make it hard to build compilers for other platforms.  But you also don&#8217;t want to make the hypothetical computer too abstract (like a turing machine, or lambda calculus) because your description wouldn&#8217;t convey the spirit of the language very effectively.<\/p>\n<p>If you were writing a compiler for a language, you&#8217;d probably find the &#8220;operational semantics&#8221; quite helpful.  Basically, you&#8217;d just need to decide how to efficiently implement the operations of the &#8220;hypothetical computer&#8221; on your target machine, and the rest is easy!<\/p>\n<p>But wait!  We don&#8217;t want to write the compiler by hand.  Let&#8217;s code up that knowledge (the mapping from &#8220;hypothetical computer&#8221; instructions to &#8220;target computer&#8221; instructions) into a program, and have it slurp in the &#8220;operational semantics&#8221; definition of the Java language (which is a mapping from &#8220;java source code&#8221; to &#8220;hypothetical computer&#8221; instructions).  Hey presto, we got a java compiler\/interpreter!<\/p>\n<p>What&#8217;s more, we can take that same program (a &#8220;compiler compiler&#8221;)and feed it the &#8220;operational semantics for C++&#8221;.  Hey presto, instant C++ compiler!<\/p>\n<p>This sounds great.  I&#8217;m describing a world where language designers write a formal (accurate and complete) description of their language, and we can instantly build a compiler for it.  Surely it&#8217;s too good to be true?  Yeah, of course it is.  This kind of thing <a href=\"http:\/\/www.scms.rgu.ac.uk\/staff\/db\/Actress\/\">does<\/a> <a href=\"http:\/\/www-sop.inria.fr\/croap\/centaur\/papers.html\">actually<\/a> work to a certain extent (shock, horror) but it&#8217;s got big difficulties.  For a start, a naive translation will result in a hideously inefficient compiler &#8211; it&#8217;ll generate correct code, but that code will run very slowly.  And, regarding correctness, we&#8217;ve moved the goalposts &#8211; our compiler compiler had now better be bug free, or we&#8217;ll have big problems.  That&#8217;s just the tip of the iceberg.  There&#8217;s a lot more work required in this field before you can throw away your copy of gcc.<\/p>\n<p>So, that&#8217;s the end of this minor epic.  To summarize: we can describe the &#8220;meaning&#8221; of programming languages using a variety of notations &#8211; english, operational semantics, denotational semantics etc.  Each of these flavours of notation are suited to a particular task.  Some make it almost possible to generate a compiler for the language direct from the specification, eliminating the costly and bug-ridden process of having humans write compilers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you are ever foolish enough to wake one day and think &#8220;I&#8217;m going to write a compiler for the FooBar language&#8221;, you will soon find yourself well acquainted with the &#8220;FooBar Language Specification&#8221; document. In theory, this document tells you exactly what a program can look like, and how all the bits work. When [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-57","post","type-post","status-publish","format-standard","hentry","category-programming"],"_links":{"self":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/57","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/comments?post=57"}],"version-history":[{"count":0,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/57\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/media?parent=57"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/categories?post=57"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/tags?post=57"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}