It’s MetaTurtles, all the way down

(This is a postscript to my previous entry about “compiler compilers”)

A baby version of English, like the one understood by SHRDLU can be used to construct sentences about colored blocks. It is a language and as such, it has syntax rules (which tell you how to distinguish a well-formed sentence from gobblygook) and it has semantic rules (which tell you what a well-formed sentence actually means). This particular baby language is pretty much restricted to talking about blocks, positions and colors.

Java is a baby language too. It lets you construct sentences about computations – such as “do something five times”. It’s a language, and like all languages it has syntax rules and semantic rules.

There are languages such as BNF which allow you to describe syntax rules, in the same way that “baby english” allowed you to talk about shapes. So, a sentence in BNF could describe the syntax of the java language. Like any language, BNF has syntax rules and semantic rules which tell you what a valid BNF sentence looks like, and what it means. Let’s just clarify the previous-but-one sentence – a syntactically valid BNF sentence can be “intepreted” (according to the semantics of BNF) to describe the syntax of the Java language.

Now, for the first time, we have a chain; sentences in one language (BNF) are describing the syntax rules of another (Java). And those syntax rules just tell us which sentences make up valid Java programs. When we have chain, some people start using the word meta – ie. BNF is a meta-language.

(Question 1: What language do you use to describe the syntax of BNF?)

There are also languages such as OpenJava. They let you construct sentences about Java programs – for example, “add a new method foo() to class Bar()”. OpenJava is certainly a language – it has well-defined syntax rules and semantic rules. However, we’ve also got a chain here – sentences in the OpenJava language are referring to sentences in the Java language. It’s another case of a meta-language.

Earlier, we were looking at “operational semantics” and we noted that it was just a language too. It has syntax rules and semantic rules, like any other language. These rules are what you’ve understood when you can “read” operational semantics and know what they “mean”. A sentence in the “operational semantics” language describes the “meaning” of a programing language. Note that the “operational semantics for Java” sentence doesn’t describe the meaning of a particular java program. It describes a /mapping/ from syntactically correct Java programs to programs which run on a “hypothetical machine”.

And, of course, these “hypothetical machine programs” are just sentences in another language – one which has its own syntax and semantics. It’s turtles all the way down!

(Question 2: What language do you use to describe the semantics of “operation semantics”?)

Now let’s leave programming languages for a moment and look at Proper Grown-up English. It’s a pretty crazy language. The syntax rules are mixed up (and gradually changing), and the semantics are subtle, ambiguous and take years to grasp. But one if it’s cooler properties is that sentences in the english language can refer to other sentences in the english language. We can say things like ‘”nearly finished now” has three words’. We can also get really self-referential and have sentences which describe themselves, such as “multisyllabic”. Our dictionaries define the meaning (ie. the semantics) of english words by using english! There’s clearly a bootstrapping problem here – you couldn’t learn English from a dictionary unless you already knew some English. But English is clearly a self-describing language, at least to some extent.

So, let’s return to the two questions we left above. One obvious answer to the questions would be to use “English” to describe the syntax of BNF, and semantics of “operational semantics”. That’s the approach that a CS lecturer would take.

But, you can actually use BNF to describe the syntax of BNF. The expressive power of the language is enough, and the structure of the language is simple enough, that it can describe itself. BNF can be used to describe the syntax of any context-free language. BNF itself is a context-free language. Therefore it can describe it’s own syntax.

Before we try to answer the second question, we should look more closely at what we mean by “semantics”. A few paragraphs ago, we noted that the operational semantics for Java defined a /mapping/ from syntactically valid java programs to programs which run on some hypothetical machine. Haven’t we just moved the goalposts? We’ve not actually established some Platonic “meaning” for each Java program. We’ve just restated in the problem in terms of some “hypothetical machine”.

No, we’ve not cheated. This is an intrinsic property of semantics. You can’t attain a “Platonic ideal” of the semantics of Java. You can only restate the problem in some other form. Operational semantics tells you how to interpret (syntactically correct) Java programs as programs on a hypothetical machine. Denotational semantics tell you how to interpret them as mathematical domains. But it’s as if you are translating a story between English, French and German, hoping to get closer to the true meaning.

At some point, you just have to “get” one of the languages, whether that’s English or some other language, and be content that you understand with the version of the problem restated in that language. You didn’t learn English (or whatever your first language is) by studing grammar books and reading dictionaries. You learned it through some “side channel” – behavioural reenforcement as an infant.

So, let’s finally return to the one outstanding question. Can “operational semantics” be used to define the semantics of “operational semantics”? Hmm, I should leave this as an exercise for the reader. I myself will return to the question once my head has stopped hurting!

Yet another Compiler Compiler

If you are ever foolish enough to wake one day and think “I’m going to write a compiler for the FooBar language”, you will soon find yourself well acquainted with the “FooBar Language Specification” document. In theory, this document tells you exactly what a program can look like, and how all the bits work. When you start writing the compiler, you’ll spend a lot of time checking little details in this document.

Boring, huh? You have the Holy Grail of Software Engineering (a complete and accurate specification!) but yet you’ll still need to read it, chew over it for a while, and then spend a good few months whacking keys like a little code monkey. If you watch someone write a compiler, they’ll read a bit of the specification, make some changes to the compiler, then repeat ad nauseum. What thought processes are going on in their head to convert this specification document into a compiler? Can’t we get the computer to do all the hard work, so we can spend our time doing more fun things?

Let’s look at our raw materials. The definition of a language is a quite a hard thing to specify. It has to be very precise, so that compiler writers have a clear understanding of how the language works. Compiler writers have a hard job, especially dealing with all the corner cases and edge conditions. One of these days, someone using their compiler is going to try to write a program which uses a hellish mixture of overloading, inheritance, template specialization and inline assembly — and they’ll (quite reasonably) expect that the compiler is going to handle it and keep on smiling.

A precise specification is not necessarily a good specification though. Just think of the instructions you get with flat-pack furniture. They do tell you exactly what to do – “Insert bolt (1) through part (2), using nut (3)” – but they don’t convey the spirit of the procedure. It reminds of the discussion in Zen and the Art of Motorcyle Maintenance where the author describes an analytical description of a motorbike (“a motorbike consists of a power assembly and a running assembly. The power assembly consists of .. yada, yada”). Such a description manages to tell you what a motorbike is, without ever conveying what a motorbike is. Hmm, just go read the book – it had a big effect on my life.

Uhh, let’s get back on course. Languages like “C” were created for pragmatic reasons – to get the job done (writing unix). Later, when the language got more popular it became important to create a formal specification, so that you could guarantee that each “C compiler” on the market will behave in the same way. So, while there is a language specification document for C, it is somewhat of a retrofit. Other languages, such as ML, have progressed along with their specification from early days.

If you read through the specification document for Java or C or C++ you’ll quickly notice that it’s written in english. Certainly, it’s a technobabble version of what you or I speak, more reminiscent of the ZATAOMM quote above than anything you’ll hear in the pub. But, it’s english nonetheless. And, as anyone with a longterm partner will know, english is an ambiguous language. This is just asking for trouble. Compiler writers loose sleep over phrases like “should not” and “must not”. Even worse, when you try to use english in a very precise way, you risk making it a complete impediment to understanding.

Besides, our original plan was to get a computer to write a compiler for us. Computers aren’t very good at understanding english, so we’re unlikely to have much success with languages where the specification is written in engrish.

Fortunately, some clever language designers have thought to use something better to describe their language. For example, the creators of Standard ML use a formal notation called “natural semantics” to describe the meaning of ML programs. Cunningly, you have to buy their book to see it (or live near to a Uni library), but this notation enables them to elegantly and concisely describe the whole of the language in a single slim booklet.

This particular notation (“natural semantics”) is really just another language – one suited to describing the behaviour of programs. We rarely try to use english to discuss calculus or algebra (unless there’s no blackboard nearby) because mathematical notation is much more precise and concise. It’s the same situation with discussing programming languages. Sure, you have to invest some time in learning the language, but once you’ve done that you can communicate effectively and precisely about the meaning of programming languages. Once you’ve learned to read natural semantics, you can spent your winter evenings reading your way through “natural semantics for Java”, “natural semantics for C++”, and so on …

Natural semantics isn’t the only such “language” for describing the meaning of programming languages. There’s also operational semantics, denotational semantics, action semantics and probably lots more. So you could have “denotational semantic for Java” and “operational semantics for Java” and they’d both tell you what Java is.

Given that you can describe the behaviour of Java using any of these notations, you might wonder why you’d pick one over the other. The difference is that one notation is particularly good if you’re building a compiler, while another notation might be particularly good if you’re trying to prove properties of a program (like, “does it do what I hope it does?!”).

A “denotational semantics for C” describes the behaviour of C by mapping each part of the C language onto a mathematical object called a domain. Don’t worry about the details – just note that once you’ve got such a mapping set up, you’ve got a huge toolbox of mathematical techniques available to probe your language with. For example, you probably have an intuitive notion that the code “i++; i++;” is pretty much identical to “i += 2;”. Denotational semantics is the ideal tool for putting these intuitive notions on a more formal footing. Unfortunately, having all these mathematical objects and theorems floating around isn’t getting you much closer to having a compiler for the language.

In constrast, an “operation semantics for C” would describe the behaviour of C programs on some sort of hypothetical computer – probably a fairly simple one. For each construct in the C language, it would describe the transition from the initial state of the machine to the resulting state of the machine. This is a pretty reasonable way of defining a language, especially given that these are programming languages and mostly we’re interested in writing compilers for them. It’s important to choose the “hypothetical computer” carefully. You probably don’t want to choose a Real World computer, since that would make it hard to build compilers for other platforms. But you also don’t want to make the hypothetical computer too abstract (like a turing machine, or lambda calculus) because your description wouldn’t convey the spirit of the language very effectively.

If you were writing a compiler for a language, you’d probably find the “operational semantics” quite helpful. Basically, you’d just need to decide how to efficiently implement the operations of the “hypothetical computer” on your target machine, and the rest is easy!

But wait! We don’t want to write the compiler by hand. Let’s code up that knowledge (the mapping from “hypothetical computer” instructions to “target computer” instructions) into a program, and have it slurp in the “operational semantics” definition of the Java language (which is a mapping from “java source code” to “hypothetical computer” instructions). Hey presto, we got a java compiler/interpreter!

What’s more, we can take that same program (a “compiler compiler”)and feed it the “operational semantics for C++”. Hey presto, instant C++ compiler!

This sounds great. I’m describing a world where language designers write a formal (accurate and complete) description of their language, and we can instantly build a compiler for it. Surely it’s too good to be true? Yeah, of course it is. This kind of thing does actually work to a certain extent (shock, horror) but it’s got big difficulties. For a start, a naive translation will result in a hideously inefficient compiler – it’ll generate correct code, but that code will run very slowly. And, regarding correctness, we’ve moved the goalposts – our compiler compiler had now better be bug free, or we’ll have big problems. That’s just the tip of the iceberg. There’s a lot more work required in this field before you can throw away your copy of gcc.

So, that’s the end of this minor epic. To summarize: we can describe the “meaning” of programming languages using a variety of notations – english, operational semantics, denotational semantics etc. Each of these flavours of notation are suited to a particular task. Some make it almost possible to generate a compiler for the language direct from the specification, eliminating the costly and bug-ridden process of having humans write compilers.

Twees

Wow, people can make climbing trees into a commercial opportunity.

Here’s a classic story software engineering story – Anthony does the intro and DaveM delivers the punchline. Next time you’re chasing down a bug which “can’t possibly be happening” bear this one in mind. Of course, every other time it’ll be a bug in the code. But this once, the code was fine. Crazy stuff!

JQuery is a pretty nice bit of software. It allows you to query your java codebase using prolog-like questions, and it integrates into Eclipse to provide dynamic views across your code base.

I remain undecided whether a bad language with good tools is preferable to a good language with bad tools. People who get paid to code in Smalltalk/Lisp can sit back and be smug.

Philip Wadler

Philip Wadler has moved to Edinburgh to become Professor of Theoretical Computer Science. He’s a bit of a well-rounded clever bunny, he is. He’s been involved in the design of Haskell, support for generics in Java, a whole load of XML stuff and a static type tool for Erlang (which we were discussing recently at Ergnosis). This makes me wonder about PhD plans all over again.

One of his most admirable traits is that he writes extremely well. You know the quote: “Programs must be written for people to read, and only incidentally for machines to execute”? Well, (within his field) I think Philip Wadler writes for humans to read, and only incidentally for computer scientists. Heh!

I’ve always hated dry academic papers; In fact, I hate any situation where information is deliberately made to seem arcane or difficult by the use of jargon or needless layers of abstraction. Joel understands this. Richard Feynman understood this.

The closest I come to having a “life philosophy” is my continual belief that nothing is really as difficult as people make it out to be. That’s why I just go ahead and do things like melting metal (it ain’t as hard as you’d think either). There is some primitive force which makes people want to dress up their knowledge to make it seem more impressive, and they are reluctant to share that knowledge – as if somehow that will drain away some of their own power. There’s something very Emperor’s New Clothes (Gutenberg text) about it. Maybe “knowledge horder” is a suitably pejorative term.

My parents are both teachers. One consequence of this is that I have no inclination to become a teacher! But, I feel very strongly about education and about equality of opportunity. I guess when I see people “dressing up knowledge in jargon”, I perceive people who are building unnecessary barriers around learning. If it isn’t really hard, don’t make it seem hard.

Meep! Urkle!

Today, I am frustrated. I have a fairly good idea of what my “dream development environment” would look like. I am confident that I would be significantly more productive and less stressed if I could use it today to develop applications. However, I’m not sure I’ll ever get to use it.

History is littered with the carcasses of failed software tool companies. There are many great ideas and years of effort lost. The big problem is that you can’t break cleanly from the past. If you build a brilliant new C++ IDE, you’d better be sure that it still works with CVS or other text-based version control systems. Otherwise, no-one is going to start using it. So, that ties you to keeping source code in ascii text files.

If it’s so great, does it need to sell?

If it’s so great, write it youself, use it yourself and clean up – erlang.

Real world concerns – working with existing code, fitting in with cvs, hiring people