This sentence is false

  • I write programs to solve problems.
  • I spend a lot of time writing these programs.
  • During this time, problems relating to programming pop up.
  • I want to solve these problems too!

If you want to start writing tools which manipulate programs, you require a vocabulary which includes nouns like “type” and “function” and “literal”. Most sane people think of, say, mozilla as a program which you execute, but today I’m thinking of it as a dataset. I’m not going to run mozilla – I’m going to process it!

Lisp fans are used to this program/data duality. The list (+ 1 2) consists of three elements – the “plus” symbol and the numbers one and two. It looks like plain old data until you pass it to the eval function, whereupon it reveals its true guise as a program for calculating the sum of one and two. But it’s still data. We can ask how long the list is. We can append another element to the end of the list and make a new program. And then we can run that new program!
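
The duality is easy to mimic outside Lisp. Here is a minimal sketch in Python, assuming a plain Python list stands in for the Lisp list and a made-up `evaluate` function stands in for eval. It only knows about `+` and `*`, so it is nowhere near a real Lisp, but it shows the same value being treated as data and as a program.

```python
import operator
from functools import reduce

# Hypothetical operator table: maps a symbol (a plain string here) to a function.
OPERATORS = {"+": operator.add, "*": operator.mul}

def evaluate(expr):
    """Evaluate a tiny Lisp-like expression held as a nested Python list."""
    if not isinstance(expr, list):
        return expr  # a number evaluates to itself
    symbol, *args = expr
    values = [evaluate(arg) for arg in args]
    return reduce(OPERATORS[symbol], values)

program = ["+", 1, 2]        # stands in for the Lisp list (+ 1 2)
length = len(program)        # it is data: we can ask how long it is
total = evaluate(program)    # it is a program: we can run it

bigger = program + [3]       # build a new program, (+ 1 2 3)
bigger_total = evaluate(bigger)
```

Note that `program + [3]` builds a new list and leaves the original program untouched, which is close in spirit to how Lisp's append behaves.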

Lispers are lucky. The main data structure in lisp also happens to be the construct used to express programs. What are the chances of that, eh? It’s very handy.

C++ monkeys have a harder time. When we write source code, we edit an ASCII text document. The compiler processes it and spits out a binary. At no point does our C++ source take on the guise of a C++ dataset. You never see CppFunction or CppDeclaration objects flying around the office.

Well, that’s not totally true. The compiler does build up an internal representation of your source code, but it’s safely locked away from you. It’s called an abstract syntax tree. Even if you could get at the AST, each compiler uses its own representation for the tree and its associated type and location information. It’s hard to contemplate writing programs to manipulate C++ programs when there’s no standard way to represent C++ source code as C++ data.

Furthermore, the task of transforming ASCII C++ source code into a structured dataset is pretty formidable. It’s certainly do-able, since all C++ compilers do it. It’s just really quite difficult to do properly, and it’d probably take you years rather than months to write a black box to do it.

Lisp doesn’t have this problem. You go from “lisp source” to “lisp data” in one trivial step. Smalltalk doesn’t have this problem either. It’s a bit closer to C++ than lisp in that you write the smalltalk source code as ASCII text, and then it gets parsed into objects. So, the source code for your “foo” method gets transformed into a MethodNode object, and from there into compiled bytecode. But if you want to turn your smalltalk source code into smalltalk data, there’s a black box which you can pick up and use.
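
For a feel of what such a black box looks like from the outside, here is a small sketch. It uses Python rather than Smalltalk or C++, simply because Python's standard `ast` module plays this role out of the box. The `SOURCE` string and the `foo` and `bar` functions in it are made up for illustration.

```python
import ast

# Made-up source text; to the parser it is just a string.
SOURCE = """
def foo(x, y):
    return x + y

def bar():
    return foo(1, 2)
"""

tree = ast.parse(SOURCE)  # source text in, tree of objects out

# Each function definition is now an ordinary object we can query.
functions = [node for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]
names = sorted(f.name for f in functions)
arg_counts = {f.name: len(f.args.args) for f in functions}
```

Here `names` holds the names of the two functions and `arg_counts` maps each name to its number of positional parameters, all without ever running `foo` or `bar`.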

But there’s no such black box to turn C++ source code into C++ data. The best I’ve seen is OpenC++, which is pretty damn good, but it’s a shame that it isn’t integrated with a compiler. Hey, that’d require that everyone agreed on a standard representation of C++ source as C++ data. The gcc compiler uses its own abstract syntax tree representation, which is different from the one used by OpenC++, so you can’t just plug them together.

This annoys me. I want to write tools which manipulate C++ source code. There is a hard, but solved, problem of turning C++ source code into C++ data. But there are very few people out there who have produced a black box to do it. EDG sell a compiler front-end which does it, and OpenC++ exposes visitor-pattern access to it. Everyone else’s effort is tangled up in some-or-other compiler.

I’ve written this entry from a C++ programmer’s point-of-view. Actually, I’d be quite happy to receive a black box which turned my C++ source into a dataset which ocaml or smalltalk could play with. The best of all worlds would be a language-independent representation. I’ve just reminded myself of the ASDL project which does exactly that.

A programming language which can be used to describe itself is a very powerful thing. You can build all sorts of useful tools with it. These tools can do all the hard work, and you get to go home early.
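
As a taste of the kind of tool I mean, here is a rough sketch of a checker that reports functions with no docstring. It is written in Python on top of the standard `ast` module, not in C++, and the `functions_without_docstrings` helper and the `EXAMPLE` source are invented for this illustration.

```python
import ast

def functions_without_docstrings(source):
    """Return the names of functions in `source` that lack a docstring."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and ast.get_docstring(node) is None
    ]

# Made-up program to inspect; it is never executed, only parsed.
EXAMPLE = '''
def documented():
    """Says what it does."""
    return 1

def undocumented():
    return 2
'''

missing = functions_without_docstrings(EXAMPLE)
```

The whole tool is a dozen lines because the parsing is somebody else's problem. That is the position I would like to be in with C++.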

Must now have a deeper look at OpenC++, ASDL, TreeCC and PUMA.

Update: I came across this online book which describes exactly what I’m talking about.

1 thought on “This sentence is false”

  1. Andrew –
    Take a look at the C/C++ parser in SWIG (Simplified Wrapper and Interface Generator, at swig.org). You can write a “language module” add-in that gets the entire tree for you to play with. Existing language modules write wrapper code to interface the C/C++ with a scripting or other language with a well-defined API (e.g., TCL, Perl, Python, C#, Java via JNI).

    David
