Pretty Printing Parsing Problems

Ah, I’ve returned to the Real World after a week of snowboarding fun. Culture shock, muchly.

I haven’t written much recently. That wasn’t due to a lack of stuff to write about, but more due to me doing a little bit of this, then a little bit of that, and never really doing anything in large enough chunks to write cohesively about. Still, I want to play catchup so that in a year’s time I can remember where I spent my efforts.

My rough, sprawling plan is to finally write my own dream development environment. I’ve pondered lots about what this would look like, but what liberated me was deciding to wave a metaphorical middle finger at the rest of the world. This is going to be *my* dream development environment. I’ll make the tradeoffs which are sensible for me, and I don’t care if anyone other than me finds the end result useful. I’m waving goodbye to ascii source code. CVS, diff, grep and all the other text-based tools won’t be relevant any more. I’ve nothing against these fine tools, but the Big Number One tradeoff I’m making is to base the whole system around an abstract representation of source code rather than a concrete ascii representation. This isn’t a straightforward tradeoff to make. There are as many downsides as there are upsides, but after years of playing around with development tools I think I can tolerate the downsides in order to enjoy the upsides.

This development system will be targeted at, and written in, ocaml – because it’s the finest language I’ve encountered. A few years ago, I looked through the gcc sources and spent weeks trying to untangle how they worked. In contrast, when I recently tried to use the lexer, parser and typechecker from the ocaml compiler in a stand-alone test application, it took me about 20 minutes to get it working.

So that’s the plan anyway, in the vaguest sense of the word. It opens the door to all sorts of interesting problems.

How will I edit the code then? Well, to get started, I’m going to pretty-print the abstract representation of each module back into (you guessed it) ascii source code, and use good old emacs! That’s the quick and easy way to get going. But the crucial point is that it’s not the *only* way to edit code in this system. I will also be able to apply semantic transformations (like rename) directly to the abstract source tree. I can write my own display routines to show the code using whatever crazy display or edit method I can conceive of. The raw information is all there ready to be used.
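To make "rename on the tree" a bit more concrete, here is a minimal sketch. The `expr` type and the `rename` function are made up for illustration (the real ocaml syntax tree is far richer), and the sketch assumes the new name isn't already used anywhere, so it doesn't worry about variable capture.

```ocaml
(* A toy expression tree; hypothetical, far smaller than the real OCaml AST. *)
type expr =
  | Var of string
  | Int of int
  | Add of expr * expr
  | Let of string * expr * expr   (* let x = bound in body *)

(* Rename free occurrences of [old_name] to [new_name].
   A [Let] that rebinds [old_name] shadows it, so its body is left alone.
   Assumption: [new_name] is fresh, so no capture can occur. *)
let rec rename ~old_name ~new_name e =
  let go = rename ~old_name ~new_name in
  match e with
  | Var x when x = old_name -> Var new_name
  | Var _ | Int _ -> e
  | Add (a, b) -> Add (go a, go b)
  | Let (x, bound, body) ->
    let bound' = go bound in
    if x = old_name then Let (x, bound', body)
    else Let (x, bound', go body)
```

Because the transformation works on structure rather than text, it can't accidentally touch a shadowed `x`, a string literal or a comment, which is exactly what a textual search-and-replace gets wrong.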

Previously, when I thought about writing a development environment I’d spend ages wondering about how to maintain the “original sources” (i.e. the ascii files) when the user was busy applying refactorings. Well, I fixed that problem by getting rid of the ascii files! I’ll import my legacy sources into the system once and from that day on, I’ll be working with vastly improved tools and I’ll forget all about files. If I want to distribute an ascii-source version to someone else, I’ll pretty-print one.

“It won’t inter-operate with people’s existing toolsets”, you say. “No one will buy it”, you say. You’re totally right. But I don’t care. I’m writing this for me, because it’s a Much Better Way to write software. Then, uhh, I’m going to take over the world, uhh, or make tea, or something … 😉

(I had a discussion once with Michelle where I said that if you believe software patents are wrong, but you’re in a situation where the company you are running is going to go bust unless you pursue some patents, then the Right Thing To Do is to stick to your principles and have the company go bust. I don’t often have many views which are strongly one way or the other – most of the time I end up arguing both sides – but when I do get round to strongly believing in something, I’m fairly solid in my beliefs. It annoys me on a daily basis that we monkeys keep ourselves busy with inferior, inefficient coding tools when, if we used something better, we could spend more time outside enjoying the sun.)

So, onto the concrete software problems! I was trying to find elegant solutions to pretty-printing of source code. There’s lots of literature on how to parse source code – that is, going from the crummy ascii-based representations which we humans have traditionally favoured, with all their implicit conventions, into a clean structured representation. Pretty-printing, the opposite problem, gets less attention .. I guess because fewer people ever need to do it.

There are two sides to the pretty-printing problem. One is how to lay the source code out on the page – where to put line-breaks and whitespace. The second is how to reconstruct the “optional” presentation elements – like putting parentheses in expressions where needed, and even sometimes adding extra ones where they’re not absolutely needed if that will make it easier for me to read.
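Both sides can be sketched in a few lines for a toy arithmetic language. This is only an illustration under my own assumptions, not the method from any particular paper: `expr`, `prec`, `pp` and `to_string` are hypothetical names, layout is delegated to the standard library's `Format` module (boxes and break hints), and parentheses are decided by comparing operator precedences.

```ocaml
(* Toy arithmetic tree; hypothetical, for illustration only. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr

(* Binding strength of the outermost constructor; higher binds tighter. *)
let prec = function
  | Num _ -> 3
  | Mul _ -> 2
  | Add _ | Sub _ -> 1

(* Print [e] in a context that requires precedence at least [ctx].
   Parentheses are emitted only when [e] binds more loosely than that. *)
let rec pp ctx fmt e =
  let p = prec e in
  let binop op l r =
    (* Operators are treated as left-associative, so the right operand
       must bind strictly tighter. "@ " is a break hint: Format turns it
       into a space or a newline depending on the available width. *)
    Format.fprintf fmt "@[<hov 2>%a@ %s %a@]" (pp p) l op (pp (p + 1)) r
  in
  if p < ctx then Format.pp_print_string fmt "(";
  (match e with
   | Num n -> Format.pp_print_int fmt n
   | Add (l, r) -> binop "+" l r
   | Sub (l, r) -> binop "-" l r
   | Mul (l, r) -> binop "*" l r);
  if p < ctx then Format.pp_print_string fmt ")"

(* Render with the default margin, starting from the loosest context. *)
let to_string e = Format.asprintf "%a" (pp 0) e
```

With this, `to_string (Mul (Add (Num 1, Num 2), Num 3))` gives `(1 + 2) * 3`, while `Sub (Sub (Num 1, Num 2), Num 3)` comes out as `1 - 2 - 3` with no parentheses at all. Note that the right-nested `Add (Num 1, Add (Num 2, Num 3))` prints as `1 + (2 + 3)`: not strictly necessary for `+`, but it keeps the shape of the tree visible, which is the kind of "extra" parenthesis I sometimes want.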

Fortunately, this is an area where people have been showcasing the power and elegance of functional programming languages such as ocaml and Haskell. I found papers describing a pretty printing method and an “unparsing” method which not only describe concise elegant solutions, but also prove all sorts of neat and useful properties of the algorithms. Tasty.

With pretty-printing up and running, I can choose what size of “slice” I want to edit – a single function, a whole module, or maybe a method and all its callees. Then I edit this by hand, and run a parser on it, and then integrate it back into the abstract source tree. If I’ve pretty-printed using the standard ocaml syntax, I can just reuse the standard ocaml parser. I could make up my own syntax too, but then I’d have to write a parser as well as a pretty printer. (There’s a bit of overlap with camlp4 obviously).
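Picking "a method and all its callees" is just a reachability walk over the call graph. Here is a rough sketch, assuming the call graph has already been extracted from the abstract source tree into a simple association list; the `call_graph` type and the `slice` function are hypothetical names of my own.

```ocaml
module S = Set.Make (String)

(* Hypothetical call graph: each definition paired with the names it calls. *)
type call_graph = (string * string list) list

(* All names reachable from [root] by following calls, [root] included.
   Names with no entry in the graph (e.g. library functions) are treated
   as leaves. The [seen] set also guards against recursive definitions. *)
let slice (g : call_graph) (root : string) : S.t =
  let rec visit seen name =
    if S.mem name seen then seen
    else
      let callees = try List.assoc name g with Not_found -> [] in
      List.fold_left visit (S.add name seen) callees
  in
  visit S.empty root
```

The resulting set of names is what would get pretty-printed into the editing buffer; after editing and reparsing, the same names say which definitions to replace in the tree.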

Anyhow, for me this is a “burn the diskpacks” project. I am cherry-picking the relevant bits from lisp and smalltalk, mixing them together with ocaml and getting rid of historical baggage like ascii files where necessary. As I said earlier, the real breakthrough was deciding to appoint myself as the single target user. And also, being arrogant enough to believe that most development tools and languages today really do suck beyond belief. Heh, having said all that .. watch me lose interest in computers for the next six months! 😉

3 thoughts on “Pretty Printing Parsing Problems”

  1. Sounds interesting – best of luck. Your rationale is the best there is: Consider the reverse situation, everyone else will love what you do bar you. Hardly motivating!

    You may find some existing work on VPLs interesting. There’s a few at

  2. Good luck. As well as wanting you to be a happy bunny, I think that if anybody’s self-indulgent dream developer environment could ever turn out to be useful to others (like, say, me) then it might well be yours.

    Hmmm. What are you intending to do wrt alphabetic identifiers for stuff? How optional/constrained by having a parseable/pretty-printable representation will those be?

    I hope you won’t say a complete goodbye to text. I would suggest that the serialisation of your abstractly described software project should still be done in a plain text format.

    One thing that worries me a bit is: is it not the case that the most important problems in real software engineering do not become fully interesting until you put them into the context of a shared development effort? For example, a personal change control/documentation system is a lot simpler than a concurrent one. Will it be possible to do interesting enough experiments without collaborating directly with other developers? The answer might have quite an influence on the directions in which you develop things.

  3. You might find SubEthaEdit and similar tools inspiring in terms of collaborative environments. With that kind of front-end, you can all but treat the collective as a single user.
