This is an idea which has been growing on me for a few years. I’ve finally manage to nail some of the concepts down a bit. I’m dreaming of a world where programs are elegant, concise and obviously correct. In particular, I’m asking:

“Why do programs have to get more complicated when you try to speed them up?”

If you want to speed up your program, you usually end up rewriting parts of it to make them go faster. Typically, you use better algorithms and data structures, add caching, or maybe rewrite an inner loop in assembler. The resulting code is probably more complicated, harder to understand, harder to modify and more likely to contain bugs. You’ve made a performance gain, but you’ve suffered a heavy loss in code quality.

It’s depressing to start with a clean elegant piece of code and then desecrate it in the name of performance. Aesthetics aside, optimization also makes code brittle. It’s harder to fix bugs in “optimized” code because it’s less clear what’s going on.

In my ideal world, I would be able to write an elegant, clear version of the code, and then seperately specify a transformation which can be applied to my elegant version to produce a faster version. This would allow me to express my intentions much better. I can say “look, this is what I’m really doing” and, seperately, I can say “here is how I make it run fast enough”.

If I were writing code to draw a straightline on a bitmap, my clean version would calculate the position of each pixel independently using “y = mx + c”. My “program transform” would describe the neat trick which Breshnam used in his famous algorithm, transforming line drawing into an incremental algorithm.

Web designers do this kind of thing all the time. It’s very uncool these days to write HTML pages which contain layout and formatting. The cool kids all now write their content in XML, and then produce a seperate “presentation transform” which is applied to make the version you actually look at. This allows the content and presentation to be controlled independently, rather than mixing the two together into a tangled mess.

It is not so easy to do this with code!

Do we not already have seperate optimizing transforms? Your compiler certainly knows about many potential speed-improving transformations, and it can apply them appropriatly to your code. Thus, your compiler will do its best to generate fast code, but it doesn’t give any guarantees.

Consequently, you may be tempted to apply these same kind of transformations manually to your source code to ensure that they really are getting done. For example, you may manually inline a function in your source code, because the “inline” keyword is technically only a suggestion to the compiler. After all, if inlining is critical to the performance of your application, you want to be absolutely sure it’s happening. You don’t want someone to upgrade the compiler on the build machine and suddenly find that your application crawls at a snails pace because the new compiler has decided not to inline your key function.

Manually inlining isn’t nice. It’s a bit like writing old-school HTML, resplendent with “font” tags and “color” attributes. The resulting code is a tangled mixture which is harder to understand.

Now, if you *knew* for sure that inline is going to be applied, you can make the source code clearer by splitting big lumps of code out into functions. You’re gaining code clarity by using functions, but you aren’t suffering any speed-loss because you *know* the functions are flattened out before code generation.

I sometimes think about this back-to-front. Imagine that you *know* that the compiler will honour an “inline” annotation before generating code (eg. some variation on __force_inline). In that case, it is safe for you to apply the opposite transformation (eg. splitting out into functions) to your source code. This means you don’t suffer a performance cost, but you gain a code-clarity points. If “inline” is an performance-boosting transformation, then the inverse of “inline” is a clarity-boosting transformation. What’s the inverse of “inline”? It’s the popular “extract method” refactoring, with all the well-known benefits it brings.

(As an aside, this makes you wonder what a “function” really is. Usually, a function is like a subroutine. You expect to see a CALL instruction, and a RET instruction somewhere. When you start applying transformations to your code, it becomes clear that functions are more like named scopes).

So, if we have a cast-iron guarantee from the compiler that it will honour “inline” we can simplify our code by applying the anti-transform to our source. We can utilise the clarity of functions without any of the cost.

Getting the required guarantee is the difficult bit today. Your language would really need annotations something like “compiler_must(inline)”. Actually, I guess that could be “optimizer_must(inline)” to underline the fact that “compiler” is really two black boxes – an program optimizer, and a code-generator. Either way, this is a much stronger assertion than merely having a comment in your source code which says “// I checked, and the compiler is definitely inlining this function today”.

Are compiler-based optimizations the whole story? No, of course not. A compiler has a very limited understanding of what your program does. It does not realise, for example, that you may be using matrices. It does not know that matrices have lots of interesting properties which could be exploited to make the code run faster. If you were just doing integer arithmetic, then the compiler could help. Integer arithmetic is part of the language, and consequently the compiler probably knows facts like “a * b = b * a”. But once you move away from core language features, the compiler is limited by a lack of domain-specific knowledge.

Clearly, we need some means of capturing relevant domain-specific information (such as “M * I = M, where I is the identity matrix) and then describing domain-specific optimizations which use these properties. Our optimizer in the compiler could then work at a much higher level. Furthermore, now that we have been freed from the boring work of producing optimal code, we can program at a very high level, leaving the task of producing optimal code to the optimizer. Hmm, we’ve arrived at the notion of DSL (domain specific languages) through a very performance-driven route. Normally I think about DSLs because they make programs more simple and easy to understand. I guess that’s just a reflection of the “optimizations are anti-clarity tranforms” idea.

I’ve wafted over the difficulties of actually writing one of these transforms. What language are you going to express it in? Returning to the HTML/XML example, there are a myriad of general purpose and domain specific languages for transforming XML. For low-level source code transforms, we would need to operate in a world of abstract syntax trees and dataflow analysis. For higher level transforms, we will be working in terms of domain concepts, like matrices.

Regardless of what language you use, you need to be very sure that the transformation is correct. It can contain bugs, just like any other program. There are probably many lines of code within your compiler dedicated to the “inline” transform, yet more for the “loop unrolling” transform. Imagine the effect which a bug in your compilers optimizer would have on your product. If we are going to write our own transformation too, they need to be totally bullet-proof.

Actually, that’s not a totally fair comparison. Compilers have to decided, based on limited information, whether it is safe to apply a particular transformation. If the preconditions are met, then the implementation of the transformation must work correctly in all possible circumstances. That means it has to assume the worst case scenario. For example, if your C++ code uses pointers at all then lots of optimizations aren’t possible – since writing through the pointer could conceivably change any of local/class/global variables in the program (the pointer aliasing problem). The compiler, by itself, doesn’t have any more information about how the pointer is actually used. Most compilers have an command-line option which is equivalent to “you can assume that pointers don’t alias”.

However, a transformation in your own code doesn’t have to be fully general. You have to take on the responsibility for ensuring that it’s valid for the cases which you need it to be. Going back to the HTML/XML example, you often find that people create stylesheets for fixed-position layouts, and they check that it look okay. Then they go back and add a new paragraph to the content. As a result, the paragraph overlaps one of the picture and the layout looks bad. Was the original stylesheet buggy? Not particularly, since it did the job fine before the new paragraph was added. It just wasn’t general enough to remain correct in the face of content changed.

Finally, we need to think about the effect which optimizers have on the rest of the development environment. Debuggers, coverage tools and profilers would all be effected by such a scheme. Stepping through compiler-optimized code in the debugger is often “exciting” as it is, and so adding a whole new layer of domain-specific optimizations will make things much worse.

As is traditional when I have a new idea, someone has already got there before me! It’s fun thinking through these things for a while before you read the literature. I wrote all of the above stuff, then went to the web and discovered Todd Veldhuizen’s web page. He does research on this very topic. I’ll have to forgive him for being so keen on C++ template metaprogramming (which is the subject of a whole seperate rant) because there’s loads of interesting information on his site. I was happy to see that my ramblings above bear some resemblence to the stuff he talks about.

[ If I offered a free pint of beer to anyone who made it through the whole article, would I get any takers? 😉 ]

7 thoughts on “Optimization”

  1. I spend a lot of time thinking about similar stuff, but keep promising myself to leave implementation to others 😉

    You may want to browse some research on graph transformation languages and their language of specification. Here‘s a paper that might entertain you, this one surveys some visual languages for graph transformation specification but also touches on some of the underlying complexities regarding the implementation of graph rewriting systems.

  2. You owe me a beer. 🙂

    I sympathise with your point of view. After writing a reasonably straightforward and very slow algorithm, its maddening to take successive passes over your reasonably easy to understand code and transform it into a horribly butchered and very brittle thing.

    You would definately need something domain specific to help with any serious optimization, and even that probably wouldnt be enough. For example, in the simple case you would perform everything in high precision floating point, or even better as much precision as you need, but often you have to compromise and drop precision in favour of speed, and that requires some interaction with your black box optimizer to tell it what it can get away with.

    The frustrating part is the amount of time that we have to devote to plumbing and wiring and jury rigging the elegant solutions we enjoy spending our time devising. I’ll buy you a pint when you’ve found the solution. 🙂


  3. (That some refactorings are the mirror of optimisations is a nice observation. Thankyou for that.)

    When I read this article, I was reminded of the idea of proving the correctness of programs. Bear with me for an analogy.

    When I was studying, there was this idea that a program might come with a proof that it accorded to some specification. The developer enviroment would have tools that helped you derive a program from the specification in a way that guaranteed that it met that spec. Of course this can be done for toy programs. The hope was that approaches could be found that would scale well enough to help in the real world.

    For the most part, this dream has not been realised. Formalising specifications is hard! Formalising proof/derivation techniques is harder, and it always misses some of the ever expanding set of tricks you want to use. I’m sure progress continues to be made in these areas, but it hasn’t touched my daily developer life since I left academia.

    So back in the real world, we get value from using unit tests instead. It’s one tenable way of kinda formalising specifications, because programming languages are pretty good ways to describe the behaviour of programs. And although running lots of test cases doesn’t guarantee program correctness, it’s a significant confidence boost. You haven’t shown the program will always do the right thing, but you have definitely shown it does the right thing in lots of important cases. (Quasi)Proof by example.

    By now the analogy should be obvious. (If you prefer, it’s not even an analogy, just a specialization of the above to a particular kind of spec/test.) Yes, it would be wonderful to formalise a framework wherein one could apply all the useful transformations to turn nice programs into fast programs and therefore know that these two programs – the one you understand and the one you run – were intensionally equivalent. Nice research project. May pay off big one day.

    In the mean time, here we are in that real world again. So: write the nice program. Derive the fast program yourself, by hand, using all your cleverness. Then offer evidence that they are extensionally equivalent: make sure they give appropriately equal results when run on a battery of test-cases.

    (I know I’m not saying you anything you don’t already know, Andy, since you implemented various clean reference renderers to test for equivalence with the quick-but-dirty tweaked to SSE hell released versions at Voxar. But I thought it might be worth saying it out loud nonetheless.)

  4. You owe me, and and a rapidly-expanding set of engineers nearby, a beer 🙂

    Here’s a complex problem: What if the optimization isn’t just performed at a code level? What if you could also significantly optimize the data structures? If I write the simplest possible algorithm to perform a set of operations on a simple data structure, is there any way of specifying not just ways to transform the code to operate more efficiently, but also transforming the data structure to be more efficient in the context of the algorithm(s)?

    The data structure issue isn’t so relevant in, say, volume rendering, but it’s a potentially huge win if you have something more complex. Debugging the optimized code and data structures would be “interesting”.

    I think I need more knowledge of graph theory…

  5. Yeah, I briefly discussed this kind of idea with someone. There’s the whole other axis concerning whether you “know a priori” what you need to optimize, or whether this is something you can only discover by running typical test cases – eg. profile driving optimization.

    Graph theory: yeah, I always resent when one of my real-world interests drives me to learn about some dry, abstract area like graph theory. But it’s happening more and more. Automated proof systems is the other thing which I keep getting dragged reluctantly towards.

    You can claim your beer next time you’re in Edinburgh. 🙂

  6. In what way does this differ from Lisp’s macro facility? You can specify your algorithm cleanly there, then apply generic code transformations to get the efficiency you want.

Comments are closed.