Optimizing programs

A simple program gets complex when you make it faster.
You want to encapsulate “optimizations” for clarity, explanation and reuse.
In the compiler? You don’t know whether it gets used, so you can’t rely on it. It’s hidden, and someone will “optimize the source” later anyway.
By hand, in the source? The original version is lost, and the code is confusing now.
Separately, in macros? You can’t flip between the transformed versions easily, and it’s hard to debug?

Eg. Drawing a gradient. Incremental computations.
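The gradient example can be sketched in a few lines. This is a minimal, hypothetical illustration (the function names are mine, not from any real codebase): the “clear” version interpolates afresh at every pixel, while the “optimized” version strength-reduces that to an incremental running sum — faster, but the intent is now buried.

```python
def gradient_clear(start, end, width):
    # The obvious version: interpolate directly at each pixel.
    return [start + (end - start) * x / (width - 1) for x in range(width)]

def gradient_incremental(start, end, width):
    # The "optimized" version: replace the per-pixel multiply with a
    # running addition. Equivalent, but the original intent is hidden.
    step = (end - start) / (width - 1)
    out, value = [], start
    for _ in range(width):
        out.append(value)
        value += step
    return out

row_a = gradient_clear(0.0, 255.0, 256)
row_b = gradient_incremental(0.0, 255.0, 256)
assert all(abs(a - b) < 1e-6 for a, b in zip(row_a, row_b))
```

The two versions agree to within floating-point error, but only the first one reads as “a gradient” — which is exactly the clarity-versus-speed tension above.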

Eg. writing xml and a stylesheet, rather than flat html

Debug Win32 Heap Internals

Having recently started a new job, I’ve been deliberately taking some time to build up my development environment to be just right. Over the past six or so years I’ve figured out what’s the easiest way to do X, or the fastest way to do Y, and I’ve carried these things from job to job. I’ve spent a while tuning my dot-emacs file so that most of my common tasks are automated, and I’ve turned into a “three strikes and you automate” bunny.

Anyhow, I’ve been doing a lot of heap-related work, and the MSDN docs aren’t great. So, I finally got round to writing the article I wish I’d written years ago …

It’s obscure! It’s of interest to about 3 people in the world! It’s the … Win32 Debug CRT Heap Internals guide!

(but it means I’ll always know where to find the information, so I’m happy. And it’s always fun indulging in a bit of low-level hackery rather than always using high-level languages which protect you from this stuff)

Next week, I’ll write another article on how to pinpoint heap corruption (writes through dangling pointers etc) using only DevStudio (no Purify or Boundchecker required).

I can’t over-emphasize how good nxml-mode is. I’m using it for all my xml/xhtml stuff now. I couldn’t be without it.

On another track completely, I’ve been doing a lot of thinking about physics recently. I’m trying to relearn electromagnetism from the ground up. I remember reading that Richard Feynman once did this exercise – taking your knowledge to bits, examining all the parts, and carefully putting it back together. I am really appalled at how poor my understanding of this subject is. I did physics up to 2nd year at Uni, and always got top grades. Yet I have ended up with a facade of understanding. This page convinced me that I wasn’t crazy after all. I think it should be possible to write a simple electromagnetism “simulator” which operates at the level of individual electrons and knows only a small number of rules (Maxwell’s laws), yet is able to recreate pretty much all electromagnetic phenomena. Sure, it’ll be slow, but it should demonstrate the ideas.
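To give a flavour of the “small number of rules” idea: here is a tiny sketch of the electrostatic slice of such a simulator, using nothing but Coulomb’s law between point charges. It’s purely illustrative — the names are mine, it’s 2D, and a real simulator would need the full Maxwell equations for magnetism and radiation.

```python
import math

K = 8.9875517923e9    # Coulomb constant, N*m^2/C^2
E = -1.602176634e-19  # electron charge, C

def coulomb_force(p1, p2, q1=E, q2=E):
    """Force on the charge at p1 due to the charge at p2 (2D, newtons)."""
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    r = math.hypot(dx, dy)
    f = K * q1 * q2 / r**2           # magnitude; positive means repulsive
    return (f * dx / r, f * dy / r)  # unit vector points away from p2

# Two electrons 1 nm apart: like charges, so they repel along the x axis.
fx, fy = coulomb_force((0.0, 0.0), (1e-9, 0.0))
assert fx < 0 and fy == 0.0  # pushed away from the electron at +x
```

Stepping every electron under the summed forces from all the others is then just numerical integration — slow, as predicted, but each rule is only a few lines.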

Lazy

From Ocaml koans:

Michel Mauny was giving a guest lecture to new computer science students. After the students were seated, Michel pronounced “Today’s lecture will be about Lazy Evaluation”, he paused for a moment then concluded with, “Are there any questions?”

Luca Cardelli

I was reading one of Luca Cardelli’s more recent papers this evening. Luca did a lot of early work on the ML language at Edinburgh, including one of the earliest useful ML compilers. Plus he has a cool dijkstra font on his website too. 🙂

Anyway, back to the paper. It’s called “Modern Concurrency Abstractions for C#” (PDF). I got interested in concurrency stuff whilst at Ergnosis. I realised that while we have lots of language support for data abstraction and control-flow abstraction, we don’t have much support for concurrency abstraction. Coding using threads and mutexes is like coding in assembler. It’s very powerful, but there are a million and one ways to introduce bugs into your code. An assembler can’t tell you that you’ve made a type error, or failed to initialize a local variable. Similarly, a compiler can’t tell you that you’ve coded a potential deadlock or race-condition when you’re using threads.

I wouldn’t ever consider writing a big application in assembler, and similarly I don’t really want to write complex concurrent code using the crude low-level primitives provided by threads. It leads to untraceable bugs, loss of hair and late nights. That’s just so yesterday! So, I’m interested in abstractions for concurrency – things which wrap up the complexity and stop you shooting yourself in the foot. I want to work with bigger building blocks.

The paper describes two extensions to the C# language – asynchronous calls and “chords” which allow you to express how combinations of methods work together. The paper then goes on to demonstrate how these new language features can be used to express common concurrent programming idioms. For example, consider a simple one-element container. If the container is empty, calls to Get() should block until some other thread calls Put(). Likewise, calls to Put() should block until the container is emptied by calling Empty(). Chords are a language extension which allows you to express that these pairs of methods have dependencies on each other. Check out the paper for more details.
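To see what the chords are buying you, here is the same one-element container written by hand in Python with a lock and a condition variable — exactly the low-level plumbing that a chord declaration would let the compiler generate and reason about. This is my own sketch, not the paper’s C# syntax.

```python
import threading

class OnePlaceBuffer:
    def __init__(self):
        self._cond = threading.Condition()
        self._value = None
        self._full = False

    def put(self, value):
        with self._cond:
            while self._full:        # Put() blocks until the cell is empty
                self._cond.wait()
            self._value, self._full = value, True
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._full:    # Get() blocks until someone Put()s
                self._cond.wait()
            value, self._value, self._full = self._value, None, False
            self._cond.notify_all()
            return value

buf = OnePlaceBuffer()
threading.Thread(target=buf.put, args=("hello",)).start()
assert buf.get() == "hello"
```

Note how much of this is error-prone boilerplate (the while-loops guarding against spurious wakeups, remembering notify_all) — the kind of thing the compiler could check, or write for you, if the pairing of Put and Get were declared in the language itself.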

There are a few things worth noting about the paper. Firstly, there’s a good discussion of why you sometimes want to add features to the language rather than providing them in libraries. When you add language support (say, for concurrency), the compiler can analyze the usage and warn if you’re doing dangerous stuff. If you have concurrency support only in libraries (eg. mutexes), the compiler can’t do any analysis of /how/ you are using it. An example of the latter approach is JCSP, an implementation of Communicating Sequential Processes for Java.

Secondly, I had a bit of a laugh at section 1.3 where he justifies why C# is the ideal testbed for this kind of stuff. I couldn’t help thinking that “because Microsoft pays the bills” is probably the primary reason. Especially when later in the paper he reveals how the prototype compiler for the extended language was written in ML!

Still, it’s nice to see efforts to add more powerful features into mainstream languages. There’s a huge amount of academic research into concurrency, but relatively little of it filters through into the kind of tools you’re likely to use in industry.

It’s an interesting paper, and it’s very accessible compared to other more theoretical material about the join calculus and such like. And as an added bonus, my ex-Voxar colleague dnt gets a citation at the end.

Microsoft can boast an impressive array of researchers (heh, I have a languages bias) within their folds. And they feature in Uni labs too. Gosh, they’re even assimilating people I know! But they’re all still producing papers, so it’s hardly the death knell of research as we know it. Still, is the ever-increasing influence of industry over research healthy? I don’t want to get all political and start ranting about this, but I’d be interested to hear from people with strong views either way.

Advogato

  • “If there’s anyone to blame for the current open-source mess (both in the ideological sense and in the practical sense), it’s RMS. He’s turned the whole purpose of hacking upside down. Now sharing code, instead of being a by-product of creating a good program, has become an end in itself” (tk, as quoted by chalst).
  • On advogato, chalst, graydon and raph all regularly write interesting material. Definitely worth spending a few minutes digging through their older stuff.