Method & Creativity

During performance work, I find myself alternating between thinking “yup, that all seems reasonable” and “oh gosh, what’s going on there?”. It’s like some kind of crazy jigsaw puzzle.

I find performance work quite relaxing and satisfying. I think there are several reasons for this. Firstly, it is very methodical work. It goes like this:

1. Establish which particular aspects of your application are too slow
2. Establish what would consistute “fast enough”.
3. Create easy-to-run test cases for each area that needs speeding up (eg. application startup time).
4. Run a profiler to see where your application is actually doing during that time.
5. Based on that evidence, make changes to the application.
6. Repeat until things are fast enough.

It is very methodical, but enjoyable too. When you are identifying the performance bottlenecks, you have powerful tools (such as Intel’s Vtune) at your disposal and they do most of the hard work for you. It is always very satifying to use good tools.

These tools produce a large amount of data, and you have to put on your “investigation” hat to intepret the raw data. I enjoy this phase, partly because I know that all the relevant information is available to me, and partly because it lets you see your application from a different angle. It’s like exploring a landscape, building up a map of an area that you only vaguely knew before. I am always very familiar with the static structure of the applications I work on (what each bit of the code does, and how they fit together), but it’s only when I am doing performance work that I look at the big-picture dynamic structure of the application.

During this investigation phase, some of the facts which reveal themselves are expected and familiar – for example, you would expect a game would spend a lot of its time drawing graphics. These points act as signposts to me – known markers in the large mass of information. But then when you continue drilling down, you often hit surprises. Beforehand, you probably had suspicions as to where the time was being spend. I have to say that my suspicions have always been consistently wrong – to the point that I no longer bother thinking about them too much .. I just run the profiler. During performance work, I find myself alternating between thinking “yup, that all seems reasonable” and “oh gosh, what’s going on there?”. It’s like some kind of crazy jigsaw puzzle. Sometimes weird stuff shows up; people accidentally leave bits of code in the application which shouldn’t be there. Mostly however, the code is basically fine, but with your performance-tuning goggles on (and the benefit of hindsight) you can see how things can be restructured to make them more efficient.

Once you’ve explored around for a while (a fine opportunity to drink some tea) you end up with probably a few areas which you think could be improved. Now comes the creative side of performance work, because at this stage you have lots of choices. You could make the program do the same amount of work, but faster. You could make the program avoid having to do so much work. You could delay the work until later, waiting for a time where a delay wouldn’t be so important, and possibly never having to do the work at all! There are other angles to approach this. It is often not the absolute performance of your application which is critical – it is the user’s _perception_ of the performance which matter. This is why an application will display a progress bars during a long operation. This is why we have splash screens during application startup. This is why lift engineers put mirrors in lobbys near to the lifts. People hate having to wait, and they perceive and application which makes them wait as being slow.

So, the “make things appear less slow” stage can involve a huge range of techniques, from low-level assembler optimizations, through design changes, right up to HCI tricks. You have a real opportunity to pull out your toolbox and go to work. But at all stages, you can always go back and test how well you’ve solved the problem. It’s a lot easier than, for example, choosing which new features to add to an application.

Right, that’s nearly the end. This turned out not to be so much about profiling itself, but more about my reaction to the task. I’m not sure why I’ve recently turned my attention away from technology towards my reaction to technology. Partly it is triggered by the title of a famous paper by Dijkstra, “Programming considered a human activity”. Yes, I am still strongly interested in technology and I think there are great gains still to be made by improving our technology. But at the same time, it is we human beings who use the technology, and we who are affected by its results. Software projects are carried out by humans who get bored sometimes, excited sometimes, and are most definitely not robots. A software methodology which treated team members as interchangable automata is doomed to failure, because it would crush the spirit of each team member. But on the flip side, you need some structure in order to harness the team’s energy and coordinate their efforts. I think the best gains are to be made by adopting methods which amplify and harmonize the efforts of individuals, rather than focusing on process and expecting individuals to execute that process. I think a useful first step is to be aware nof your own rational, emotional and physical reactions to the work you do, and try to avoid the nasty “distraction field” which seems to operate around computer. Oh look, another superficially interesting article on slashdot! There goes another few minutes of my life!

Stucklist

I’ve always meant to write down a programming “stucklist”. It’s a list which you consult when you’re stuck. Maybe it tells you that you’re solving the wrong problem. Maybe it provides inspiration to fix the problem in a different way. Either way, when you’re under deadline pressure and your brain is dribbling out of your ears, a stucklist might just provide a way out. So, without further ado, here’s my initial stucklist:

Avail yourself of more facts
Can you solve a different problem?
Get out of the office and go for a walk.
How can else you make the problem go away without coding a fix for it?
Could you make the endusers avoid the problem area?
Go home and do something less boring instead.
Look at a more general version of the problem.
Or consider a more specific version of the problem.
Is it actually so bad not to fix the problem?
Fix one or more of the variables to simplify the problem
Grab someone, say “I’m a bit stuck” and throw some ideas around
Get someone else to fix it (evil bonus: they have to fix the fix too)
Can you buy in a solution, or spend money to make the problem go away?
Tell your manager that you’ve hit a problem and see if you can have more time, or redistribute work.
Get everyone else to workaround/avoid the problem area
Code a solution using a different style – recusive/iterative, stateful/pure, push/pull, table-based/computed.
How would you solve the problem if you had infinite memory or a super-fast CPU?
Can you put any supporting framework in place which makes the problem easier to fix?
Draw a diagram – I find it’s easier to walk through examples when you can look at something and point at it.
Assume that there’s a way to progress which doesn’t involve solve this problem – can you find it?

[Update: I’ve since found this page on the c2.com wiki which is similar]

Fixing computers / Dynamic linking

Who says you can’t fix things any more? (more photos here).

Sitting at work waiting for our C++ application to link reminds me that just because your language can do lots of stuff at compile time doesn’t mean you necessarily want it to do so. Usually with a C++, the linker goes through your code figuring out what address each function will live at, and making each function-call jump to that address. You do almost all the linking work upfront. In contrast, other languages are lazier – they don’t bother doing any work until it’s actually needed, which makes for faster edit/compile cycles because you’re not doing the work upfront. Actually, often these languages forced into being lazy because the information isn’t available at compile time. But contrapositive doesn’t apply: if you have the information you don’t have to be eager. A C++ compiler could delay figuring out which function to call until the first time the function was called. You’d also need to keep all the type information around until runtime, but you’re probably already doing that for debugging purposes anyway. You’d be moving one phase of the compiler from compile-time to run-time.

You might think it would all be much easier to do this stuff by starting with a more dynamic language. but then with “dynamic-C++” you still maintain the benefit of having static typing checking for silly mistakes. Which brings me full circle to a paper which was on LtU recently which discusses this very topic (and which I have yet to read!).

Ah, dynamic languages vs. static languages. Here I am again.

X11 and maps

X11R6.8 is set to be released soon, bringing some welcome technology improvements to X11 along with lots of pretty eye-candy.

Completely unrelatedly, for a while now I’ve been thinking about how to make a free-as-in-freedom map of Edinburgh. Not just the kind of map which you look at, but also a semantic map which computers could process. It would allow route-finding applications – not just finding routes for cars, but also for cyclists who want to avoid busy roads and hills. Another application could tell you where the nearest ATM or pub was. There are hundreds of useful applications, all of which need high-quality semantically-rich maps. But, as far as I can tell, in the UK all of the commonly used maps are derived from Ordnance Survey data which, despite being a sort-of public body, charge royalties for the data.

Now, on one hand, that’s quite reasonable because it takes a lot of effort to make good maps. But, on the other hand, I have a strong feeling that somehow this information ought to be in the public domain. Local people and companies could use this data in all sorts of ways. It ought to be a shared community resource.

So, I’m left wondering how I can use technology to map Edinburgh. Maybe a combination of a digital camera, GPS and a bicycle would be a good way of grabbing useful raw datapoints? Finding existing public domain map data would be a really useful start – satellite images and photos from aircraft. And there was a recent conference about open-source mapping tools. I think I need to do some basic reading about map-making, because quite honestly I don’t know where to start.

MD5/SHA collisions cont

Now that the updated paper has been published, here is how to see the collisions for yourself:

1. Create messageA.pl, containing the following:

#!/usr/bin/perl
my $p = 
"d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89" .
"55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b" .
"d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0" .
"e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70";
print pack("H*", $p);

2. Also create messageB.pl containing the following:

#!/usr/bin/perl
my $p = 
"d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89" .
"55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b" .
"d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0" .
"e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70";
print pack("H*", $p);

3. Download md5.pl from here

4. Verify it works by doing “echo -n abc | md5.pl”. You should get 900150983cd24fb0d6963f7d28e17f72

5. Run “messageA.pl | md5.pl” and “messageB.pl | md5.pl” and you should get the same hash value (79054025255fb1a26e4bc422aef54eb4)