Stunt Debugging with C++ – Andrew Birkett's blog

I’ve used the DevStudio debugger for years. Until today, an “access violation” dialog meant “game over”. You could poke around with the debugger to find out why your application crashed, but since your application was already dead, it was basically just a post-mortem. And then today, I realised that you can often resurrect the program …

I’ve been trying to get away from the dreaded “edit-compile-test” cycle in C++, since restarting the application is very time-consuming. I’d already mentally reinvented the DevStudio edit-and-continue feature (indirect function calls through a table, plus extra space preallocated in each stack frame so you can introduce new locals) before I realised Microsoft had already got there.
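To make the "indirect function calls through a table" half of that idea concrete, here is a minimal sketch. It is only my illustration of the general trick, not a description of how DevStudio actually implements edit-and-continue, and every name in it (scale_v1, scale_v2, g_dispatch, call_scale, patch) is made up. The stack-frame padding half is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration: every call goes through a slot in a table,
// so a replacement function can be swapped in while the program runs.
using IntFn = int (*)(int);

int scale_v1(int x) { return x * 2; }  // original version
int scale_v2(int x) { return x * 3; }  // "edited" version, compiled later

enum Slot : std::size_t { kScale = 0, kSlotCount };

// The table starts out pointing at the original code.
std::array<IntFn, kSlotCount> g_dispatch = { scale_v1 };

// Callers never name scale_v1 directly; they always go through the table.
int call_scale(int x) { return g_dispatch[kScale](x); }

// Repoint a slot at new code; later calls pick up the replacement.
void patch(Slot slot, IntFn replacement) {
    assert(replacement != nullptr);
    g_dispatch[slot] = replacement;
}
```

Calling patch(kScale, scale_v2) changes what every later call_scale does, without touching any of the call sites.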

Having said that, I’d previously vowed never to use edit-and-continue again. When it works, everything is great. But eventually, it’ll fail in some subtle way. Inevitably, that’ll leave you with a subtly corrupted .exe, and I’ve spent hours chasing phantom bugs before finally giving in and doing a “rebuild all”. Incremental development works fine with lisp/ocaml/smalltalk. I don’t understand why it’s so hard to make it work all the time under C++. Grr.

But today, I started using it again because incremental development is just so nice, and I’ve been spoiled by using languages that support it. So, I happily edited my code and it happily updated my still-running image. Everything went well, until I suffered a thinko and the application crashed. D’oh!

But I wasn’t prepared to give in that easily. It takes a while to restart the application and get it into the correct state. I’d been enjoying “edit and continue” so much that I was determined to have “crash and continue” too. Hey, if it works with lisp/smalltalk then why not in C++?

And then it struck me that my crashed application wasn’t dead after all. Sure, it’d suffered a null-pointer dereference, but the debugger had caught that and now I was looking at the disassembly for the bad dereference. I realised that it was possible to carefully reposition the instruction pointer to run the function epilogue (where it unwinds the stack and restores saved registers) and then manually set a return value by altering eax. Yay, my application restarted and continued as if nothing had happened. To finish the job, I used edit-and-continue to fix the buggy bit of code before it got executed again.
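For a picture of the kind of function where this works, here is a small made-up example (Widget, widget_width and layout_columns are hypothetical, not my real code). The comments mark where the debugger stops and where execution gets resumed. The EAX detail assumes a 32-bit x86 build, where an int return value travels in that register.

```cpp
#include <cassert>

// Hypothetical example types and functions, for illustration only.
struct Widget {
    int width;
};

int widget_width(const Widget* w) {
    // If w is null, the load below faults and the debugger stops here
    // with an access violation.
    int result = w->width;

    // Resuming at the function's exit path skips the bad load. Since
    // `result` was never set, the return value has to be supplied by
    // hand (by editing EAX in a 32-bit x86 build).
    return result;
}

// A caller that copes with a made-up return value such as 0.
int layout_columns(const Widget* w, int column_width) {
    assert(column_width > 0);
    int width = widget_width(w);
    return width > 0 ? width / column_width : 0;
}
```

Here, setting the return value to 0 by hand makes layout_columns return 0 too, and the program carries on with a harmless result.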

“Crash and continue” has been achieved. 🙂

It won’t always be possible to resurrect a program in this manner. Sometimes you have to manually unwind more than one stack frame. Sometimes, there’s some invariant which needs to be fixed up (like, releasing critical sections). But it’s often possible to get your application to limp back up to the message loop (or similar point) which gives you a chance to patch up your code and have another go.
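The invariant problem is easiest to see with a lock. In this made-up sketch (Account, g_lock and deposit are hypothetical, and std::mutex stands in for a Win32 critical section), the crash happens while the lock is held, so jumping to the end of the function leaves it locked.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical example types and functions, for illustration only.
struct Account {
    long balance;
};

std::mutex g_lock;  // stands in for a critical section

long deposit(Account* account, long amount) {
    assert(amount >= 0);
    g_lock.lock();
    // A null `account` faults on the next line, while the lock is held.
    account->balance += amount;
    long result = account->balance;
    // Jumping from the fault straight to the function's exit skips this
    // unlock, so the lock would have to be released by hand as well.
    g_lock.unlock();
    return result;
}
```

If the program is resumed past the unlock without releasing the lock some other way, the next call to deposit will block forever.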