Strong typing vs Strong testing refute

Bruce Eckel writes about strong typing vs strong testing – a topic close to my heart. But he makes the common mistake of reducing “runtime typing vs. static typing” to a language feature comparison. Java is a very poor example of static typing!

But I think I disagree with his larger point – that you’re better off with test systems rather than type system. I think you should have both! At work, I’ve watched large programs being built up and I think you reap great benefits from static typing. For a start, types act as documentation. The type of a function argument indicates how it’ll be used. Abstract interfaces act as barriers to complexity within a large system. Type annotations such as ‘const’ in C++ allow you to express some of the semantics of the system and the compiler will check all of your code for violations.

Can tests ever replace this? I think not. Firstly, in my experience, checking enough of the cases is hard. And the cases which don’t get tested are probably the rarely executed ones where bugs lurk. What happens if someone trips over the network cable as your program is sending data? Does that error handler work? How about if the harddisk is full and your program can’t write a temporary file?

Sure, static type systems can’t check very many properties of your program – that’s why I still love unit tests – but they can check quite a lot including a lot of common mistakes. Given that it takes a compiler a few milliseconds to type-check a function I think static typing is a big win. The compiler won’t suffer from deadline stress and forget to check your new code.

I wonder if many of the great claims made about programs written in python arise from people who don’t write very large systems, and who don’t have to be very strict about dealing with every single failure condition. It’s one thing for bittorrent to fall over when something goes bad. It’s quite another if software in a life-support system bails.

I see things like ‘const’ in C++ and I wonder if there are other annotations which I could add to my source code, so that they act both as documentation of my intent and so that the compiler will check them. This is what lead me to my current fascination with computer languages and their facilities. I talked to Anthony about this a while ago, since he did his PhD on something-to-do-with-type-systems and he murmed in a “it’s tricky” way. Around that time, I looked at dependent types. With this more powerful type system you can express stuff like “foo() is a function which takes two arguments: a vector of integers called ‘data’ and an integer called ‘length’ but furthermore the vector will always have that length”. But, apparently that makes the type system undecidable in general. In plain english, your compiler can say “your program is well-typed” or “there’s a type error” or it could go into a loop trying to decide. That’s not great and I find it a bit worrying that something which can be expressed so simply in english messes up your compiler so bad.

But, hey, even I have doubts. Not because I feel shackled by the type system in ocaml or haskell but because systems like Squeak have much better tools for exploring and tweaking systems. Hmm ….

Of limited scope

Any functional programming bunny will tell you that “reducing the amount of state” in a program is something which fpl’s are good at. When you’ve got first-class functions at your disposal, you don’t need loop counters any more, When you’ve got discriminated unions (variants in ocaml, very different from variants in VB!) you tend not to need so many ‘flag’ variables in your programs. Removing all these variables from your code makes it easier to read.

Back in the C++ world, I’m a big fan of limiting the scope of variables as much as possible. If you introduce a temporary variable which only gets used in the very next line, then surround the whole thing with braces so the variable immediately goes out of scope.

   {
     int tmp;
     sscanf(buffer, "%d", &tmp);
     counter += tmp;
    }

This makes it clear that ‘tmp’ is introduce, used, and disposed of in these three lines. There’s no chance it’ll be used later. Of course, it’s possible this is a sign that you should factor out a smaller function, but not always.

Now let’s move from the function level up to the class level. Some object data members are definitely part of the object’s state. You would expect a BankAccount object to have a ‘balance’ data member. Some object data members are there for efficiency. For example, an object might maintain a single StringBuffer for use by it’s methods, rather than have methods repeatedly construct/destruct one each time they needed it. Finally, some object data members are used as communication between different member functions. They might only be set in method foo() and only read in method bar(), but your only option is to make them visible to all the class methods.

So, to bring this rambling to a point; If you need a bit of data to have scope bigger than the local function, your next step up is object scope. While that’s a whole lot better than making it a global variable, I wonder if there’s useful extra scoping levels which you could add with language support.

For example, “this data member is only visible to methods foo() and bar()”. However, maybe this is just a sign that foo(), bar() and the data member should be a subobject.

Another example: “this data member is only relevant between calls to Start() and Stop()”. Well, a Maybe<> type gets you part way there by communicating “this values isn’t always present” but there’s no way to communicate /when/ it’s going to be valid to someone reading the code, short of a comment. This isn’t something you could check for statically, obviously.

Actually, a better example would be “this data member is only relevant after someone has called Initialize()”. If you could express that in the language, then you’d probably save an awful lot of manual “if we’re uninitialized, print an error” checking.

This is at best a half-baked idea. I know there are reasonable ways of doing this all in code just now. I’m interested to see if there’s better ways of doing this. Ways which communicate intent better than hand-coded checks.

Once upon a type

Static vs. dynamic type systems. It used to be an easy choice. Along with the rest of the human race, I make typos all the time. If I made a typo whilst typing a function call in some rarely-used bit of code, a static type system would tell me. Also, I’m a great believe in making computers do as much work as possible so that I can leave work on time and do something more fun. I like the idea of a compiler whirring away, checking lots of common mistakes for me. Letting go of that seemed dangerous
Continue reading “Once upon a type”

I can see clearly now the bugs have gone

John Wiseman has a cool example of a debugger visualization plugin. Why be content with little expandable tree views of your datastructures when you could have arbitary visualization plugins? It’s hard to be excited about stuff like DevStudio’s autoexp.dat when you see what is possible elsewhere.

It’s harder to do this in C/C++ than lisp/smalltalk because you’re operating at the level of bytes and memory addresses rather than objects. A visualization plugin for a C++ debugger would have to be ultra-careful not to dereference bad pointers or it would bring the whole debugger down. Also, since objects in C++ don’t carry around much information about their type, you’d have to manually ensure that you update the plugin if someone adds a new data member to the type which the plugin shows. Perhaps you can get enough information from the debug info.

Hmm, so once again something which is easy in lisp or smalltalk becomes “possible, if you’re very careful” in C/C++.