Friday, September 21, 2007

 

Side Effect Oriented Development

Product release nears, must be time for another blog post.

Every developer knows (or at least is told) that side effects are bad. They make program behaviour unpredictable, and simple code modifications can produce new and wonderful bugs. The code base I'm working on has an ancient heritage, and has been worked on by a number of developers of varying experience and skill. A number of relatively poor design decisions were made very early on, and today's developers are suffering for them.

So I would like to coin the term "side effect oriented development". I know this is a new term, since I Googled it, and Google is all-knowing. 0 hits. It's mine!

What exactly is it?

Side effect oriented development is the tools and mindset needed to work on code that depends on a tangle of side effects. It is an attitude of fixing, removing, and redesigning side effect driven code. It is the ability to recognize the danger in changing that one inconspicuous line that will bring the app to its knees.

It is also one of the most frustrating environments for a developer to work in.

Imagine code that actually depends on memory leaks. Fix a memory leak, cause a crash.
Imagine being afraid of making even the simplest of fixes, because you don't know what will happen to some other unrelated piece of code.
Imagine no unit tests.

It takes a fair amount of discipline and creativity to deal with code like that.

First, isolate your change to a small area of code at a time, such as a single function. Avoid wholesale changes, as they are more likely to introduce hard to track bugs. Small changes will limit the damage.

Second, analyze the potential impact your change may have. This usually requires good knowledge of the code, which can be difficult to acquire. One of the best tools for determining how a particular code change will impact the whole project is an application called Understand for C++. If you are dealing with difficult code, this tool is a must. It lets you browse a function's call tree in both directions: the functions it calls, and the functions that call it. It even has a scripting capability so that you can extend its functionality. A real life saver, and well worth the investment.

Third, you need to fix the known side effects. Ideally, this means that you modify the code that depended on the old side effect which your brand new code hopefully removed. This can sometimes introduce a cascade effect, requiring more and more code changes throughout the calltree. At some point, you need to introduce a layer where you preserve existing side effect behaviour so that you can limit the damage. You can then go back later and remove your intentional side effect once you have validated that the new code works.

Fourth, verification. This can be difficult if you don't have unit tests, especially if your code is large, monolithic and poorly organized. At the very least you should have regression tests that can be run. While not as good as well written unit tests, they can at least give you some comfort that your change hasn't completely borked a section of code you thought was unrelated. Once you are satisfied that your change is correct, go back to step three and clean up your intentional side effects.

These are fairly general recommendations. I don't want to go into too much depth, but needless to say, it can be quite a chore to fix code like this. I look forward to the day when all of the side effect dependent code is removed.

Saturday, October 14, 2006

 

The Law of Leaky Abstractions

The Law of Leaky Abstractions - Joel on Software

Ok, my first real post and I'm leeching off of Joel.

This is one of the most insightful articles I've read on software development. Leaky abstractions are everywhere in software. Every developer has run into them, and most don't even know it.

Probably the most common abstraction that leaks is integer arithmetic. Sure, everyone knows that 1 + 2 = 3. This is true in any language, on any platform. What about 100 + 101? Is that equal to 201? Of course... unless you are using a signed byte as your data type, in which case it happens to equal -55.

Even something as simple as integer math breaks down eventually. Fortunately the vast majority of developers are aware of integer wrap-around, but few of them think of it in a more formalized way. Formalizing the concept lets us look for other abstractions that may be leaking without our explicit knowledge.

There is an important point implied by the article: in order to identify the leak, you need to understand exactly how the abstraction works. This can be problematic. Sure, most developers understand the basics behind integer math, but what about much more complex abstractions, like compilers?

This is why a good software engineer understands not just how to use his tools, but how those tools work. Too often I've interviewed potential developers who seemed to think that it didn't matter if they knew what was happening when they hit "build".

That's why I ask the question "how are virtual functions implemented in C++?" during an interview. It is such an important concept, because those who don't know (or worse, don't care) must regard the very language they work in as a magic box. When that magic box fails, as it so often does, if you don't know what is happening, or what should be happening, you are stuck.

If you understand the abstractions in use, you not only know how to fix them when they go wrong (or at least can come up with a workaround), but you gain insight into how to use the abstraction more efficiently. I don't need to memorize a rule about whether or not my class's destructor needs to be virtual. I can deduce the correct answer because I have knowledge of the implementation. This applies to any abstraction. The more we know about the implementation of the abstraction, the less rote memorization is required to use the abstraction correctly.

This of course brings up the question of how much of the implementation we should know. There is a law of diminishing returns here. Do we need to know about pointer arithmetic, or call stacks, or calling conventions? They are certainly useful for understanding other abstractions, but in terms of virtual functions, they don't add a whole lot of useful information. The real trick is knowing how far down you need to drill before you are just learning for the sake of learning.

Not that there is anything wrong with that...

 

Is this thing on?

Welcome to my first blog post. I've started this blog with the intent to discuss software engineering. I hope to put up some articles that I have been thinking about writing.

But don't hold your breath. It could be a while before I have a chance to write more than a very brief intro.
