QualityCode.com Essay: Do the simple thing, even when it hurts

One of the four values of Extreme Programming (XP) is Simplicity. One of the pithy XP sayings is “Do The Simplest Thing That Can Possibly Work”. Whether you agree that this is an effective strategy or not, it certainly sounds easy. It’s not.

For some reason, my natural tendency is to look ahead, and shy away from doing what is truly simplest.¹ It’s strange, because I place a very high value on simplicity. At every turn, I try to ask myself if I’m really doing the simplest thing. And yet, more often than I care to admit, I find myself favoring a more complex solution. And, more often than not, it costs me in the end.

A recent example comes from a project I’ve been working on part-time for the last six months. It’s a system that allows users to create, store, and search “bulletins”. Each bulletin has a unique id, and consists of some data. The specific contents aren’t important for this discussion.

Early in the project, I was working on the story “Create New Bulletin”, which had to save bulletins to disk. Up until that point, I was merely displaying hard-coded bulletins, allowing the Customer to get a sense of how the user interface would work.

Now, I needed to store these bulletins…somehow. I already knew I didn’t want to use a relational database. Besides being too complex as a starting point, the bulletin data didn’t lend itself to the rigid structure of a database. So I considered some alternatives:

1. Store each bulletin in a file, using the id as the filename
2. Store all the bulletins in a single flat file
3. Store all the bulletins in a single, large XML file
4. Store all the bulletins in an object database or using some other third-party storage system, like the Berkeley Database

The simplest thing I could have done was the first option: storing each bulletin in its own file. When a bulletin was added, write a new file. Updating and deleting bulletins would be just as easy. Definitely the simplest thing…
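To give a sense of how little code that option needs, here is a minimal sketch in Python. It is an illustration only, not the project's code: the class name `FileBulletinStore`, the method names, and the `.txt` extension are all hypothetical, and it assumes bulletin ids are safe to use as filenames and that bulletin data can be treated as text.

```python
from pathlib import Path


class FileBulletinStore:
    """Hypothetical store: one bulletin per file, named after the bulletin id."""

    def __init__(self, directory):
        # Keep bulletins in their own data directory, created on first use.
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, bulletin_id):
        # Assumption: ids contain no path separators or other unsafe characters.
        return self.directory / f"{bulletin_id}.txt"

    def save(self, bulletin_id, data):
        # Adding and updating are the same operation: (over)write the file.
        self._path(bulletin_id).write_text(data, encoding="utf-8")

    def load(self, bulletin_id):
        # Raises FileNotFoundError if the bulletin does not exist.
        return self._path(bulletin_id).read_text(encoding="utf-8")

    def delete(self, bulletin_id):
        # Raises FileNotFoundError if the bulletin does not exist.
        self._path(bulletin_id).unlink()

    def ids(self):
        # Every file in the directory is a bulletin; the stem is its id.
        return sorted(p.stem for p in self.directory.glob("*.txt"))
```

Each operation touches exactly one file, so there is nothing to parse and nothing to keep in sync.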

But I thought ahead: Operating systems can start to degrade with as few as a hundred files in a single directory, and definitely start to grind when you reach a thousand. Plus, this scheme would either require me to create a data subdirectory, or would mix bulletin files in with the application code and configuration files. Finally, having the bulletins spread out in multiple files would make it easy for someone to delete one bulletin through the file system, which could leave dangling references to it in other bulletins.

No, I chose option three, storing all the bulletins in a single XML file. XML was already in our plan as a data exchange format, so it wasn’t much of a stretch to use it as the primary storage format as well. I strongly suspected that this approach would have to be abandoned sometime later in the project (before our first release), but it seemed like a simple bridge that would last a while.

Getting the XML parser to work wasn’t trivial, but I got it going after a couple of days². After I had the system reading and writing the big XML file, I chose another non-simple solution: I decided to create an “autosave” method. I won’t go into details here, but it cost me several hours on a few occasions, because of subtle defects.

Eventually, as expected, the time came when the large XML file became too slow, and had to be replaced. By this time, it was clear that the correct solution was to use a system that would allow us to treat the data like a disk-based Map or Hash. We needed to be able to write, read, or discard bulletin data based on a unique key. The data itself could be treated as a large string, or as a block of raw data bytes. This is similar to the design of the Berkeley Database, and results in a very simple, fast system.
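The shape of that key-based design can be sketched with Python's standard `dbm` module, which offers a simple disk-backed mapping from byte keys to byte values. This is a stand-in for illustration, not the Berkeley Database itself and not the project's actual code; the class name `KeyedBulletinStore` and its method names are hypothetical.

```python
import dbm


class KeyedBulletinStore:
    """Hypothetical disk-based map: unique bulletin id -> raw bulletin bytes."""

    def __init__(self, path):
        self.path = str(path)

    def write(self, bulletin_id, data):
        # "c" opens the database for reading and writing, creating it if needed.
        with dbm.open(self.path, "c") as db:
            db[bulletin_id.encode("utf-8")] = data.encode("utf-8")

    def read(self, bulletin_id):
        # Raises KeyError if no bulletin is stored under this id.
        with dbm.open(self.path, "c") as db:
            return db[bulletin_id.encode("utf-8")].decode("utf-8")

    def discard(self, bulletin_id):
        # Raises KeyError if no bulletin is stored under this id.
        with dbm.open(self.path, "c") as db:
            del db[bulletin_id.encode("utf-8")]
```

The interface is nearly identical to writing, reading, and deleting one file per bulletin, which is why a file-per-bulletin design would have been the easier starting point for this migration.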

Unfortunately, it took quite a bit of work to overhaul my “one big XML file” code to instead store individual bulletins into the database. It would have been far easier if I had been storing each bulletin into a separate disk file.

So “thinking ahead” (writing one big XML file instead of one bulletin per file) not only cost me more when I initially wrote the code, but also cost me more during debugging when it had some subtle problems, and again even later when I had to migrate to a totally different solution. I would have been far better off just doing the simplest thing that could possibly have worked. If you pay close attention to your own work, you’ll probably be surprised at how often that turns out to be true.

¹ This was written in 2002. After a few more years of XP/Agile experience, I became quite good at not looking ahead.

² XML was relatively new at that point, so the tools were rough and there was a learning curve. Today, this would take 5 minutes.