When Bulletproof is Not Enough
Programming is like sex: one mistake
and you have to support it for the rest
of your life
-- Michael Sinz --
As stated in our "Mission Statement", our aim is to write reliable software. While no software is bug-free, we apply a whole set of measures to achieve the best reliability possible. In fact, one of us has already architected (and supervised until and after deployment) a system which achieved availability levels widely considered impossible; for this project we have borrowed some of his techniques and added a few new ones. Below is the list of approaches and techniques we use for reliability purposes:
- Permanent module-level testing
All non-trivial modules shall be tested as early as possible, and testing shall continue on a regular basis from that point onwards. This is a somewhat less bold commitment than the one made in "test-driven programming", where coding starts with writing the unit test, but we believe that testing larger modules rather than individual functions is a much better way to spend effort.
- Lots of random testing
We have found that in many cases random testing, which emulates random loads, is the only way to test software more or less reliably.
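As an illustration only, here is a minimal sketch of module-level random testing. It assumes a hypothetical module under test (`RingBuffer`, a fixed-capacity queue invented for this example) and checks it against a trivially correct reference model under a random sequence of operations. The generator is seeded, so any failing run can be replayed from the seed printed in the assertion message.

```python
import random
from collections import deque


class RingBuffer:
    """Hypothetical module under test: a fixed-capacity FIFO queue."""

    def __init__(self, capacity):
        self._items = [None] * capacity
        self._capacity = capacity
        self._head = 0  # index of the oldest element
        self._size = 0

    def push(self, value):
        if self._size == self._capacity:
            return False  # full: the value is rejected
        tail = (self._head + self._size) % self._capacity
        self._items[tail] = value
        self._size += 1
        return True

    def pop(self):
        if self._size == 0:
            return None  # empty
        value = self._items[self._head]
        self._head = (self._head + 1) % self._capacity
        self._size -= 1
        return value

    def __len__(self):
        return self._size


def random_test(seed, steps=10_000, capacity=8):
    """Apply random operations and compare every result with a reference model."""
    rng = random.Random(seed)  # seeded so that a failure can be replayed
    buf = RingBuffer(capacity)
    model = deque()  # reference model: simple and obviously correct
    for step in range(steps):
        if rng.random() < 0.5:
            value = rng.randrange(1_000_000)
            accepted = buf.push(value)
            expected = len(model) < capacity
            assert accepted == expected, f"seed={seed} step={step}: push mismatch"
            if expected:
                model.append(value)
        else:
            got = buf.pop()
            want = model.popleft() if model else None
            assert got == want, f"seed={seed} step={step}: pop mismatch"
        assert len(buf) == len(model), f"seed={seed} step={step}: size mismatch"
    return steps
```

Running `random_test(seed)` over many seeds, and with a small `capacity` so that the full and empty cases are hit often, also pushes the module towards its internal implementation limits.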
- Lots of extreme testing
We perform, and will continue to perform, lots of testing in extreme conditions. This is not restricted to conventional "stress testing"; we also test scenarios which are extreme from the point of view of internal implementation restrictions. Extreme testing is especially efficient when combined with random testing; in many cases 5 minutes of such testing is equivalent to 5 years of work in a customer environment.
- Special attention paid to multithreaded testing
Since multithreading is a Bad Thing (as stated on our "Anti-Software" page), we pay special attention to multithreaded testing. All multithreaded code is isolated within very few modules, and these modules undergo special testing using our own set of multithreading testing tools (BTW, we are going to patent a few of these techniques soon).
- Built-in tools for customer-end testing
The worst scenario that can possibly happen to developers is a customer whose copy of your software crashes on a regular basis, while you cannot help because you cannot understand what is going on. To deal with it, we have an extensive framework both for runtime checks (to catch the problem at the point closest to its source) and for logging (to simplify post-mortem analysis). Moreover, we can, and if necessary will, produce special versions for the client which provide significantly better validation and logging at the cost of running at a fraction of normal performance. Our goal in this regard is to be able to identify and fix any problem that happens on a regular basis.
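The idea of cheap always-on checks plus expensive checks reserved for a diagnostic version can be sketched roughly as follows. This is not our actual framework: the names `check`, `paranoid_check`, `PARANOID` and `insert_sorted` are hypothetical and exist only for this example.

```python
import bisect
import logging

log = logging.getLogger("runtime_checks_sketch")

# Hypothetical switch: True only in a special diagnostic version for a customer.
PARANOID = False


class InvariantViolation(Exception):
    """Raised as close as possible to the source of the problem."""


def check(condition, message, **context):
    """Cheap check, always on: log the context, then fail immediately."""
    if not condition:
        log.error("check failed: %s | context=%r", message, context)
        raise InvariantViolation(f"{message} | context={context!r}")


def paranoid_check(predicate, message, **context):
    """Expensive check: the predicate is evaluated only when PARANOID is set."""
    if PARANOID:
        check(predicate(), message, **context)


def is_sorted(items):
    return all(a <= b for a, b in zip(items, items[1:]))


def insert_sorted(items, value):
    """Insert value into an already sorted list, validating along the way."""
    check(isinstance(value, int), "value must be an int", value=value)
    # O(n) validation: too slow for normal builds, useful for post-mortem work.
    paranoid_check(lambda: is_sorted(items), "input list not sorted", size=len(items))
    bisect.insort(items, value)
    paranoid_check(lambda: is_sorted(items), "output list not sorted", size=len(items))
    return items
```

Passing the expensive condition as a callable means the normal version pays almost nothing for it, while the diagnostic version trades speed for much earlier detection.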
- Built-in recovery tools and techniques
For mission-critical environments, we use special tools and techniques to reduce the impact of crashes and failures, ranging from rather simple interception of CPU exceptions to detection of an application crash or deadlock with subsequent automated relaunch.
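A minimal sketch of the "detect a crash or hang, then relaunch" end of this range is shown below. It makes simplifying assumptions: the supervised program is a bounded job that exits with code 0 on success, and a hang is approximated by a plain timeout (real deadlock detection would more likely rely on a heartbeat from the application). The function `supervise` and its parameters are hypothetical.

```python
import subprocess
import sys


def supervise(command, max_restarts=3, hang_timeout=5.0):
    """Run command; relaunch it after a crash or a hang.

    Returns the number of launches it took to get a clean exit.
    Raises RuntimeError when the restart budget is exhausted.
    """
    launches = 0
    while launches <= max_restarts:
        launches += 1
        process = subprocess.Popen(command)
        try:
            exit_code = process.wait(timeout=hang_timeout)
        except subprocess.TimeoutExpired:
            # No exit within the timeout: treat as a hang, kill and relaunch.
            process.kill()
            process.wait()
            continue
        if exit_code == 0:
            return launches
        # Non-zero exit code: treat as a crash and relaunch.
    raise RuntimeError(f"gave up after {launches} launches: {command!r}")


if __name__ == "__main__":
    # Supervise a trivial child process that exits cleanly.
    print(supervise([sys.executable, "-c", "pass"]))
```

A production supervisor would also log each failure and back off between restarts, so that a permanently broken process does not spin in a tight relaunch loop.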
- 'Bulletproof' testing
When we speak about software being 'bulletproof', we have something specific in mind. For mission-critical pieces of software, we are going to provide a 'bulletproof' version which, at some minor performance cost, will try to maintain its functionality even when a 'bullet is fired' through the body of our program. We consider a random change of a 4-byte word within our process address space to be quite a good analogy for a bullet. We will perform 'bulletproof' tests in which we fire several bullets through our application to measure the percentage of bullets it survives (obviously, in some cases it is not possible to recover, but overall we think it is quite a good way of testing software behavior in extreme conditions).
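The measurement itself can be illustrated with a toy simulation. The sketch below does not touch a real process address space: it assumes the critical state lives in a plain byte buffer, protected by a hypothetical scheme of three copies with per-byte majority voting, and fires 4-byte 'bullets' at that buffer. All function names are invented for this example.

```python
import random

WORD = 4  # bullet size in bytes, as in the text


def fire_bullet(memory, rng):
    """Overwrite one random aligned 4-byte word with random bytes."""
    offset = rng.randrange(len(memory) // WORD) * WORD
    memory[offset:offset + WORD] = bytes(rng.randrange(256) for _ in range(WORD))


def store(payload, copies=3):
    """Lay out several identical copies of the payload in one buffer."""
    return bytearray(payload * copies)


def recover(memory, size, copies=3):
    """Rebuild the payload by per-byte majority vote; None if a vote fails."""
    out = bytearray()
    for i in range(size):
        votes = [memory[c * size + i] for c in range(copies)]
        winner = max(set(votes), key=votes.count)
        if votes.count(winner) * 2 <= copies:
            return None  # no strict majority: unrecoverable
        out.append(winner)
    return bytes(out)


def survival_rate(payload, bullets, trials, seed=0, copies=3):
    """Percentage of trials in which the payload survives the given bullets."""
    rng = random.Random(seed)  # seeded so that results are reproducible
    survived = 0
    for _ in range(trials):
        memory = store(payload, copies)
        for _ in range(bullets):
            fire_bullet(memory, rng)
        if recover(memory, len(payload), copies) == payload:
            survived += 1
    return 100.0 * survived / trials
```

With three copies a single bullet can damage at most one copy of any given byte, so `survival_rate(payload, 1, trials)` is always 100.0; the interesting numbers appear as the bullet count grows and two bullets start hitting the same position in different copies.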