David Woods, Professor of Integrated Systems Engineering at Ohio State University, defines brittleness as “a sudden collapse in performance when events challenge system boundaries (separate from how well it performs when operating far from its boundaries).” All systems are designed within boundaries, whether explicit or implicit. For example, we might define a non-functional requirement of a transactional system as being able to handle 10K requests per second with response times remaining below 100 ms. We might also run this system on a VM instance that has an MTBF (mean time between failures) of 26K hours (~3 years). The system now has a boundary on availability that is tied to the instance it is running on. We might not have explicitly decided this, but it still exists.
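To make this concrete, here is a minimal sketch (the names, and any numbers beyond the example above, are hypothetical) that writes both kinds of boundaries down as data. The throughput and latency targets are the explicit requirements; the instance lifetime is the implicit one we rarely record.

```python
from dataclasses import dataclass

# A minimal sketch: recording the design boundaries as data.
# The throughput and latency limits are the explicit requirements;
# the instance MTBF is the implicit availability boundary.

@dataclass
class DesignBoundaries:
    max_requests_per_sec: int = 10_000   # explicit non-functional requirement
    max_response_ms: int = 100           # explicit non-functional requirement
    instance_mtbf_hours: int = 26_000    # implicit boundary from the VM (~3 years)

def inside_boundaries(rps: float, p99_ms: float, uptime_hours: float,
                      b: DesignBoundaries = DesignBoundaries()) -> bool:
    """True only while the system is operating inside its design envelope."""
    return (rps <= b.max_requests_per_sec
            and p99_ms <= b.max_response_ms
            and uptime_hours <= b.instance_mtbf_hours)

# ~2.5 years in, the explicit limits still hold but the implicit one is near.
print(inside_boundaries(rps=9_500, p99_ms=80, uptime_hours=22_000))  # True
print(inside_boundaries(rps=9_500, p99_ms=80, uptime_hours=27_000))  # False
```

Writing the implicit boundary next to the explicit ones is the point: once it is visible, it can be monitored and challenged like any other requirement.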
Something I learned as a consultant helping hyper-growth startups was that complexity penalties emerge as a result of success and limit growth unless they are dealt with. Stated another way, the more successful your system is, i.e., the more people want to use the service you created, the more you are forced to deal with the system operating outside its design boundaries, resulting in more incidents. Continuing the example above: if your service is successful, it will still be running when the MTBF is exceeded, likely resulting in a failure you have to deal with. As Carlson and Doyle observed in their 2000 paper, highly optimized tolerance systems “are robust to perturbations they are designed to handle, yet fragile to unexpected perturbations and design flaws.”
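To see why “likely” is not an exaggeration, here is a rough sketch that assumes an exponential failure model (a simplification; real hardware failure curves differ) and asks how probable an instance failure becomes as the service keeps running.

```python
import math

# A rough sketch assuming an exponential failure model (a simplification;
# real hardware failure curves differ). MTBF taken from the example above.

MTBF_HOURS = 26_000

def p_failure_by(hours: float, mtbf: float = MTBF_HOURS) -> float:
    """Probability of at least one failure within `hours` of continuous operation."""
    return 1 - math.exp(-hours / mtbf)

for years in (1, 2, 3, 5):
    hours = years * 24 * 365
    print(f"{years} year(s): {p_failure_by(hours):.0%} chance of an instance failure")

# Roughly: 1 year ~29%, 3 years ~64%, 5 years ~81%
```

Under this assumption, by roughly one MTBF the odds of having seen an instance failure are already around two in three, and they only climb from there: a successful, long-lived service should expect to operate past this boundary.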
A solution to software system brittleness is known as “graceful extensibility”; if you are interested, Professor Woods has an interesting video lecture on the topic. The gist of graceful extensibility is the ability to stretch near and beyond boundaries: it is the opposite of brittleness, and it trades off with the pursuit of optimality. An interesting point is that, so far, no machine possesses this ability, yet every biological system possesses some amount of graceful extensibility. In fact, the notion that biological systems, including humans, possess this ability is what allows us to help our software systems. Woods argues that systems as designed are more brittle than people realize but fail less often because people adapt to the shortfalls and “stretch” system performance: people are the ad hoc graceful extensibility of many software systems.
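As a loose software analogy (a toy illustration only, not Woods’ formal model; all names here are hypothetical), a service that sheds optional work near its throughput boundary stretches instead of collapsing:

```python
# A toy illustration only (not Woods' formal model; all names are hypothetical):
# near the 10K rps design limit the service sheds optional work instead of
# collapsing, stretching its performance at the boundary.

MAX_RPS = 10_000

def full_response(request):
    return {"data": request, "recommendations": ["..."], "analytics": True}

def cheap_response(request):
    # drop personalization and analytics to stay inside the latency budget
    return {"data": request}

def handle(request, current_rps: float):
    if current_rps > 1.2 * MAX_RPS:
        return {"status": 503, "body": "shedding load"}           # refuse at the hard limit
    if current_rps > 0.9 * MAX_RPS:
        return {"status": 200, "body": cheap_response(request)}   # degrade near the boundary
    return {"status": 200, "body": full_response(request)}

print(handle({"q": "widgets"}, current_rps=8_000))   # full response
print(handle({"q": "widgets"}, current_rps=9_800))   # degraded response
print(handle({"q": "widgets"}, current_rps=13_000))  # load shed
```

The trade-off with optimality is visible even in this toy: the spare capacity and degraded paths exist only to absorb surprise, and a design tuned purely for efficiency would strip them out.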
Ross Ashby’s study of cybernetics (circular causality, or feedback) in the 1950s led to his general theory of adaptive systems. In An Introduction to Cybernetics, Ashby used set cardinality, or variety, as a measure of information and formulated his Law of Requisite Variety, stating that "only variety in [the regulator] can force down the variety due to [the source of disturbances]; only variety can destroy variety." A popular paraphrasing of the law is "only complexity absorbs complexity," but Woods extended requisite variety to requisite revision, stating, “only the ability to revise past answers to ‘what is requisite variety’ can produce future requisite variety”: we have to learn from past events to be poised to adapt.
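A toy counting example (a hypothetical scenario, purely illustrative) makes the law tangible: a regulator can hold the outcomes down to a single goal state only if it has at least one distinct, matching response for every kind of disturbance.

```python
# A toy counting sketch of the law (hypothetical scenario, illustrative only):
# outcome variety can be held down to one goal state only when the regulator
# has a matching response for every kind of disturbance.

DISTURBANCES = ["traffic_spike", "instance_loss", "bad_deploy"]   # variety = 3
COUNTERMEASURE = {"traffic_spike": "shed_load",
                  "instance_loss": "failover",
                  "bad_deploy": "rollback"}

def outcome_variety(regulator_responses: set) -> int:
    """Variety of outcomes when the regulator uses its best available response."""
    outcomes = set()
    for d in DISTURBANCES:
        if COUNTERMEASURE[d] in regulator_responses:
            outcomes.add("goal_state")        # disturbance absorbed
        else:
            outcomes.add(f"outage:{d}")       # disturbance passes through
    return len(outcomes)

print(outcome_variety({"shed_load", "failover", "rollback"}))  # 1: requisite variety
print(outcome_variety({"shed_load"}))                          # 3: not enough variety
```

Requisite revision is the step this sketch cannot take on its own: when a new kind of disturbance appears, the set of countermeasures itself has to be revised, which is where learning from past events comes in.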
Bringing this back to software system brittleness: no system, yet, has been designed without brittleness or with graceful extensibility. Humans fill this gap for software systems and adapt to the shortfalls. For us to do this effectively and reduce the complexity, we must learn from previous events. Learning is key to building and maintaining great software systems and great companies.