Resilience is the ability or capacity to recover quickly from difficulties. Resilient systems carry less risk, have a lower cost of ownership, and can be built upon. The CAP theorem (Consistency, Availability, and Partition Tolerance) applies to resilience within the domain of distributed systems but is instructive for any system. The seven principles of resilience used in this episode were drawn from the Stockholm Resilience Centre’s guidelines for general-purpose resilience. The first three involve systems design, while the final four focus on team and organizational structure.
06:54 What is Resilience
“The ability of a substance or object to spring back into shape; elasticity”
Resilience is the capacity to recover quickly from difficulties or problems. The Stockholm Resilience Centre provides guidelines for building resilient systems. The first three principles concern system design, while the remaining four concern team and organizational structure. In essence, these principles work for any chaotic system that needs to be managed.
09:12 Why Systems Should Be Resilient
Resilience lowers a system's cost of ownership by reducing risk. It also makes it possible to build on top of the system.
11:32 The CAP Theorem
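Applies to resilience within the domain of distributed systems. It names three properties, of which a distributed system can guarantee at most two at the same time. In practice, network partitions cannot be ruled out, so when one occurs the system must choose between consistency and availability.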
- Consistency: Every read sees the most recent write; everything is up to date with the latest version at all times.
- Availability: A guarantee that every request receives a response.
- Partition Tolerance: The system continues to operate despite network partitions, where messages between nodes are lost or delayed.
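To make the trade-off concrete, here is a minimal Python sketch, assuming a hypothetical two-replica key-value store; `ToyStore`, `Replica`, and `Unavailable` are illustrative names, not a real library. When the link between the replicas is cut, a CP-style store refuses requests it cannot keep consistent, while an AP-style store keeps answering and lets one replica go stale until the partition heals.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP-style store refuses a request during a partition."""


@dataclass
class Replica:
    data: dict = field(default_factory=dict)


@dataclass
class ToyStore:
    """Hypothetical two-replica store; mode is "CP" or "AP"."""
    mode: str
    a: Replica = field(default_factory=Replica)
    b: Replica = field(default_factory=Replica)
    partitioned: bool = False

    def __post_init__(self):
        if self.mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")

    def write(self, key, value):
        # Writes arrive at replica a and are copied to b while the link is up.
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise Unavailable("cannot replicate write during partition")
            # Availability first: accept locally, leaving b stale.
            self.a.data[key] = value
            return
        self.a.data[key] = value
        self.b.data[key] = value

    def read_from_b(self, key):
        # Models a client that can only reach replica b.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("replica b may be stale during partition")
        return self.b.data.get(key)

    def heal(self):
        # Partition over: copy a's state to b so it catches up.
        self.partitioned = False
        self.b.data.update(self.a.data)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = ToyStore(mode)
        store.write("x", 1)
        store.partitioned = True
        try:
            store.write("x", 2)
            print(mode, "read from b:", store.read_from_b("x"))
        except Unavailable as exc:
            print(mode, "refused:", exc)
```

Real databases make this choice in more nuanced ways (per operation, or with tunable consistency levels); the sketch only shows the shape of the trade-off.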
14:23 7 Principles of Resilience
- Maintain Diversity and Redundancy
Provides partition tolerance: if one part of a system fails, the failure does not cascade into another.
- Manage Connectivity
Well-connected systems can recover quickly, yet overly connected systems fail quickly, because a failure can spread along the same connections.
- Manage Slow Variables and Feedbacks
A slow variable is one whose influence does not scale linearly. For example, disk space dropping from plenty to merely enough will not noticeably change performance, but dropping from enough to not enough degrades performance rapidly.
- Foster Complex Adaptive Systems Thinking
Give up the illusion of perfect control; let things go up to a certain point. This is more an attitude than a systems design.
- Encourage Learning
Another attitude rather than a system design: systems will have to be adjusted from time to time.
- Broaden Participation
Get stakeholders more deeply involved in the project to eliminate surprises, whether from errors, unknown requirements, or political maneuvering.
- Promote Polycentric Governance
Also known as multi-level or multi-party governance. It is an organizational structure in which multiple independent decision-making centers govern themselves within a shared set of general rules. In software design, polycentricity reduces risk when scaling applications and systems.
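For "Maintain Diversity and Redundancy", here is a minimal Python sketch of redundancy with failover, assuming each redundant backend is exposed as a zero-argument callable; `call_with_failover` and `AllReplicasFailed` are hypothetical names, not from the episode or any library. A failure in one backend is caught and contained, and the next backend is tried instead of the error cascading to the caller.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every redundant backend has failed; args[0] holds the errors."""


def call_with_failover(backends: Sequence[Callable[[], T]]) -> T:
    """Try each redundant backend in order and return the first success."""
    errors = []
    for backend in backends:
        try:
            return backend()
        except Exception as exc:
            # Contain the failure and fall through to the next backend.
            errors.append(exc)
    raise AllReplicasFailed(errors)


if __name__ == "__main__":
    def primary():
        raise ConnectionError("primary down")

    def secondary():
        return "answer from secondary"

    print(call_with_failover([primary, secondary]))
```

Redundancy alone helps only if the backends do not fail together; diversity (different implementations, hosts, or regions) makes it less likely they fail for the same reason.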
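The disk-space example from "Manage Slow Variables and Feedbacks" can be sketched as a threshold monitor. This is a minimal Python sketch: `disk_status` is a hypothetical helper, and the 20% warning and 10% critical thresholds are assumptions for illustration, not figures from the episode. Because free space barely matters until it nears the threshold, the useful feedback is an early alert while the slow variable is still drifting, rather than a reaction after performance has already collapsed.

```python
import shutil


def disk_status(free_bytes: int, total_bytes: int,
                warn_fraction: float = 0.20,
                critical_fraction: float = 0.10) -> str:
    """Classify free disk space against (assumed) warning/critical thresholds."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_fraction = free_bytes / total_bytes
    if free_fraction < critical_fraction:
        return "critical"  # past the point where performance falls off
    if free_fraction < warn_fraction:
        return "warning"   # still fine, but drifting toward the threshold
    return "ok"            # changes here barely affect performance


if __name__ == "__main__":
    # shutil.disk_usage returns a named tuple with total, used, free (bytes).
    usage = shutil.disk_usage("/")
    print(disk_status(usage.free, usage.total))
```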
IoTease: March is for Makers
March is for Makers is a movement started by Saron Yitbarek of CodeNewbie and Scott Hanselman of Hanselminutes for their respective podcasts. All month long they will be interviewing makers and discussing hardware. Though we are not officially part of the movement, we are showing our support by dedicating IoTease this month to fun family projects that can be done each week. You will need:
- 1 x Arduino UNO
- 1 x solderless breadboard
- 1 x standard LED
- 1 x 220 Ohm resistor
- 2 x jumper cables
Tricks of the Trade
A brief philosophical talk about applying scaling beyond code to your life. Scale by doing one small thing many times. It’s easier to write first and edit afterward than to do both at the same time. Start applying the way you write software to your life.
We’ve been working on improving audio quality and learning as we go. Occasionally you will hear the hard drive on BJ’s laptop in his recording. We are working on fixing this issue; BJ wants an SSD for his laptop, but until then he will have to move the laptop away from his mic.