The Reliability pillar focuses on ensuring that a workload performs its intended function correctly and consistently when it’s expected to. This includes the ability to operate and test the workload throughout its entire lifecycle. Key topics include distributed system design, recovery planning, and how to handle change.
There are five design principles for Reliability in the cloud:
- Automatically recover from failure
- Test recovery procedures
- Scale horizontally to increase aggregate workload availability
- Stop guessing capacity
- Manage change in automation
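
The principle of scaling horizontally to increase aggregate availability can be illustrated with a little arithmetic. The sketch below is not part of the framework text: it assumes that replicas fail independently of one another and that the workload stays available as long as at least one replica is up. Real deployments rarely meet both assumptions exactly, so treat the result as an upper bound. The function name `aggregate_availability` is hypothetical.

```python
def aggregate_availability(single: float, replicas: int) -> float:
    """Estimate availability of a workload spread over several replicas.

    Assumptions (illustrative, not from the source text):
    - each replica is up with probability `single`
    - replicas fail independently of one another
    - the workload is available if at least one replica is up
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The workload is down only when every replica is down at once.
    probability_all_down = (1.0 - single) ** replicas
    return 1.0 - probability_all_down


if __name__ == "__main__":
    # Compare one large resource with several smaller ones.
    for count in (1, 2, 3):
        print(count, aggregate_availability(0.99, count))
```

Under these assumptions, replacing one resource with several smaller ones reduces the chance that a single failure takes down the whole workload, which is the intuition behind the principle.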