The Cycle of Reactive Response

What is the right level of reliability for the system you support?

Development organizations prioritize building new features over improving the reliability of existing ones, and there is often little executive support for changing these priorities because the cost of unreliability is not immediately obvious. Monitoring and alerting systems that trigger operational response are only loosely connected to users' overall experience, which leads to unnecessary operational work or, worse, to problems that go unnoticed until users complain. If your SLOs have executive backing and development teams that are committed to meeting them, they turn drawn-out arguments about prioritization into data-driven decisions. Even better, an SLO can drive short-term operational response as well as long-term prioritization.