Tuning alarms

From the blog Forkcasting
When the on-call's pager goes off, they should be able to react quickly. If the system has a lot of false alarms, they may waste time checking whether there is a real problem. If there are too many false alarms, the on-call may need to prioritize some pages and ignore others. This lowers system availability as the time-to-repair increases. There's also...