Beneficial Intelligence

Undetected Downtime

September 25, 2020 · Sten Vesterli · Season 1, Episode 14
Show Notes

In this episode of Beneficial Intelligence, I discuss undetected downtime. 

Undetected downtime is one of the deadly sins of IT. If the business has to tell us that one of our systems is down before we have noticed it ourselves, it undermines their trust in us.

The infrastructure behind the Danish contact-tracing app was down this week, and red-faced health officials had to admit that it had been down for days before news reports forced them to take end users' reports of the outage seriously.

One reason this happens is that we neglect to build monitoring into the custom-built components in our environment. But even that should not lead to undetected downtime, because we can still monitor the standard components those custom components depend on, such as databases. We normally only alert when the load is too high, but we should also alert when the load is too low. If suddenly nobody is accessing the database, something is wrong and we should investigate.
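To make the idea of alerting on both too-high and too-low load concrete, here is a minimal sketch in Python. The names `HealthyInterval` and `check_load`, the metric (database queries per minute), and the threshold values are hypothetical illustrations, not taken from any particular monitoring product; in practice you would express the same two-sided rule in your monitoring tool's own alerting configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HealthyInterval:
    """Hypothetical expected range for a metric, e.g. database queries per minute."""
    low: float
    high: float


def check_load(queries_per_minute: float, interval: HealthyInterval) -> Optional[str]:
    """Return an alert message if the load is outside the healthy interval, else None."""
    if queries_per_minute > interval.high:
        # The check most teams already have: the system is overloaded.
        return f"ALERT: load too high ({queries_per_minute} > {interval.high})"
    if queries_per_minute < interval.low:
        # The check that is often missing: an idle database may mean that
        # the applications in front of it have stopped working.
        return f"ALERT: load too low ({queries_per_minute} < {interval.low})"
    return None


if __name__ == "__main__":
    # Hypothetical daytime thresholds; real values come from observed traffic.
    daytime = HealthyInterval(low=50, high=5000)
    for sample in [1200, 8000, 0]:
        print(sample, check_load(sample, daytime))
    # 1200 None
    # 8000 ALERT: load too high (8000 > 5000)
    # 0 ALERT: load too low (0 < 50)
```

A fixed lower bound is the simplest version. A real system would likely need the lower bound to vary by time of day, so that a database that is legitimately quiet at night does not trigger false alarms.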

As a CIO, you should ask your operations team whether they monitor both ends of the healthy interval for each system: alerting not only when load rises above the upper bound, but also when it falls below the lower bound. If they do, you can avoid undetected downtime.
---
Beneficial Intelligence is a weekly podcast with stories and pragmatic advice for CIOs and other IT leaders. To get in touch, please contact me at sten@vesterli.com