Root Cause Analysis
Discovered: Dec 12, 2022 18:10 UTC
Resolved: Dec 13, 2022 01:35 UTC
Syslog messages were delayed because disk space issues blocked writes to the system.
Syslog messages were delayed by up to 3.5 hours before reaching the system. No data was lost.
Action taken
All times in UTC
12/12/2022
18:13 - Internal alarm triggered for the Syslog message delays. The notification is not sent to the proper resource.
18:21 - Syslog begins to write again, but message processing now lags behind.
18:24 - Syslog message delay increases, triggering a second alarm, which this time reaches the proper resource.
18:26 - Engineering begins investigation.
19:15 - The system continues to fall further behind in message processing. Auvik posts to the status page about the ongoing incident.
19:26 - Auvik determines why Syslog message writes are periodically failing and takes steps to increase resources required by the Syslog service.
21:43 - Syslog messages have caught up to where they should be. Customer impact has ended.
12/13/2022
01:35 - Older resources are removed from the system and cleanup is concluded.
Follow-up actions
● Auvik has created improved lag-time metrics and measurements to estimate Syslog lag for its customers.
● Auvik has reviewed its internal alerting and rectified the missing communication channels.