Service Disruption - Delay in processing Syslog data in US2 cluster
Incident Report for Auvik Networks Inc.
Postmortem

Root Cause Analysis

Duration of incident

Discovered: Dec 12, 2022 18:10 UTC
Resolved: Dec 13, 2022 01:35 UTC

Cause

Syslog messages were delayed because disk space issues blocked writes to the system.

Effect

Syslog messages were delayed by up to 3.5 hours before reaching the system. No data was lost.

Action taken

All times in UTC

12/12/2022

18:13 - Internal alarm is triggered for the Syslog message delays. The notification is not sent to the proper resource.
18:21 - Syslog begins to write again, but message processing now lags behind.
18:24 - Syslog message delay increases, triggering a second alarm, this time to the proper resource.
18:26 - Engineering begins investigation.
19:15 - The system continues to fall further behind in message processing. Auvik posts to the status page about the ongoing incident.
19:26 - Auvik determines why Syslog message writes are periodically failing and takes steps to increase resources required by the Syslog service.
21:43 - Syslog messages have caught up to where they should be. Customer impact has ended.

12/13/2022

01:35 - Older resources are removed from the system and cleanup is concluded.

Future consideration(s)

● Auvik has created improved lag time metrics and measurements to estimate Syslog lag for its customers
● Auvik has reviewed its internal alerting and rectified missing communication channels
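
As an illustration only, a lag measurement of the kind mentioned above could be sketched as follows. This report does not describe how Auvik's metrics are actually computed, so the function names, the timestamps, and the two-minute threshold are all hypothetical assumptions. The sketch treats lag as the time between a message being received and being processed.

```python
from datetime import datetime, timedelta, timezone


def estimate_lag(received_at, processed_at):
    """Return how long a message waited between receipt and processing.

    Hypothetical helper: negative differences (clock skew) are clamped to zero.
    """
    return max(processed_at - received_at, timedelta(0))


def lag_exceeds(lags, threshold):
    """Return True if the worst observed lag is above the alert threshold."""
    return max(lags, default=timedelta(0)) > threshold


# Made-up timestamps for illustration, not taken from the incident.
received = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
processed = datetime(2022, 1, 1, 12, 5, tzinfo=timezone.utc)

lag = estimate_lag(received, processed)
alert = lag_exceeds([lag], threshold=timedelta(minutes=2))  # assumed threshold
```

A real measurement would likely aggregate lag across many messages and report a percentile rather than a single maximum.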

Posted Dec 20, 2022 - 05:51 EST

Resolved
The fix for the Syslog data processing issue has been deployed. The source of the disruption has been resolved, and services have been fully restored.
Posted Dec 09, 2022 - 16:43 EST
Monitoring
We’ve identified the source of the service disruption affecting the processing of Syslog data in the US2 cluster and are monitoring the situation. We’ll keep you posted on a resolution.
Posted Dec 09, 2022 - 15:50 EST
Investigating
We’re experiencing disruption to Syslog data processing in US2. We will continue to provide updates as they become available.
Posted Dec 09, 2022 - 14:50 EST
This incident affected: Network Mgmt (us2.my.auvik.com).