Service Disruption - Degraded Custom SNMP Polling and Device Identification
Incident Report for Auvik Networks Inc.
Postmortem

Service Disruption - SNMP Poller and Identification Service Degraded

Root Cause Analysis

Duration of incident

Discovered: Nov 7, 2022, 15:43 UTC
Resolved: Nov 7, 2022, 20:00 UTC

Cause

An update to the identification system triggered a previously undiscovered bug.

Effect

The deployment triggered a previously undiscovered bug in the code, which caused resource issues for the SNMP poller and identification services. As a result, some of these systems were degraded.

Action taken

Nov 7, 2022

15:34 UTC - An update was pushed to the identification system.
16:03 UTC - First internal alarm fires.
16:47 UTC - Engineering team meets to investigate the incident.
16:55 UTC - Status page post is published for the investigation of SNMP polling and device identification service degradation.
17:35 UTC - Engineering team adds additional resources to clusters to address resource issues.
17:45 UTC - Engineering decides to revert the update to the previous settings and begins the revert.
18:14 UTC - Engineering completes the revert and monitors systems.
20:00 UTC - Incident is marked as resolved and closed.
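
As a rough illustration only, the intervals between the timeline entries above can be worked out with a short script. The timestamps are copied from the timeline; the dictionary keys and helper names are hypothetical labels chosen for this sketch and are not part of any Auvik tooling.

```python
from datetime import datetime, timezone

# Timestamps copied from the timeline above (all UTC, Nov 7, 2022).
# The keys are illustrative labels, not official incident terminology.
TIMELINE = {
    "update_pushed": "15:34",
    "first_alarm": "16:03",
    "revert_started": "17:45",
    "revert_completed": "18:14",
    "resolved": "20:00",
}


def at(label):
    """Return the timeline entry as a timezone-aware UTC datetime."""
    hour, minute = map(int, TIMELINE[label].split(":"))
    return datetime(2022, 11, 7, hour, minute, tzinfo=timezone.utc)


def minutes_between(start, end):
    """Whole minutes elapsed between two timeline entries."""
    return int((at(end) - at(start)).total_seconds() // 60)


if __name__ == "__main__":
    print("Update to first alarm:", minutes_between("update_pushed", "first_alarm"), "min")
    print("Revert duration:", minutes_between("revert_started", "revert_completed"), "min")
    print("Update to resolution:", minutes_between("update_pushed", "resolved"), "min")
```

Under these timestamps, the first alarm fired 29 minutes after the update was pushed, the revert took 29 minutes, and 266 minutes passed between the update and the incident being marked as resolved.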

Future consideration(s)

● Enforce rigorous QA practices for change deployment to avoid future occurrences.
● Adjust auto scaling of resources in this area of the product.
● Add more targeted change alerting for this specific type of incident to replace the generic alert that is currently in place.
● Review current tech debt in this area and prioritize it in terms of impact.

Posted Nov 18, 2022 - 07:54 EST

Resolved
The fix for SNMP Polling and Device Identification degradation has been implemented. The source of the disruption has been resolved, and services have been fully restored.

A Root Cause Analysis (RCA) will follow after completing a full review.
Posted Nov 07, 2022 - 14:44 EST
Monitoring
We’ve identified the source of the service disruption with SNMP Polling and Device Identification and are monitoring the situation. Tenants may have experienced service degradation with SNMP and device identification. The fix has been implemented. We’ll keep you posted on a resolution.
Posted Nov 07, 2022 - 13:47 EST
Identified
We’ve identified the source of the service disruption with SNMP Polling and Device Identification. Tenants may experience service degradation with SNMP and device identification. We are working to restore service as quickly as possible.
Posted Nov 07, 2022 - 13:01 EST
Investigating
We’re experiencing disruption to SNMP Polling and Device Identification. Tenants may experience service degradation with SNMP and device identification. We will continue to provide updates as they become available.
Posted Nov 07, 2022 - 12:55 EST
This incident affected: Network Mgmt (my.auvik.com, us1.my.auvik.com, us2.my.auvik.com, us3.my.auvik.com, us4.my.auvik.com, eu1.my.auvik.com, eu2.my.auvik.com, au1.my.auvik.com, ca1.my.auvik.com).