Root Cause Analysis
Discovered: Nov 7, 2022, 15:43 - UTC
Resolved: Nov 7, 2022, 20:00 - UTC
An update to the identification system triggered a previously undiscovered bug.
Effect
The deployment triggered a previously undiscovered bug in the code, causing resource issues for the SNMP poller and Identification services. As a result, some of these systems were degraded.
11/07/2022
15:34 UTC - An update was pushed to the identification system.
16:03 UTC - First internal alarm fires.
16:47 UTC - Engineering team meets to investigate the incident.
16:55 UTC - Status page post published for the investigation into degraded SNMP polling and device identification services.
17:35 UTC - Engineering team adds additional resources to clusters to address resource issues.
17:45 UTC - Engineering decides to revert the update to the previous settings and begins the revert.
18:14 UTC - Engineering completes the revert and monitors systems.
20:00 UTC - Incident is marked as resolved and closed.
● Auvik will enforce rigorous QA practices for change deployment to avoid future occurrences.
● Adjust Auto Scaling of resources in this area of the product.
● Add more pointed change alerting related to this specific type of incident to replace the generic one that is currently in place.
● Review current tech debt in this area and prioritize it in terms of impact.