Service disruption
Incident Report for Auvik Networks Inc.
Postmortem

Monitoring Disruption - Cisco SNMPv3 AES-256 Data Unavailable

Duration of incident

Discovered: Jun 9, 2021 - 14:56 UTC
Resolved: Jun 10, 2021 - 23:00 UTC

Cause

Auvik released a collector update to move to a newer version of OpenSSL 1.1.1, which caused the SNMP library to be updated as well. Cisco devices use a vendor-specific variant of SNMPv3, and support for it broke when the library was updated.

Effect

The Auvik collector could no longer authenticate against Cisco equipment that was set up to communicate using SNMPv3 with AES-256 encryption credentials. This disabled monitoring of the affected Cisco devices.

Action taken

06/05/2021 01:00 UTC - Collector release version 2021-W22 was deployed to all Auvik clusters.

06/09/2021 14:56 UTC - A ticket was submitted to the engineering team reporting that SNMPv3 credentials with the AES-256 privacy protocol appeared to have stopped working after the 2021-W22 release.

06/09/2021 15:00 UTC - Auvik engineering team begins investigating communication issues between the collector and Cisco devices using SNMPv3 credentials with AES-256 encryption.

06/10/2021 03:30 UTC - Auvik rolls back the changes to re-establish communication with Cisco devices using SNMPv3 AES-256 authentication.

06/10/2021 11:00 UTC - Inspection of the data following the rollback indicated that data was still not being gathered from all SNMPv3 AES-256 devices.

06/10/2021 15:00 UTC - Investigation into the issue revealed that stale data was cached within the Auvik system and that authentication could not succeed against Cisco SNMPv3 AES-256 devices.

06/10/2021 18:00 UTC - Auvik develops a script to flush stale credentials from the rolled-back collectors and re-establish communication.

06/10/2021 23:00 UTC - Auvik engineering team cleans up the stale SNMP credential data to resolve the remaining communication issues.

Future Considerations

  • Better identify customization by vendors for commonly used protocols and services
  • Add a more diverse group of vendors to the beta testing for better discovery of possible issues
  • Build in alerts for devices that were monitored correctly before a maintenance window but not after it
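
As an illustration of the last consideration, the following is a minimal sketch of a post-maintenance check that compares polling results taken before and after a maintenance window and groups any regressions by vendor. It is not Auvik's implementation: the `PollResult` record, its fields, and both function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollResult:
    """Hypothetical snapshot of one device's SNMP polling state."""
    device_id: str
    vendor: str
    snmp_ok: bool


def find_regressions(before, after):
    """Return devices that polled successfully before the window but not after."""
    after_by_id = {r.device_id: r for r in after}
    regressions = []
    for prev in before:
        curr = after_by_id.get(prev.device_id)
        # A device missing from the post-window snapshot counts as a failure.
        if prev.snmp_ok and (curr is None or not curr.snmp_ok):
            regressions.append(prev)
    return regressions


def regressions_by_vendor(regressions):
    """Count regressed devices per vendor to expose vendor-specific breakage."""
    counts = {}
    for r in regressions:
        counts[r.vendor] = counts.get(r.vendor, 0) + 1
    return counts


if __name__ == "__main__":
    before = [
        PollResult("sw1", "Cisco", True),
        PollResult("sw2", "Cisco", True),
        PollResult("ap1", "OtherVendor", True),
    ]
    after = [
        PollResult("sw1", "Cisco", False),
        PollResult("ap1", "OtherVendor", True),
    ]
    print(regressions_by_vendor(find_regressions(before, after)))
```

Devices that were already failing before the window are deliberately excluded, so an alert built on this comparison would fire only for breakage that coincides with the maintenance itself.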
Posted Oct 25, 2021 - 14:27 EDT

Resolved
The source of the disruption has been resolved, and services have been fully restored.
Posted Jun 11, 2021 - 01:15 EDT
Monitoring
We are recovering from AWS connectivity issues in the eu2 region. We will continue to monitor the situation.
Posted Jun 11, 2021 - 00:55 EDT
Investigating
Auvik will undergo preventative maintenance. The session will take about one hour. During this time, you may not be able to log into Auvik. There may also be interruptions to your network monitoring.
Posted Jun 11, 2021 - 00:24 EDT
Monitoring
We are recovering from AWS connectivity issues in the eu2 region. We will continue to monitor the situation.
Posted Jun 10, 2021 - 18:25 EDT
Update
We are currently being impacted by AWS connectivity issues for our service instances within the eu2 region.
Posted Jun 10, 2021 - 17:35 EDT
Update
We are continuing to work on a fix for this issue.
Posted Jun 10, 2021 - 17:10 EDT
Identified
We’ve identified the source of the service disruption and are working as quickly as possible to restore service.
Posted Jun 10, 2021 - 17:03 EDT
Update
We are continuing to investigate this issue.
Posted Jun 10, 2021 - 17:01 EDT
Investigating
We’re experiencing a disruption to Auvik service. You may not be able to log into your Auvik dashboard. We’ll keep you posted on a resolution.
Posted Jun 10, 2021 - 16:45 EDT
This incident affected: Network Mgmt (eu2.my.auvik.com).