DUO3: Authentication failures
Incident Report for Duo
Postmortem

Summary

On September 29, 2023, at around 6:42 pm EDT, Duo's Engineering Team was alerted by monitoring that authentication to the Duo Applications and the Duo Admin Panel was failing for customers on the DUO3 deployment. The root cause was identified as the failure of one of our scheduled maintenance tasks.

The issue was resolved the same day, by 6:49 pm EDT.

Deployments Impacted

  • DUO3

Timeline of Events (EDT)

2023-09-29 

18:42 Duo Site Reliability Engineering (SRE) was informed by Duo internal monitoring that a scheduled maintenance task had failed.

18:49 Duo SRE team mitigated the failed task and restored functionality.

18:49 Duo SRE team began monitoring to confirm that the issue had been mitigated.

18:53 Duo SRE team validated through automated monitoring that we had restored complete functionality.

18:59 Duo SRE team worked with our TSE team to communicate with our customers, confirming that only DUO3 was impacted and that the impact lasted 7 minutes. Status Page updated.

19:19 Status Page Updated to: “We have confirmed that authentication services are back to fully operational and this issue is resolved. We will provide a Root Cause Analysis (RCA) as soon as it is available.”

Details

DUO3 has multiple redundant load balancer pairs that accept requests from the internet and distribute them to applications. Within each pair, one half actively processes requests and the other acts as a passive hot spare.
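
The active/passive arrangement described above can be illustrated with a minimal sketch. This is not Duo's implementation. The class names, the node names, and the failover rule are all hypothetical, chosen only to show how a hot spare takes over when the active half of a pair becomes unavailable.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    # Hypothetical stand-in for one half of a redundant pair.
    name: str
    healthy: bool = True


class ActivePassivePair:
    """Illustrative model of one pair: an active node plus a passive hot spare."""

    def __init__(self, active: LoadBalancer, passive: LoadBalancer) -> None:
        self.active = active
        self.passive = passive

    def route(self, request: str) -> str:
        """Send the request to the active node, promoting the spare if needed."""
        if not self.active.healthy:
            if not self.passive.healthy:
                raise RuntimeError("both load balancers in the pair are unavailable")
            # Assumed failover rule: the spare becomes active and the
            # failed node becomes the new passive half.
            self.active, self.passive = self.passive, self.active
        return f"{self.active.name} handled {request}"
```

In this sketch, marking the active node unhealthy causes the next call to `route` to be served by the former spare, and a request fails only when both halves of the pair are unavailable.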

Duo SRE runs scheduled maintenance on our load balancer inventory after hours. During this maintenance, one of the tasks failed. As soon as the SRE team noticed the failure, they updated the failed task to restore service. Service was restored within 7 minutes.

The Duo SRE team is dedicated to providing reliable service to all users. The team has completed a retrospective to determine the steps and actions needed to avoid similar incidents in the future.

Note: You can find your Duo deployment’s ID and sign up for updates via the StatusPage by following the instructions in this knowledge base article.

Posted Oct 04, 2023 - 13:24 EDT

Resolved
We have confirmed that authentication services are back to fully operational and this issue is resolved. We will provide a Root Cause Analysis (RCA) as soon as it is available.
Posted Sep 29, 2023 - 19:20 EDT
Monitoring
We have identified an issue that resulted in a period of approximately 9 minutes where authentication to Duo Applications and the Duo Admin Panel for customers on the DUO3 deployment would fail, and have deployed a fix. Authentication services are now fully functional. We are continuing to monitor to ensure that no further issues happen, and will provide more information as soon as it is available.
Posted Sep 29, 2023 - 19:00 EDT
This incident affected: DUO3 (Core Authentication Service, Admin Panel, Push Delivery, Phone Call Delivery, SMS Message Delivery, Cloud PKI, SSO).