Between 08:00 and 10:30 UTC on February 9, 2023, sections of octopus.com intermittently returned 503 responses. The affected routes were /signin, /blogs, and /docs.
We recently migrated our DNS management to a new provider to centralize our infrastructure.
During the migration, we set the web application firewall (WAF) in front of octopus.com to detection mode, in which it logs suspicious requests but does not block them. At the same time, we tuned the ruleset to prevent false positives from blocking legitimate customer access to Octopus systems.
(All dates and times below are shown in UTC.)
08:05 Our automated systems detected decreased availability in sections of the octopus.com website.
08:35 Engineers on call were notified.
08:56 Status Page updated: An incident was declared.
10:30 We updated the WAF to block malicious traffic.
10:48 Status Page updated: Incident status changed to `Monitoring`.
12:24 Status Page updated: Incident status changed to `Resolved`.
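As a rough illustration of the intervals in this timeline, the short Python sketch below computes the elapsed time between key events. The timestamps are copied from the entries above; the dictionary keys and the `minutes_between` helper are names chosen for this example only and are not part of any Octopus tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline (UTC, February 9, 2023).
# The key names are illustrative labels, not official incident fields.
TIMELINE = {
    "detected": "08:05",
    "engineers_notified": "08:35",
    "incident_declared": "08:56",
    "waf_blocking": "10:30",
    "monitoring": "10:48",
    "resolved": "12:24",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


detect_to_notify = minutes_between(TIMELINE["detected"], TIMELINE["engineers_notified"])
detect_to_mitigate = minutes_between(TIMELINE["detected"], TIMELINE["waf_blocking"])
detect_to_resolve = minutes_between(TIMELINE["detected"], TIMELINE["resolved"])

print(f"Detection to notification: {detect_to_notify} min")
print(f"Detection to mitigation:   {detect_to_mitigate} min")
print(f"Detection to resolution:   {detect_to_resolve} min")
```

On these figures, 30 minutes passed between automated detection and on-call notification, and 145 minutes between detection and the WAF change that blocked the traffic.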
An attacker ran a fuzzing application against our public-facing website while the WAF was in detection mode. The resulting load, which the WAF would normally have blocked, reduced the availability of octopus.com.
Engineers mitigated the outage by applying a cut-down implementation of the WAF that protected the website from single-origin attacks.
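This report does not describe the rules used in the cut-down WAF. Purely as an illustration of what protection against a single origin can look like, the Python sketch below implements a sliding-window request limit per client address. The class name, the thresholds, and the example IP addresses (taken from documentation-reserved ranges) are assumptions made for this example and do not reflect our actual configuration.

```python
from collections import defaultdict, deque


class PerOriginRateLimiter:
    """Hypothetical sliding-window limiter keyed by request origin.

    Allows at most `max_requests` per origin within any
    `window_seconds` interval; other origins are unaffected.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # origin -> timestamps of allowed requests

    def allow(self, origin: str, now: float) -> bool:
        hits = self._hits[origin]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # this origin is over its budget: block the request
        hits.append(now)
        return True


# Assumed thresholds for the example: 3 requests per 10 seconds.
limiter = PerOriginRateLimiter(max_requests=3, window_seconds=10.0)

# One noisy origin sends four requests in quick succession.
noisy = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0)]

# A different origin is still served while the noisy one is blocked.
other = limiter.allow("198.51.100.4", 3.0)

# Once the oldest request leaves the window, the noisy origin is allowed again.
recovered = limiter.allow("203.0.113.7", 10.0)
```

A limit of this kind only addresses load that comes from one source. Traffic distributed across many origins needs the broader ruleset of a fully enabled WAF.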
Since this incident, we've completed the migration to our new DNS provider, and the WAF is fully enabled.
During our incident review process, we identified and corrected gaps in our defense to reduce the time from detection to mitigation.
We also identified the oversight in our risk management that led to this situation: by mitigating one risk (false positives blocking legitimate customers), we exposed ourselves to another (malicious traffic reaching the website unfiltered). We have since updated our project risk assessment process to include more formal internal reviews of planned changes to core systems.
Octopus Deploy takes service availability seriously. In the past month, we’ve had multiple incidents affecting sign-in infrastructure, which is below our desired standard. We apologize for the disruption to our customers and are working to reduce the likelihood and severity of future disruptions.