Partial Outage on EU-4

Incident Report for Platform.sh

Postmortem

Overview

On June 20th, at 09:24 UTC, an incident was declared for the eu-4 region. Platform.sh monitoring detected a number of site-down alerts and, upon further investigation, determined that the issue was a result of degraded gateway performance.

What Happened

At 09:08 UTC, monitoring alerts indicated a high volume of traffic entering the region. A mitigation plan was quickly implemented, and by 09:12 UTC the traffic volume had been mitigated. We declared an incident to further review the issue and identify the root cause. Investigation determined that a small number of sources were responsible for the traffic volume.

Platform.sh's roadmap includes appropriate rate limiting to prevent such traffic spikes from bringing down multiple sites in the region.
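
For readers unfamiliar with the idea, per-source rate limiting can be illustrated with a token bucket, one common technique. The sketch below is purely illustrative and is not Platform.sh's implementation; the class names, the rate, and the capacity are all hypothetical. Each traffic source gets its own bucket, so a few heavy sources run out of tokens without affecting the others.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TokenBucket:
    """Hypothetical token bucket: refills at `rate` tokens/second up to `capacity`."""
    rate: float
    capacity: float
    tokens: float
    updated: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the last check.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        # Spend one token per request; reject when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerSourceLimiter:
    """Hypothetical limiter that keeps one bucket per traffic source (e.g. an IP)."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, source: str, now: float) -> bool:
        bucket = self.buckets.get(source)
        if bucket is None:
            # A new source starts with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[source] = bucket
        return bucket.allow(now)


# Assumed values for illustration only: 1 request/second, bursts of 2.
limiter = PerSourceLimiter(rate=1.0, capacity=2.0)
burst = [limiter.allow("source-a", now=0.0) for _ in range(3)]
other = limiter.allow("source-b", now=0.0)
```

In this sketch the third request in the burst from "source-a" is rejected, while "source-b" is still served. A production gateway would also need to expire idle buckets and share state across nodes, which this example leaves out.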

Impact

The incident was classified as a partial outage, with most sites returning to normal operating levels by 09:17 UTC. The longest observed outage lasted a total of 14 minutes, while most monitored sites experienced an outage of only 1 to 3 minutes.

Posted Jul 01, 2022 - 15:11 UTC

Resolved

This incident has been resolved.

Posted Jun 20, 2022 - 14:51 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Jun 20, 2022 - 12:35 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted Jun 20, 2022 - 09:45 UTC

Investigating

We have detected an issue affecting service in the EU-4 region. Our Operations team has been notified and is currently working to restore service. Projects in this region may see timeouts or other outages on their production and development sites.

We will update you as soon as we have further information.

Posted Jun 20, 2022 - 09:24 UTC

This incident affected: Europe (West 4) (eu-4.platform.sh).