Partial Outage on FR-4

Incident Report for Platform.sh

Postmortem

Overview

On June 26, 2022, at 08:36 UTC, an incident was declared for the FR-4 region. Platform.sh monitoring detected a number of site-down alerts and, upon further investigation, determined that the issue resulted from maintenance being performed by the upstream service provider.

What Happened

The maintenance performed by the service provider required a reboot of several components in the region, which caused production and development environments to become unavailable. The maintenance consisted of routine security patches being rolled out across several hosts in the region.

After the reboot, several hosts did not recover as expected and required further intervention by Platform.sh engineers. The faulty hosts were replaced, and projects were evacuated to healthy hosts so that recovery could take place. A small percentage of projects did not recover as expected during the evacuation, which resulted in longer downtime for those projects.

Platform.sh technical teams have fixed the bug that prevented hosts from recovering as expected, and the fix is being rolled out. The failures that occurred during evacuation are under investigation and will be fixed.

Impact

The incident impacted both production and development environments. Most projects were recovered by 11:53 UTC, with a small portion needing manual recovery by Platform.sh engineers. Observed downtime ranged from 60 minutes to 2 hours and 40 minutes.

Posted Jul 01, 2022 - 14:31 UTC

Resolved

This incident has been resolved.

Posted Jun 26, 2022 - 14:55 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Jun 26, 2022 - 11:53 UTC

Investigating

We have detected an issue affecting service in the FR-4 region. Our Operations team has been notified and is currently working to restore service. Projects in the affected region may experience site outages.
We will update you as soon as we have further information.

Posted Jun 26, 2022 - 08:36 UTC

This incident affected: Europe (France 4) (fr-4.platform.sh).