Overview
On June 26th, at 08:36 UTC, an incident was declared for the (fr-4) region. Platform.sh monitoring raised a number of site-down alerts, and further investigation determined that the issue was caused by maintenance being performed by the upstream service provider.
What Happened
The maintenance performed by the service provider required a reboot of several components in the region, which caused production and development environments to become unavailable. The maintenance consisted of routine security patches being rolled out across several hosts in the region.
After the reboot, several hosts did not recover as expected and required further intervention by Platform.sh engineers. The faulty hosts were replaced, and their projects were evacuated to healthy hosts so that recovery could take place. A small percentage of projects did not recover as expected during the evacuation, which resulted in longer downtime for those projects.
Platform.sh technical teams have fixed the bug that prevented hosts from recovering as expected, and the fix is being rolled out. The failures that occurred during evacuation are under investigation and will be addressed once their causes are identified.
Impact
The incident impacted both production and development environments. Most projects were recovered by 11:53 UTC, with a small portion requiring manual recovery by Platform.sh engineers. Observed downtime ranged from 60 minutes to 2 hours and 40 minutes.