Infrastructure Issue Affecting Customer Sites
Incident Report for Pantheon Operations
Postmortem

Beginning Tuesday, October 13, our distributed file system exhibited incorrect behavior for a small number of websites. The problem was detected during the initial rollout of a change to two filesystem clusters, one in the US and one in Australia, and the deployment was paused before it reached the remaining filesystem clusters. The incident lasted from 1415 UTC to 2147 UTC for the US cluster, and from 1448 UTC to 1347 UTC the following day for the Australia cluster. Analysis of internal logs shows that no more than 0.25% of sites experienced this behavior over the course of the incident.

The issue appeared as newly created files being reported as nonexistent (404) on containers other than the one where the file was written. The root cause was a configuration change. A fix has been implemented, and we are adding tests to catch this type of bug earlier in our deployment pipeline.

We apologize for any inconvenience resulting from this issue.

Posted Oct 14, 2020 - 11:43 PDT

Resolved
This incident has been resolved.
Posted Oct 13, 2020 - 15:59 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 13, 2020 - 15:08 PDT
Identified
We are addressing an infrastructure failure that is affecting a small portion of customer sites.
Posted Oct 13, 2020 - 14:06 PDT
This incident affected: Customer Sites.