Issues accessing editor and/or live sites
Incident Report for Mono Solutions
Postmortem

Summary

Following a previous incident on the same day, a database instance at our back-ends in Frankfurt became unavailable. This caused the back-ends to return 404s, which were subsequently cached by our CDNs. Once the issue was identified, the database was synced with the master and the back-end service was restored. A number of websites required a cache clear so that the cached 404s were discarded and the correct content could be cached again.

Impact

  • Editor inaccessible in Frankfurt
  • Non-cached sites responding with 404s in Frankfurt

Root Causes

The Redis slave instance in our VPC in Frankfurt was unresponsive.

Trigger

The master Redis instance had been unavailable earlier the same day.

Resolution

  • Manually restarted the service.

Action Items

  • Alert when the Redis slave is not available or not in sync with master.
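
As an illustration of what such an alert could check, here is a minimal sketch in Python. It is not our production monitoring; it assumes the text output of the Redis `INFO replication` command has already been fetched from the slave (or is `None` if the slave could not be reached). The function names and the 30-second threshold are hypothetical; the field names (`role`, `master_link_status`, `master_sync_in_progress`, `master_last_io_seconds_ago`) are the ones Redis reports on a replica.

```python
def parse_replication_info(raw):
    """Parse the text of Redis `INFO replication` into a dict of strings."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def replica_problems(raw, max_io_silence_s=30):
    """Return a list of human-readable problems; an empty list means healthy.

    `raw` is the INFO replication text from the slave, or None if the
    slave did not answer at all. The threshold is an assumed default.
    """
    if raw is None:
        return ["replica unreachable"]

    info = parse_replication_info(raw)
    if info.get("role") != "slave":
        return ["unexpected role: %s" % info.get("role")]

    problems = []
    if info.get("master_link_status") != "up":
        problems.append("master link is down")
    if info.get("master_sync_in_progress") == "1":
        problems.append("full sync in progress")

    # Redis reports -1 here while the link to the master is down.
    last_io = int(info.get("master_last_io_seconds_ago", "-1"))
    if last_io < 0 or last_io > max_io_silence_s:
        problems.append("no recent traffic from master (last io: %ds)" % last_io)
    return problems
```

A monitoring job could run this periodically against the output of `redis-cli INFO replication` on the slave and raise an alert whenever the returned list is non-empty.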

Lessons learned

  • We need better alerting for Redis availability issues.

Timeline (CEST, GMT+2)

2021/06/28 12:28 CEST First support reports

2021/06/28 12:28 CEST Engineering engaged on the incident

2021/06/28 12:34 CEST Incident announced on status page

2021/06/28 13:31 CEST Root cause identified

2021/06/28 13:39 CEST Back-end service restored

2021/06/28 13:46 CEST Status page updated

2021/06/28 13:46 CEST Affected sites still required a manual cache clear as they were reported

2021/06/28 16:32 CEST Incident resolved

Posted Jun 30, 2021 - 12:20 CEST

Resolved
This incident has been resolved.
Posted Jun 28, 2021 - 16:32 CEST
Update
We have had networking issues that caused our Redis master-slave replication to fail. The issue has now been identified and fixed, and we are in the process of updating missing data entries.
Posted Jun 28, 2021 - 14:04 CEST
Monitoring
A fix has been implemented and we are monitoring the service. Both sites and the editor should be coming back online.
We will provide a postmortem once everything has been resolved.
Posted Jun 28, 2021 - 13:46 CEST
Investigating
We are currently investigating this issue.
Posted Jun 28, 2021 - 12:34 CEST
This incident affected: Website and Editor.