Summary

On February 17, beginning at 7:33 p.m. ET, a database that serves data to the Customer Portal, Platform API, and the Knowledge Network experienced a critical hardware failure, resulting in elevated error rates across the system. Engineers initiated a failover to the replica instance and restored all services by 7:44 p.m. ET.

Remediation

Some services with a caching layer continued serving requests during the incident. We will explore additional caching options to further reduce our time to resolution and minimize the impact of hardware failures.

Posted Mar 01, 2021 - 13:00 EST

Resolved

This incident has been resolved.

Posted Feb 17, 2021 - 20:40 EST

Monitoring

A fix has been implemented, and we are monitoring for any regressions.

Posted Feb 17, 2021 - 19:54 EST

Investigating

We are currently investigating elevated error rates in the Customer Portal. We will update as soon as we have more information.

Posted Feb 17, 2021 - 19:45 EST

This incident affected: Listings (Listings Publishing), Content (Management API), and Customer Portal Login.