On February 17, beginning at 7:33 p.m. ET, a database which serves data to the Customer Portal, Platform API, and the Knowledge Network experienced a critical hardware failure. This resulted in elevated error rates throughout the system. Engineers initiated a failover to the replica instance and restored all services at 7:44 p.m. ET.
Some services which had a caching layer were able to continue serving requests during the incident. We will be exploring additional caching options in order to further improve our time to resolution and minimize impact from hardware failures.