On January 6th, 2021, at 11:23 a.m. ET, a server that processed logs for the Sandbox Answers Overview page stopped functioning. As a result, some portions of the overview page did not display the latest data, which also affected Hitchhiker training. Engineers restored service at 12:45 p.m. ET. No data was lost as a result of this outage, and production services were unaffected.
A downstream service had an incorrect configuration that caused it to run out of resources, halting the data processing pipeline. Once the service was restored, processing resumed and worked through the backlog of logs.
We will update our configuration delivery mechanism to remove the incorrectly provisioned settings. Additionally, future configuration updates will require manual user review to reduce the likelihood of errors.