On January 19th, 2021, beginning at 2:07 p.m. ET, the Photos, Answers, and Live API services in Sandbox began experiencing elevated error rates. Engineers were notified and began investigating. Mitigation measures were implemented by 4:05 p.m. ET, at which point error rates began returning to normal. All Sandbox services were fully restored by 5:18 p.m. ET.
No production services were disrupted during this time.
A routine operation to patch and upgrade server hardware failed to add the upgraded servers to the load balancers. Once the upgraded servers were added to the load balancers, backend services resumed normal operation.
We will add checks to our upgrade process to verify that load balancer changes have been applied correctly.
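As an illustration only, a post-upgrade check of this kind could be sketched as follows. It is not our actual tooling: the function names, the hostnames, and the assumption that the load balancer's membership is available as a plain list of hostnames are all hypothetical.

```python
from typing import Iterable, List


def find_missing_servers(upgraded: Iterable[str], pool_members: Iterable[str]) -> List[str]:
    """Return upgraded hosts that are absent from the load balancer pool.

    Both arguments are hypothetical inputs: `upgraded` is the list of hosts the
    upgrade touched, `pool_members` is the membership reported by the load balancer.
    """
    # Normalize hostnames so case or stray whitespace does not cause false alarms.
    registered = {host.strip().lower() for host in pool_members}
    return sorted(
        host for host in upgraded if host.strip().lower() not in registered
    )


def verify_upgrade(upgraded: Iterable[str], pool_members: Iterable[str]) -> None:
    """Fail the upgrade step if any upgraded host was not added to the pool."""
    missing = find_missing_servers(upgraded, pool_members)
    if missing:
        raise RuntimeError(
            "Upgraded servers missing from load balancer: " + ", ".join(missing)
        )
```

Running verify_upgrade as the final step of the upgrade would stop the process with an error when an upgraded host is not in the pool, rather than leaving the omission to surface later as elevated error rates.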