From 4:30 PM ET to 5:25 PM ET on October 29th, Live API, Answers Serving, and Pages Locators experienced degraded service. Many requests failed, and many others were slower than usual or returned incomplete results.
A failure in the service location layer of our US East service region led to a complete failure of that region. Service location is the layer of our infrastructure that allows our services to find and connect to each other. Our monitoring promptly alerted us to the issue, and we failed all serving over to our other regions. This led to some improvement, but those regions were not initially provisioned with enough capacity to handle the increased load. The issue was ultimately resolved by increasing provisioning in the alternative regions and fixing the underlying problem with service location.
We have already increased provisioning both for the service location components in the US East region and for the under-provisioned components outside of the US East region. We will also regularly assess the capacity at all of our consumer serving sites to ensure that the remaining sites can immediately absorb the full load if any one of them fails. Over the next month we will be: