We observed a novel traffic pattern that caused an unexpected increase in load on our auth services, which in turn created latency for customers using several services in the us-east-1 region. The root cause was that this traffic pattern interacted with existing configurations in unforeseen ways, preventing additional capacity from being utilized when it was brought online. We also uncovered a bug in our load balancer configuration that limited the number of auth servers the services could use; this likely exacerbated the issues we observed.
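The interaction between the balancer cap and newly added capacity can be illustrated with a small sketch. Everything here is hypothetical for illustration only: the function name, the cap value, the server counts, and the request volume are invented, not taken from our systems.

```python
# Hypothetical sketch: a load balancer whose configuration caps the
# number of usable backends. Adding capacity beyond the cap does not
# reduce per-server load, mirroring the behavior described above.

def per_server_load(total_requests, servers_online, usable_cap):
    """Requests handled per server when the balancer routes to at
    most `usable_cap` of the servers that are online."""
    usable = min(servers_online, usable_cap)
    return total_requests / usable

# With a (hypothetical) cap of 10, scaling from 10 to 20 servers
# leaves per-server load unchanged: the new capacity is never used.
before = per_server_load(100_000, servers_online=10, usable_cap=10)
after = per_server_load(100_000, servers_online=20, usable_cap=10)
assert before == after

# Raising the cap lets the added capacity take effect.
fixed = per_server_load(100_000, servers_online=20, usable_cap=20)
assert fixed < after
```

This is why adding auth servers alone would not have helped until the configuration bug was fixed: the extra servers sat idle behind the cap.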
We have fixed the load balancer configuration bug and made several changes to how our configurations behave under this traffic pattern and similar ones. In addition, we have increased the base capacity of the auth services so that auth latency does not rise again while we confirm that these fixes have the desired impact.