On December 10, 2023, at 13:20 UTC, we became aware of a disruption to the Object Storage service in our Newark Datacenter. We located the source of the activity that was negatively impacting the cluster and, at 13:45 UTC, took initial steps to mitigate the issue. By 15:20 UTC, we had restarted the affected infrastructure to return the service to a stable state, and we began a monitoring period at 15:51 UTC.
While monitoring the service, at 16:33 UTC our team made additional backend adjustments targeting specific endpoints. Although these adjustments did not immediately have a negative impact on the service, we observed strain on the underlying infrastructure by 17:39 UTC, and our administrators continued to investigate.
At 18:59 UTC, our team began to address the state of the infrastructure by upgrading the resources dedicated to serving Object Storage requests. We performed these upgrades sequentially to limit any additional pressure on the cluster. At 20:34 UTC, we determined that the steps taken thus far had sufficiently addressed the issue. We continued to monitor the situation, and by 00:35 UTC on December 11, we were confident in considering the incident resolved.
We have reviewed the steps taken to address the repercussions of the problematic activity on Object Storage in our Newark Datacenter. We have proactively applied the backend adjustments to our other data centers in the hope of providing further resilience against future impacts.