Elevated API Errors
Incident Report for Cloudsmith
Postmortem

Summary

On September 14th, between 17:22 and 18:18 UTC, traffic to our us-west-2 region experienced slower response times and eventually started returning timeout responses.

This was caused by instability in the underlying infrastructure for our database replica in the region, which significantly slowed query processing. Requests took longer to service, which caused multiple web server instances in the region to be replaced, amplifying the problem.

The situation was resolved by working with our infrastructure provider to route all us-west-2 traffic to our eu-west-1 region, recover the problematic database replica in us-west-2, and then route the affected traffic back to us-west-2 once the replica had recovered.

Posted Sep 16, 2022 - 13:45 UTC

Resolved
This incident has been resolved.
Posted Sep 14, 2022 - 19:26 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 14, 2022 - 18:31 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 14, 2022 - 18:21 UTC
Investigating
We're experiencing an elevated level of API errors and are currently looking into the issue.
Posted Sep 14, 2022 - 17:40 UTC
This incident affected: Frontend REST API, Frontend Website, Backend Databases, Backend Processing, and Package Downloads (CDN).