During the release of a new version of the Spreedly API, the automated code deployment process failed to deploy all of the resources the application needed, causing an outage of the API. A majority of customers were unable to process requests for approximately 30 minutes.
On April 7th, 2021, Spreedly released a new version of the application through an automated code deployment process. Soon after, at 15:45 UTC, Spreedly’s internal monitoring systems began detecting an elevated number of errors. Spreedly engineers redeployed the application instances, which resolved the issue. Most impacted requests received a “502 Gateway Unreachable” response, and a smaller number of customer requests received a “500 Internal Server Error” response.
Engineers continued to monitor the system and discovered a secondary issue caused by the failed automated deployment. A large volume of monitoring events overwhelmed a downstream service, degrading performance for a smaller subset of customer requests between 16:28 and 16:36 UTC. Engineers recycled the application to resolve this issue. Impacted customers received a “500 Internal Server Error” response.
At approximately 17:10 UTC, Spreedly engineers released an update that fixed the failure in the internal automated deployment process. Following this update, internal systems showed signs of a return to normal activity.