Incident Summary
Beginning on Friday, November 26th, traffic in the AMS1 data center spiked. By Sunday, November 28th, traffic had reached a level that caused an increase in intermittent 502, 503, and 504 error responses for API requests to some services.
Scope of Impact
During the incident window, some API services returned intermittent 502, 503, and 504 errors. These errors disrupted workflows and required that some API calls be made multiple times to receive a 200 response.
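For clients affected by these intermittent 5xx responses, retrying with exponential backoff is the standard mitigation. The sketch below is illustrative only and is not part of our API; the function names and timing values are assumptions.

```python
import time

RETRYABLE = {502, 503, 504}  # the intermittent errors seen during the incident

def call_with_retry(request_fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call request_fn until it returns a non-retryable status or attempts run out.

    request_fn returns an HTTP status code; 502/503/504 trigger a retry
    after an exponentially increasing delay (0.5s, 1s, 2s, ...).
    """
    for attempt in range(max_attempts):
        status = request_fn()
        if status not in RETRYABLE:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status
```

Passing the request as a callable keeps the retry policy independent of any particular HTTP client library.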
Timeline (UTC)
2021-11-26 22:00:00: Incident Started
2021-11-28 10:00:00: Issue was reported and investigation began; the issue was originally thought to be related to another open incident
2021-11-28 15:52:00: Issue was determined to be unrelated to the other incident and a new incident was created
2021-11-28 16:09:00: Internal escalation to engineering
2021-11-29 02:36:00: Cause of incident discovered and addressed
2021-11-29 9:00:00: Incident resolved
Cause Analysis
The traffic spike was the result of a recurring workflow being run by 14 user seats instead of the single instance used previously. The additional load overloaded the affected services, producing the intermittent 502, 503, and 504 errors.
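One way to prevent duplicate runs of the same workflow from piling up is to gate execution behind a shared lock so that extra instances are skipped rather than queued. The sketch below uses an in-process semaphore purely to illustrate the idea; a multi-host deployment would need a distributed lock, and all names here are hypothetical, not part of our platform.

```python
import threading

class SingleInstanceRunner:
    """Allow at most max_concurrent runs of a workflow at once."""

    def __init__(self, max_concurrent=1):
        self._gate = threading.Semaphore(max_concurrent)

    def try_run(self, workflow):
        """Run workflow() if a slot is free; otherwise skip and return None."""
        if not self._gate.acquire(blocking=False):
            return None  # another seat is already running this workflow
        try:
            return workflow()
        finally:
            self._gate.release()
```

With a gate like this, 14 seats triggering the same recurring workflow would still result in only one execution at a time.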
Resolution Steps
The problematic workflow was reverted, and a new load-balancing solution was implemented to handle the increased traffic.
Next Steps
The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.