Platform Outage
Incident Report for WorkOS
Postmortem

On 2023-04-11, from 18:14:48 to 18:33:20 UTC, several WorkOS products were unavailable and requests to the API encountered server errors.

We understand that WorkOS sits on a critical path for our customers’ applications. This is not a responsibility we take lightly and this outage is not in line with the level of service we aim to provide. We are taking all necessary steps to ensure an incident like this does not happen again.

Who was affected?

The incident affected incoming API requests, with impact spanning many of our products. Affected requests during this window returned HTTP 500 and 503 responses.

What happened?

During maintenance, a networking configuration change was inadvertently applied. This change prevented our services from connecting to storage services such as databases and caches.

The main factors that led to this incident were improper controls around testing and applying production networking changes.

What will we do to mitigate problems like this in the future?

Moving forward, WorkOS will take the following actions:

  1. Establish additional access control policies around applying configuration changes.
  2. Add checks for sensitive actions on production resources.
  3. Ensure infrastructure changes are managed consistently by infrastructure-as-code workflows.
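
As an illustration only, a check on sensitive production actions (item 2) could look something like the Python sketch below. It is a minimal example under stated assumptions: the names Change, apply_change, and ConfirmationRequired are hypothetical, the apply step is a placeholder, and none of it describes the tooling WorkOS actually uses.

```python
from dataclasses import dataclass
from typing import Optional


class ConfirmationRequired(Exception):
    """Raised when a production change lacks an explicit confirmation."""


@dataclass(frozen=True)
class Change:
    # Hypothetical description of a configuration change.
    description: str
    environment: str  # e.g. "staging" or "production"


def apply_change(change: Change, confirm: Optional[str] = None) -> str:
    """Apply a change, refusing production changes unless the caller
    explicitly passes confirm="production"."""
    if change.environment == "production" and confirm != "production":
        raise ConfirmationRequired(
            f"refusing to apply {change.description!r} to production "
            "without explicit confirmation"
        )
    # Placeholder for the real apply step, which this sketch does not model.
    return f"applied {change.description!r} to {change.environment}"
```

With a guard of this kind, a change aimed at production by mistake fails loudly instead of being applied, while non-production changes proceed without extra steps.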
Posted Apr 14, 2023 - 16:10 EDT

Resolved
The issue is now fixed and we are not seeing elevated errors in our systems.
Posted Apr 11, 2023 - 14:55 EDT
Monitoring
We deployed a fix and are seeing our systems return to normal.
Posted Apr 11, 2023 - 14:38 EDT
Identified
We identified the issue and are working on a fix. We will provide more updates soon.
Posted Apr 11, 2023 - 14:32 EDT
Update
We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.
Posted Apr 11, 2023 - 14:27 EDT
Investigating
We are investigating an issue with our API.

We apologize for the inconvenience and will share an update once we have more information.
Posted Apr 11, 2023 - 14:22 EDT
This incident affected: Supporting Services (Dashboard, Admin Portal).