Unable to log in to Dashboard via SSO
Incident Report for Onfido
Postmortem

Summary

After deploying a fix for the dashboard login issue that occurred earlier in the day, we made a further attempt to switch to our new website host at 12:40 UTC on October 24th. Although most clients were subsequently able to access the dashboard, EU clients using SSO were unable to log in. A series of fixes, completed at 22:16 UTC, restored the dashboard service for SSO clients.

Root Causes

When launching our new website, an interdependent infrastructure component was not migrated from one hosting provider to the other. This component provided request routing functionality to our dashboard that enabled login behaviours.

Timeline

12:40 UTC: Having deployed a fix targeting the earlier problem with EU dashboard logins, we reapplied the change to switch to the new website host. This fix moved an EU Dashboard API endpoint to segregate it from the onfido.com scope, following a pattern used in our USA and Canada clusters.

13:00 UTC: We discovered customers using SSO were unable to log in to Dashboard.

13:00 - 15:00 UTC: We investigated, working to replicate and debug the problem.

15:00 - 22:00 UTC: A series of DNS and code changes was prepared, with the final change enabling correct cookie creation for SSO with our reconfigured dashboard. In parallel, we set up and ran tests to replicate the problems being experienced and to verify SSO flows as the changes were developed.

22:16 UTC: We applied the final fix required to restore SSO login behaviours.

Remedies

Our immediate actions were to remove or redirect dependencies between the dashboard application and onfido.com. This enabled SSO logins on the dashboard to work correctly with the new hosting configuration.
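
As an illustration only, the sketch below shows one general way a cookie scoped to a parent domain can stop reaching an application after its endpoint moves to a different host. It is a simplified version of the domain-match rule from RFC 6265 and is not a description of our actual configuration; the function name and the second and third hostnames are hypothetical.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match.

    Returns True when request_host equals cookie_domain or is a subdomain
    of it. The IP-address exclusion in the RFC is omitted for brevity.
    """
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# Assumption for illustration: a session cookie is set with Domain=onfido.com.
COOKIE_DOMAIN = "onfido.com"

# A host under the cookie's domain receives the cookie.
print(domain_matches("dashboard.onfido.com", COOKIE_DOMAIN))

# A hypothetical endpoint outside that domain does not, so a login flow
# relying on the cookie would fail there.
print(domain_matches("dashboard.eu.example.test", COOKIE_DOMAIN))

# A suffix match alone is not enough; the boundary must fall on a dot.
print(domain_matches("notonfido.com", COOKIE_DOMAIN))
```

Checks of this kind can be run against each host involved in a login flow before a hosting change, to confirm that every endpoint still falls within the scope of the cookies it depends on.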

To prevent similar issues from recurring, we are reviewing how we plan and test projects and changes of this nature. In particular, we are working on the following actions in the short term:

  1. Improve our visibility with additional monitoring of dashboard login functions. [ETA end Nov 2022]
  2. Review automated regression test scenarios around dashboard login, especially those related to the SSO setup. [ETA mid Dec 2022]
  3. Simplify our setup for manual regression testing of dashboard SSO login, to reduce our response time when debugging related problems in future. [ETA mid Dec 2022]

Additionally, we have a mid-term initiative to remove all remaining dependencies between onfido.com and our production applications.

Posted Oct 25, 2022 - 16:49 UTC

Resolved
This incident has been resolved. A postmortem will follow in the next 48 hours.
Posted Oct 24, 2022 - 22:21 UTC
Update
A fix has been applied.
Posted Oct 24, 2022 - 22:17 UTC
Update
The problem is persisting for some clients. Investigations are ongoing.
Posted Oct 24, 2022 - 20:17 UTC
Update
We have applied a fix. We continue to monitor the situation.
Posted Oct 24, 2022 - 19:11 UTC
Update
We have identified the issue and are working on a fix. Please do not update your client configuration.
Posted Oct 24, 2022 - 17:57 UTC
Update
We are continuing to investigate this issue.
Posted Oct 24, 2022 - 16:25 UTC
Investigating
Customers in the EU using SSO cannot log in to the Dashboard at the moment.
We are investigating the issue.
Posted Oct 24, 2022 - 13:36 UTC
This incident affected: Europe (onfido.com) (Dashboard).