After deploying a fix to address the dashboard login issue that occurred earlier in the day, a further attempt was made to switch to our new website host at 12:40 UTC, October 24th. Although most clients were subsequently able to access the dashboard, EU clients using SSO were unable to log in. A series of fixes was applied, completing at 22:16 UTC, which restored the dashboard service for SSO clients.
When launching our new website, an interdependent infrastructure component was not migrated from one hosting provider to the other. This component provided request routing functionality to our dashboard that enabled login behaviours.
12:40 UTC: Having deployed a fix targeting the earlier problem with EU dashboard logins, we reapplied the change to switch to the new website host. This fix moved an EU Dashboard API endpoint to segregate it from the onfido.com scope, following a pattern used in our USA and Canada clusters.
13:00 UTC: We discovered customers using SSO were unable to log in to Dashboard.
13:00 - 15:00 UTC: Investigations were performed to replicate and debug the problem.
15:00 - 22:00 UTC: A series of DNS and code changes was prepared, with the final change enabling correct cookie creation for SSO with our reconfigured dashboard. In parallel, testing was set up and performed to replicate the problems being experienced and to verify SSO flows as the changes were developed.
22:16 UTC: We applied the final fix required to restore SSO login behaviours.
Our immediate actions were to remove or redirect dependencies between the dashboard application and onfido.com. This enabled SSO logins on the dashboard to work correctly with the new hosting configuration.
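As a general illustration of why moving an endpoint out of a shared domain scope can affect cookie-based login flows, the following minimal Python sketch implements the standard cookie domain-matching rule (RFC 6265, section 5.1.3). It is not the code used in our fix, and the hostnames are hypothetical placeholders rather than our real endpoints. The function name `domain_matches` is also an assumption made for this example.

```python
import ipaddress


def _is_ip_address(host: str) -> bool:
    """Return True if host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Check whether a cookie scoped to cookie_domain applies to request_host.

    Follows the domain-matching rule in RFC 6265 section 5.1.3: the host
    matches if it is identical to the domain, or if it is a host name (not
    an IP address) that ends with "." + domain.
    """
    host = request_host.lower().rstrip(".")
    # A leading dot in the Domain attribute is ignored by the standard.
    domain = cookie_domain.lower().strip(".")
    if host == domain:
        return True
    if _is_ip_address(host):
        return False
    # Require a label boundary so "badexample.com" does not match "example.com".
    return host.endswith("." + domain)


# Hypothetical hostnames, for illustration only.
print(domain_matches("dashboard.example.com", "example.com"))
print(domain_matches("dashboard.example-eu.net", "example.com"))
```

Under this rule, a session cookie scoped to a parent domain is not sent to an endpoint hosted outside that domain, so a login flow that relies on such a cookie needs the cookie created for the new scope.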
To prevent similar issues from recurring in the future, we are reviewing how we plan and test projects and changes of this nature. In particular, we are working on the following actions in the short term:
Additionally, we have a mid-term initiative to remove all remaining dependencies between onfido.com and our production applications.