Increased Portal and SGN errors
Incident Report for Todyl
Postmortem

IMPACTED DATACENTER LOCATIONS:
ALL - PARTIAL

TIME:
12:24 PM UTC to 1:21 PM UTC

ISSUE SUMMARY:
Starting at 12:24 PM UTC, one of our database servers was taken offline by an outage in the AWS USEAST region, impacting our Portal and APIs that run in AWS. The SGN does not run in AWS, and devices with an existing connection remained connected. However, configuration changes were not applied during the outage.

The AWS management portal was also impacted by the outage, which delayed our recovery attempts. Once our Site Reliability Engineering team was able to access the portal, they followed a failover procedure that restored service, allowed devices to connect to the SGN, and restored access to the Todyl Portal.

ADDITIONAL INFORMATION:
We are continuing to work with AWS to determine the cause of the outage on their end, and we are reviewing our internal procedures to expedite failover and recovery if a similar issue recurs. In addition, we are exploring a multi-cloud strategy to improve resilience against single-provider outages.

Posted Dec 22, 2021 - 17:53 UTC

Resolved
This incident has been resolved.
Posted Dec 22, 2021 - 13:56 UTC
Monitoring
The issue has been resolved and devices are connecting to the SGN. We are continuing to monitor.
Posted Dec 22, 2021 - 13:21 UTC
Update
Devices may experience issues connecting to the SGN globally and errors may be seen in the portal. We have identified the issue and are rapidly working towards a resolution.
Posted Dec 22, 2021 - 13:02 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Dec 22, 2021 - 12:31 UTC
Investigating
We are currently investigating this issue.
Posted Dec 22, 2021 - 12:24 UTC
This incident affected: Todyl.com & Management Portal.