Linode Status

Platform Connectivity Issue - Newark (us-east) and Singapore (ap-south)
Incident Report for Linode
Postmortem

Starting on December 6, 2023 at around 22:00 UTC, Compute hosts in the Newark data center began registering elevated rates of boot failures for customer Linodes. The Singapore data center began registering the same boot failures on December 7, 2023 at around 07:00 UTC.

These failures did not generate any alerts. The Compute Support team was informed via phone call at 12:26 UTC, which led to an escalation to our system administrators at 13:22 UTC. A preliminary investigation revealed a potential connection with an internal tool used by Compute hosts. This tool had also been connected to a similar incident in our Dallas data center about two months prior. Based on these findings, the administrators reached out to the SMEs for the internal tool at 13:52 UTC.

The team responded at 14:39 UTC that the previous issue in Dallas was alleviated by adjusting the intranet networking configuration for Compute hosts. Starting at 15:11 UTC, Compute Support also engaged these SMEs and reported multiple customers experiencing problems with boots and other platform-level jobs, such as new Linode deployments and Cloud Firewall rule changes. The SMEs responded to this new information at 15:16 UTC and reported timeouts from this internal tool.

After some deliberation, an incident for this behavior was formally declared at 15:36 UTC. SMEs from multiple teams gathered to consider possibilities for the tool’s misbehavior, including the architecture of Compute’s central database and the host networking environment. A status page went live at 17:06 UTC.

By 17:27 UTC, the database architecture was ruled out as a culprit after extensive investigation. At 17:43 UTC, one of the SMEs noticed that tasks related to the QEMU processes for Linodes completed instantaneously when this tool was disabled, whereas the same operations took several seconds when the tool was left enabled. Based on this observation, the SMEs decided to disable the tool across all hosts in Singapore and Newark.

The tool was disabled in Singapore starting at 17:50 UTC and in Newark starting at 17:52 UTC; the work completed in Singapore at 18:20 UTC and in Newark at 18:28 UTC. Afterwards, the SMEs deployed fifty Linodes in each of Singapore and Newark, reporting a 100% success rate for those boots at 18:39 UTC and 18:42 UTC respectively. A customer also reported a successful boot to Compute Support at 18:40 UTC. With these successes, the status page was moved to a monitoring state at 19:05 UTC, then resolved at 20:24 UTC.

The disabled tool has been safely re-enabled in Singapore and Newark. Our administrators are investigating improved monitoring of the impacted services as well as architectural changes to remove the possibility of similar problems recurring.

Posted Apr 29, 2024 - 19:45 UTC

Resolved
We haven’t observed any additional issues with platform-level operations in our Newark or Singapore data centers, and will now consider this incident resolved. If you continue to experience problems, please open a Support ticket for assistance.
Posted Dec 07, 2023 - 20:24 UTC
Monitoring
A fix has been implemented and we are monitoring the results. We will continue to monitor this situation and will provide additional updates as they become available.

If you continue to experience problems, please submit a support ticket for assistance.
Posted Dec 07, 2023 - 19:05 UTC
Investigating
Our team is investigating an issue affecting platform-level operations in our Newark (us-east) and Singapore (ap-south) data centers. During this time, users may experience issues when performing platform-level tasks in Newark or Singapore such as creating services, deleting services, and changing the power state of services. We will share additional updates as we have more information. If you require assistance, please open a Support ticket.
Posted Dec 07, 2023 - 17:05 UTC
This incident affected: Regions (US-East (Newark), AP-South (Singapore)).