Linode Status

Connectivity Issue - (EU-Central) Frankfurt
Incident Report for Linode
Postmortem

Between 13:22 UTC and 21:10 UTC on May 31st, 2023, customers in the Frankfurt data center (EU-Central region) experienced network instability due to a hardware failure on one of our redundant border routers. The hardware failure triggered a bug on the router that allowed traffic to keep being forwarded to an interface that was physically down, blackholing any flows that transited that interface.
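As a rough illustration of why only some flows are lost when one of several load-balanced links blackholes traffic, the sketch below hashes each flow's 5-tuple onto a set of uplinks. The number of links, the link names, the addresses, and the hash function are all assumptions made for illustration; they are not taken from the affected routers.

```python
import hashlib


def pick_link(flow, links):
    """Map a flow's 5-tuple to one link by hashing (a stand-in for per-flow load balancing)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return links[int.from_bytes(digest[:8], "big") % len(links)]


def blackholed_flows(flows, links, down_link):
    """Return the flows that still hash to a link that is physically down."""
    return [flow for flow in flows if pick_link(flow, links) == down_link]


# Hypothetical topology: 8 uplinks, one of which is down but still receives traffic.
LINKS = [f"uplink{i}" for i in range(8)]
DOWN_LINK = "uplink3"

# Hypothetical flows: (src, dst, protocol, src_port, dst_port), documentation addresses only.
FLOWS = [("192.0.2.10", "198.51.100.7", 6, 40000 + i, 443) for i in range(1000)]

lost = blackholed_flows(FLOWS, LINKS, DOWN_LINK)
print(f"{len(lost)} of {len(FLOWS)} flows are blackholed")
```

Because a given flow always hashes to the same link, an affected flow fails consistently while its neighbors are unaffected.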

Initial alerting indicated a loss of edge capacity. The network team began troubleshooting and identified a failed line card in a border router. Attempts to resolve the issue by restarting the line card, and then resetting the line, were unsuccessful in bringing it back online. Having determined that the initial network instability was the result of the failed hardware, the team followed standard procedures to take the line card offline and prepare it for RMA, and applied preventive measures to the affected interfaces.

However, while most indications pointed to recovery, some reports of packet loss remained, and the team resumed troubleshooting to locate the source. A discrepancy was discovered between the FIB and RIB on the impacted border router, with next hops still present for an interface that was physically down and belonged to the failed card. That interface is one of many links between the border routers and the core or spine routers. Because traffic to the core is load balanced across these links, the overall impact was small and difficult to track down, which delayed full mitigation.
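The kind of consistency check described above can be sketched as follows. This is a minimal illustration, assuming the stale next hops were in the forwarding table (FIB) and that both tables and the interface states are available as simple in-memory data. The `Route` type, the interface names, and the prefixes are hypothetical and do not reflect any vendor's CLI or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # destination prefix, e.g. "203.0.113.0/24"
    interface: str   # outgoing interface for the next hop


def stale_fib_entries(rib, fib, interface_up):
    """Return FIB entries that are missing from the RIB or point at a down interface."""
    rib_set = set(rib)
    stale = []
    for entry in fib:
        missing_from_rib = entry not in rib_set
        # Unknown interfaces are treated as down.
        iface_down = not interface_up.get(entry.interface, False)
        if missing_from_rib or iface_down:
            stale.append(entry)
    return stale


# Hypothetical state: et-0/0/1 sits on the failed card and is physically down.
interface_up = {"et-0/0/0": True, "et-0/0/1": False}

# The RIB has already withdrawn the path over the down interface.
rib = [Route("203.0.113.0/24", "et-0/0/0")]

# The FIB still holds a next hop over the down interface.
fib = [
    Route("203.0.113.0/24", "et-0/0/0"),
    Route("203.0.113.0/24", "et-0/0/1"),
]

for entry in stale_fib_entries(rib, fib, interface_up):
    print(f"stale: {entry.prefix} via {entry.interface}")
```

Comparing the two tables directly, rather than relying on traffic symptoms, is one way to surface a fault that load balancing would otherwise hide.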

At 21:09 UTC, the Network Team drained all traffic between the affected border router and the core spine router, which removed the stale destinations. Further testing confirmed that the loss was no longer present, and the team re-added traffic to all links that were not impacted by the initial hardware failure, restoring all redundancy and connectivity.

The root cause of the bug itself is still under investigation, and we are currently working with the switch vendor on a permanent resolution. In the interim, we have adjusted our processes to ensure that future incidents of this nature are mitigated in a more fault-resilient manner.

Posted Jun 05, 2023 - 19:19 UTC

Resolved
We haven’t observed any additional connectivity issues. We will now consider this incident resolved. If you experience issues, please open a Support ticket for assistance.
Posted Jun 01, 2023 - 01:41 UTC
Monitoring
We have identified the issue and implemented a fix for it. We will now monitor the situation to ensure connectivity remains stable.
Posted May 31, 2023 - 21:47 UTC
Investigating
We are currently investigating this issue.
Posted May 31, 2023 - 18:43 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 31, 2023 - 17:03 UTC
Identified
Our team has identified the issue and is working to implement a fix. We will provide an update once the solution is in place.
Posted May 31, 2023 - 15:10 UTC
Investigating
Our team is investigating an issue affecting connectivity in our Frankfurt data center. During this time, users may experience intermittent connection timeouts and errors for services deployed in this data center. We will share additional updates as we have more information.
Posted May 31, 2023 - 14:17 UTC
This incident affected: Regions (EU-Central (Frankfurt)).