Downtime due to internet connectivity issues
Incident Report for CM.com
Postmortem

On June 14 we experienced an outage due to a bug in 4 of the 6 hardware switches in the core of the CM Cloud Network.

The bug caused these 4 switches to lose their connection with our cloud nodes in 2 separate data centers at the same moment.

Our CM Cloud Network is designed to be triple redundant, but losing virtual servers in 2 of our 3 core data locations at the same time caused disruptions to some of our geo-redundant database clusters.

Our Network Operation Center detected the connectivity issue and escalated the case within minutes.

After the switches restarted, the CM Cloud Infrastructure was fully up and running again within 45 minutes.

After 1 hour the majority of our services were fully operational again.

 

Timeline (CEST):

15:21 – Connectivity issues detected

16:24 – First services recovering

16:27 – Majority of services operational

20:31 – All services operational
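
The elapsed time between detection and each milestone can be worked out from the timeline above. The following is a minimal sketch in Python; the names `TIMELINE` and `minutes_since_detection` are illustrative only and are not part of any CM.com tooling. It assumes all four entries fall on the same day (June 14) in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all CEST, June 14).
TIMELINE = [
    ("15:21", "Connectivity issues detected"),
    ("16:24", "First services recovering"),
    ("16:27", "Majority of services operational"),
    ("20:31", "All services operational"),
]


def minutes_since_detection(timeline):
    """Return (event, whole minutes elapsed since the first entry) pairs."""
    start = datetime.strptime(timeline[0][0], "%H:%M")
    result = []
    for clock, event in timeline:
        elapsed = datetime.strptime(clock, "%H:%M") - start
        result.append((event, int(elapsed.total_seconds() // 60)))
    return result


for event, minutes in minutes_since_detection(TIMELINE):
    print(f"{minutes // 60}h {minutes % 60:02d}m  {event}")
```

With these inputs, the majority of services were operational 1h 06m after detection, and all services 5h 10m after detection.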

 

What’s next?

Together with our supplier, we are investigating the failure of the network switches.

In the meantime, we are taking extra measures within our DNS and database clusters to improve resilience and speed up recovery.

This outage touched most of our services, and we’re sorry for the impact on our customers and everyone who relies on them.

Posted Jun 15, 2021 - 16:59 CEST

Resolved
This incident has been resolved.
Posted Jun 14, 2021 - 23:30 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 23:21 CEST
Update
All issues have been identified and our team is currently working to get the last services back online. Some WhatsApp numbers are still experiencing issues; we expect them to be available soon and will report back when this is the case. All other services are available.
Posted Jun 14, 2021 - 20:29 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 18:03 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 17:57 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 17:55 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 17:53 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 17:01 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 16:57 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 16:42 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 16:34 CEST
Update
We are continuing to monitor for any further issues.
Posted Jun 14, 2021 - 16:33 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 14, 2021 - 16:32 CEST
Update
We are continuing to investigate this issue.
Posted Jun 14, 2021 - 16:28 CEST
Update
We are continuing to investigate this issue.
Posted Jun 14, 2021 - 16:27 CEST
Investigating
We are investigating an issue with our internet connectivity.
Posted Jun 14, 2021 - 15:44 CEST
This incident affected: Verification & Sign (Sign, iDIN), Ticketing (Seated Ticketing, General Admission), Mobile Service Cloud (Customer Contact), Online Payments (iDEAL QR), and Messaging (WhatsApp Business).