[K8S] CoreDNS timeouts
Incident Report for Scaleway
Resolved
The DNS resolution failure has been resolved. To fix the issue, please proceed as follows:
Replace or reboot your nodes to get the fix.
Use an FQDN to resolve your addresses inside the Private Network. For instance, if you have a foo service in the mypn Private Network, foo alone will not resolve and you must use foo.mypn (see the example below).
Important notes:
Please avoid using an existing TLD (https://en.wikipedia.org/wiki/Top-level_domain) as a Private Network name, as it can cause resolution problems. For instance, if you name your Private Network fr, you will not be able to resolve google.fr externally: if there is no service named google in your Private Network, google.fr will not resolve at all, and if there is one, it will resolve to that service instead of the public address.
Please note that prod and dev are TLDs (https://data.iana.org/TLD/tlds-alpha-by-domain.txt).
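As an illustration, here is a minimal sketch of the FQDN behaviour, assuming a Private Network named mypn with a service named foo attached to it (both names are hypothetical), tested from a temporary pod:

    # Short name: expected not to resolve inside the Private Network
    kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- nslookup foo

    # FQDN with the Private Network name as suffix: expected to resolve
    kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- nslookup foo.mypn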
Posted Feb 20, 2024 - 16:49 CET
Update
We are still working on a permanent fix for this issue.
The issue can be temporarily mitigated by editing the ConfigMap kube-system/coredns and replacing forward . /etc/resolv.conf with forward . 169.254.169.254 for clusters using Private Networks, or with forward . [public resolver of choice] for multicloud and legacy public clusters.
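For reference, a minimal sketch of the mitigation, assuming a cluster attached to a Private Network and a default-style Corefile (only the forward line changes; the surrounding plugins are shown for context and may differ on your cluster):

    # kubectl -n kube-system edit configmap coredns
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        # was: forward . /etc/resolv.conf
        forward . 169.254.169.254
        cache 30
        loop
        reload
        loadbalance
    }

For multicloud and legacy public clusters, put your public resolver of choice after forward . instead. If the reload plugin is not enabled in your Corefile, restart CoreDNS for the change to take effect, e.g. kubectl -n kube-system rollout restart deployment coredns.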
Posted Feb 16, 2024 - 09:30 CET
Update
We are continuing to investigate this issue.
Posted Feb 15, 2024 - 11:19 CET
Investigating
Some nodes experience up to 10% DNS resolution failures with I/O timeouts to 169.254.53.53:53.

The issue has been identified and we are working on a fix.
Posted Feb 12, 2024 - 16:00 CET
This incident affected: Elements - AZ (fr-par-1, fr-par-2, fr-par-3, nl-ams-1, pl-waw-1, nl-ams-2, pl-waw-2, nl-ams-3, pl-waw-3) and Elements - Products (Kapsule).