On 16 June 2022 at approximately 09:17 AEST, Squiz monitoring systems detected a degradation of service affecting some customers hosted on our SaaS platform. Affected users may have received an error page from the Cloudflare Content Delivery Network (CDN) indicating that the origin DNS could not be resolved. This affected all requests for uncached content.
Investigation by our Platform team found that a few custom hostname records were pointing to an incorrect Application Load Balancer (ALB) instance. The affected hostname records were manually updated to point to the correct staging domains, and service recovered at 10:12 AEST.
For the duration of the incident, users may have received a Cloudflare Error 1016 page (origin DNS error), returned with an HTTP 530 status code, because Cloudflare could not resolve the origin DNS. This occurred for all requests for uncached content.
A deployment to the production instance created corrupted custom hostname records in Cloudflare. The Squiz Platform team identified the affected hostnames and manually updated them to point at the correct domains.
In response to this incident, the Squiz Platform team will undertake the following actions: