Major Incident - Squiz Cloud Customers US - 31st May 2022
Incident Report for Squiz
Postmortem

Executive Summary

On the 31st of May 2022 at 06:52 GMT, Squiz monitoring systems detected a degradation of service affecting customers hosted in our Sacramento and New York data centres. Investigation by the Squiz Data Centre team indicated a Distributed Denial of Service (DDoS) attack on a customer’s search website. Other customers hosted in the Sacramento and New York data centres may have experienced packet loss and elevated response times, resulting in intermittent degradation of service.

Once the issue was identified, the Squiz Data Centre team contained the attack via DNS Blackholing, rerouting the incoming network traffic for the impacted website, which resulted in partial stability. Recovery was achieved with changes to Firewall and Web Application Firewall rules at ~11:58 GMT.

 

Customer Impact

For the duration of the incident, the targeted customer’s search website was disrupted, and other customers in the Sacramento and New York data centres experienced increased response times and sporadic unreachability of services.

 

Root Cause

An application-layer Distributed Denial of Service (DDoS) attack was launched against one of our customer sites, causing a total disruption of service to that site and impacting other customers hosted in the same environments. The attack used a payload that resembled normal web traffic, so it was not initially detected and blocked by security measures. The initial observable symptoms resembled normal internal traffic fluctuations, with only a small amount of packet loss.

 

Containment and Recovery

The attack against the targeted customer was contained by DNS Blackholing all traffic directed at the target, restoring normal operations to the data centres. 

Service for the targeted customer was restored by updating the Firewall and Web Application Firewall rules to recognise the attacker’s behaviour and block further attacks using this tactic.
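This report does not describe the specific rules that were deployed. Purely as an illustration of how a rule can flag a client by its behaviour when each individual request looks like normal web traffic, a per-client sliding-window counter might look like the sketch below. The class name, the limit of requests and the window length are hypothetical and are not taken from the Squiz configuration.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client request counter; not Squiz's actual rule set."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client identifier to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: this request would be blocked
        hits.append(now)
        return True
```

A caller would pass a client identifier (for example a source address) and a timestamp in seconds, and block the request whenever `allow` returns `False`.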

 

Mitigation and Follow-up Actions

In response to this incident, the Squiz Data Centre team will undertake the following actions:

  • Review current security measures to improve our capability to prevent higher-layer (application-layer) attacks.
  • Validate the existing firewall rules across our data centres to ensure overall application and environment stability. 
  • Improve visibility of traffic monitoring across our data centres.
  • Validate and extend Firewall alerting so that load spikes are detected.
  • Migrate VM Gateways from stateful firewall to the border routers, ensuring enhanced security is in place.
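
As an illustration of the load-spike alerting mentioned above, the sketch below compares the latest load sample with a baseline built from recent samples. It is a minimal example under stated assumptions, not a description of Squiz monitoring: the function name, the factor `k` and the 5% floor are all hypothetical.

```python
from statistics import mean, pstdev


def is_load_spike(history: list[float], latest: float, k: float = 3.0) -> bool:
    """Return True when `latest` exceeds the baseline mean by more than k deviations."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = pstdev(history)
    # Use a floor of 5% of the baseline (an assumed value) so that a
    # perfectly flat history does not alert on tiny changes.
    floor = 0.05 * baseline
    return latest > baseline + k * max(spread, floor)
```

Here `history` would hold recent measurements such as requests per second, and an alert would be raised whenever the function returns `True` for the newest sample.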

If you require a PDF copy of this post-incident report, please contact your Squiz Service Experience Manager or Squiz Customer Care.

Posted Jun 08, 2022 - 15:13 AEST

Resolved
Squiz teams have deployed a fix for the current Major Incident which has restored service for affected Squiz Cloud hosted customers. We apologise for this degradation of service and thank you for your patience while we worked on the resolution.

A postmortem will be provided via https://status.squiz.cloud .
Posted May 31, 2022 - 21:03 AEST
Monitoring
The Squiz hosting team has rerouted the incoming network traffic for our Sacramento and New York data centres, resulting in stability. Additionally, firewall changes have been performed by our external provider to aid in recovery.
Posted May 31, 2022 - 18:48 AEST
Update
The Squiz hosting team continues to investigate the degraded services for customers hosted in our Sacramento and New York data centres. Discussions are ongoing with our external provider.

A further update will be provided in ~10 minutes.
Posted May 31, 2022 - 18:01 AEST
Update
Squiz continue to investigate a degradation of service for Squiz Cloud customers in the US.

A further update will be provided in ~15 minutes.
Posted May 31, 2022 - 17:47 AEST
Update
Squiz continue to investigate a degradation of service for Squiz Cloud customers in the US.

A further update will be provided in ~15 minutes.
Posted May 31, 2022 - 17:27 AEST
Investigating
Squiz monitoring has detected a degradation of service incident that is affecting Squiz Cloud customers hosted in the USA ONLY. Multiple Squiz teams are currently investigating.

A further update will be provided in ~15 minutes.
Posted May 31, 2022 - 17:05 AEST
This incident affected: Squiz Cloud Hosted Instances.