Issue: Intermittent outages in the US region for community and control
At seemingly random times, the platform slowed down, and pages either loaded slowly or timed out completely.
We experienced a targeted, massively distributed, high-traffic load. Although the platform firewall eventually blocked it, the load was still high enough to take the US region down for around 15 minutes.
Due to an overwhelming volume of requests, our system experienced capacity overload. Within a span of minutes, we recorded hundreds of thousands of GET requests originating from thousands of unique IP addresses. These requests were primarily directed towards the root URL ("/") of a specific community.
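The traffic pattern described above (many GET requests to a single path from many distinct IP addresses) can be illustrated with a short sketch. This is not our detection tooling: the record layout of `(ip, method, path)` tuples, the function name `summarize_get_traffic`, and the sample data are all hypothetical, chosen only to show how such a pattern might be surfaced from access logs.

```python
from collections import defaultdict


def summarize_get_traffic(records):
    """Group GET requests by path.

    `records` is an iterable of (ip, method, path) tuples (a hypothetical,
    already-parsed access-log format). Returns a dict mapping each path to
    (request_count, unique_ip_count).
    """
    counts = defaultdict(int)
    ips = defaultdict(set)
    for ip, method, path in records:
        if method != "GET":
            continue  # only GET requests are relevant to this pattern
        counts[path] += 1
        ips[path].add(ip)
    return {path: (counts[path], len(ips[path])) for path in counts}


# Hypothetical sample data using documentation-reserved IP ranges.
sample = [
    ("203.0.113.1", "GET", "/"),
    ("203.0.113.2", "GET", "/"),
    ("203.0.113.1", "GET", "/"),
    ("198.51.100.7", "POST", "/login"),
    ("198.51.100.7", "GET", "/topics"),
]
summary = summarize_get_traffic(sample)
```

In a summary like this, a path whose request count and unique-IP count are both far above their usual levels would match the kind of distributed load described here. What counts as "far above" depends on the normal traffic of the community in question.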
Once capacity was overwhelmed, end users saw load timeouts or error messages on screen.
Our security and engineering teams mitigated the attack quickly, which limited both the duration of the incident and the damage to the platform. Normal service was restored within 15 minutes, with all systems functioning as expected.
We can assure you that 100% of our customers' data remains secure. The impact of the incident was confined to occasional sluggish performance and intermittent timeouts during page loading. The integrity and security of all data were not compromised.
We are implementing a set of measures to reduce the likelihood of similar incidents and to mitigate any disruption more quickly should one occur. These measures include both technical enhancements and organizational improvements that strengthen our response capabilities and minimize the impact on our customers.
25th April 17:09 - First automated alarm triggered, indicating that the allowed load-time threshold for communities had been breached.
17:11 - Incident response team was mobilized internally
17:14 - Incident was escalated due to severity
17:16 - The malicious traffic was identified and isolated from the normal traffic. A firewall adjustment was made to counter it.
17:24 - The traffic on the platform returned to normal parameters and the disruption period was over
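The first timeline entry refers to an automated alarm on a load-time threshold. As a rough illustration of that kind of check, and not a description of our actual monitoring, the sketch below raises an alarm when the mean of the most recent load-time samples exceeds a limit. The class name `LoadTimeAlarm`, the 2-second threshold, and the window of 3 samples are assumptions made for the example.

```python
from collections import deque


class LoadTimeAlarm:
    """Rolling-mean threshold check (hypothetical; values are illustrative)."""

    def __init__(self, threshold_seconds, window_size):
        self.threshold_seconds = threshold_seconds
        # deque with maxlen drops the oldest sample automatically
        self.samples = deque(maxlen=window_size)

    def record(self, load_time_seconds):
        """Add a sample; return True when the window is full and its
        mean load time exceeds the threshold."""
        self.samples.append(load_time_seconds)
        window_full = len(self.samples) == self.samples.maxlen
        mean = sum(self.samples) / len(self.samples)
        return window_full and mean > self.threshold_seconds


# Assumed values: 2-second threshold over the last 3 samples.
alarm = LoadTimeAlarm(threshold_seconds=2.0, window_size=3)
results = [alarm.record(t) for t in [0.5, 0.6, 0.7, 5.0, 6.0]]
```

Averaging over a window rather than reacting to a single slow request is one way to avoid alarms on isolated slow page loads while still reacting within a few samples to a sustained slowdown.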
We greatly appreciate your patience and understanding throughout this incident. If you have any further questions or concerns, please don't hesitate to contact our support team at ccsupport@gainsight.com