Rules Engine and Data Processing queues are blocked
Incident Report for Gainsight
Postmortem

Issue: Rules Engine and Data Processing queues blocked

Cause: A power failure at a service provider data center.

Incident Timeline on the 22nd of December:
13:34 UTC - We received notice that our upstream platform service had experienced a power outage in a specific availability zone, which affected Gainsight CS as well as other supporting services. While some customers saw very little impact, others had their queues blocked to prevent data loss. The outage caused data corruption, and data had to be manually restored for some customers.

18:35 UTC - A subset of customer queues remained in a blocked state as we recovered reporting and dashboard data. Our engineers also added resources to help bring queues current. We maintained an open bridge with vendor support engineers who helped with recovery.

21:59 UTC - By this time, all report data had been restored and errors had subsided.

Resolution:
Although the incident was ultimately resolved by the restoration of power at the service provider level, we also identified an opportunity to improve data resiliency, which we have taken into account to help avoid larger-scale issues like this.
Thank you for your patience. Please email support@gainsight.com with any questions.

Posted Jan 18, 2022 - 15:53 UTC

Resolved
This incident has been resolved, and we will relay RCA details as we receive them. Please expect queue delays over the next 2-3 hours as we continue to work through the backlog.
Posted Dec 22, 2021 - 21:59 UTC
Monitoring
We have restored services and are monitoring. Please expect queue delays as we continue to work through the backlog.
Posted Dec 22, 2021 - 21:24 UTC
Update
Thanks for your patience as we continue to work on recovery with our upstream provider.
Posted Dec 22, 2021 - 20:38 UTC
Update
Thanks for your patience as we continue to work on recovery with our upstream provider.
Posted Dec 22, 2021 - 19:36 UTC
Update
Thanks for your patience as we continue to recover services.
Global queues have been unblocked, but we are still recovering an isolated set of customers whose queues remain blocked.
We are adding resources to help bring queues current. Please expect delays as we work through the backlog.
Posted Dec 22, 2021 - 18:35 UTC
Update
Thanks for your patience as we continue to recover services.
Posted Dec 22, 2021 - 18:02 UTC
Update
Thanks for your patience as we continue to recover services.
Posted Dec 22, 2021 - 17:02 UTC
Identified
The login failure issue is now resolved.

We continue to work on other recovery actions.
Posted Dec 22, 2021 - 15:50 UTC
Investigating
We are receiving reports of login failures and are investigating.
We will update as we have more information.
Posted Dec 22, 2021 - 15:20 UTC
Update
We are still working on recovery actions.

While the queues remain blocked, we have observed that the loading of High Volume Reports is also impacted.
Posted Dec 22, 2021 - 15:06 UTC
Update
We are working on recovery actions. We will update as we have more information.
Posted Dec 22, 2021 - 13:55 UTC
Identified
We have identified a service disruption affecting our upstream service provider.
Rules Engine and Data Processing queues are currently blocked to prevent any impact on customer jobs.
Posted Dec 22, 2021 - 13:34 UTC
This incident affected: Gainsight CS - US1 Region (US1 Rules Engine Queue, US1 Data Processor Queue).