Gainsight application users experienced delays in Journey Orchestrator from Sep 08, 2022, 22:16 UTC to Sep 09, 2022, 10:24 UTC. During this window, Journey Orchestrator was either delayed or unavailable for end users.
Root Cause:
Investigations so far indicate that the incident resulted from a combination of the following factors:
- A production change was made on September 7 to fix a bug where email participants were being dropped from programs that contain inline reports in the email step.
- This fix, combined with an edge case in which a customer published a sudden burst of 800K participant emails, consumed a large amount of storage on one of the backend Datastores within a very short time.
- Although the autoscaling mechanism increased storage capacity immediately, the added capacity was also consumed, and further storage additions were blocked by the pending optimization step for the previously added storage.
Recovery Action:
- Journey Orchestrator functionality was blocked to prevent further impact or data loss.
- Recovery steps were initiated to restore the Datastore without data loss.
- Functionality was restored as soon as the Datastore was recovered.
Preventive Measures:
- Review and fix the edge-case scenario.
- Review change management procedures.
- Review storage autoscaling limits to prevent a recurrence.