At 4:00 PM EST on October 6, 2021, Kustomer engineering was alerted that one of our database shards had entered a bad state. At a high level, our third-party cloud database (MongoDB) was unable to process a high volume of write transactions on our messages during peak traffic hours. The incident lasted approximately 3 hours.
During this time, inbound and outbound messages generated within Prod1 Kustomer organizations were held until the database issue was resolved and the held items began to be redriven. The Kustomer system also experienced latency, including the inability to access conversations and customer timelines.
This issue stemmed from a bug on MongoDB's side in their server-side transaction logic.
Kustomer engineering worked with our third-party cloud database provider, and the issue was rectified at approximately 5:33 PM EST. All items were redriven, with residual events completed by 7:20 PM EST.