Third-party disruption: PubNub
Incident Report for Kustomer
Postmortem

Summary

Beginning at 19:52 EDT on July 7th, 2021, PubNub's History, Push Device Registration, and Channel Group change operations experienced elevated latencies and errors. At 20:13 EDT, Kustomer engineering was alerted that PubNub, a third-party real-time service Kustomer relies on to power chat 2.0 and team pulse, was reporting degraded performance with their system. This prevented these components from working properly until the issue was resolved.

According to PubNub (https://pubnub.statuspage.io/), at about 10:31 UTC (06:31 EDT), Storage began to experience elevated latencies and errors in the Asia Northeast PoP.

Impact/Alerts

Services impacted:

  • Chat 2.0
  • Team Pulse
  • General Latency

Root Cause

Details on the root cause can be found here: https://status.pubnub.com/incidents/5377g4wvyxf3

PubNub's statement: "At 10:31 UTC on 07 July 2021, we observed elevated latencies and errors in our Tokyo point of presence which affected History. This issue occurred because of an atypical combination of usage patterns causing CPU over-utilization in that region, ultimately resulting in the latencies and errors we observed. The incident was fully resolved at 11:07 UTC."

Resolution

The incident was fully resolved at 11:07 UTC. PubNub provided no further information on the resolution.

Lessons/Improvements

PubNub added checks to prevent this from recurring in the short term. In the coming weeks, they will analyze the causal usage patterns and work to separate the affected services, with the goal of mitigating any recurrence of these multiple process failures.

Posted Jul 12, 2021 - 19:26 EDT

Resolved
The PubNub issues impacting team pulse and chat functions have been resolved. For more detailed information about the incident from PubNub, please refer to https://pubnub.statuspage.io/

If you need additional assistance, please reach out to our Support team by going to https://help.kustomer.com/ and clicking "Contact Support" at the top of the page.
Posted Jul 07, 2021 - 21:41 EDT
Monitoring
At this time we are seeing that the errors associated with PubNub have dropped. While there still may be some degraded performance, the team pulse and chat functions are returning to normal operation. Kustomer is continuing to monitor the situation to ensure that all components are functioning correctly. We will share another update here when the incident is fully resolved.
Posted Jul 07, 2021 - 20:39 EDT
Update
PubNub, a third-party real-time service Kustomer relies on to power things like chat and team pulse, is currently reporting degraded performance with their system. We have been made aware and are actively monitoring the situation.

For more information, please refer to https://pubnub.statuspage.io/
"At about 23:52 UTC (16:52 PST), History, Push Device Registration, and Channel Group change experience elevated latencies and errors. This is affecting all of our PoPs except for US West. PubNub Technical Staff is investigating and more information will be posted as it becomes available."

Please reach out to our Support team with any additional questions. You can reach us by going to https://help.kustomer.com/ and clicking "Contact Support" at the top of the page.
Posted Jul 07, 2021 - 20:26 EDT
Investigating
PubNub, a third-party real-time service Kustomer relies on to power things like chat and team pulse, is currently reporting degraded performance with their system. We have been made aware and are actively monitoring the situation.

For more information, please refer to https://pubnub.statuspage.io/

Please reach out to our Support team with any additional questions. You can reach us by going to https://help.kustomer.com/ and clicking "Contact Support" at the top of the page.
Posted Jul 07, 2021 - 20:25 EDT
This incident affected: Prod1 (US) (Channel - Chat) and Prod2 (EU) (Channel - Chat).