US-East PoP may have experienced delays with messages being written to Storage
Incident Report for PubNub
Postmortem

Problem Description, Impact, and Resolution 

On Jan 11, 2021, at 20:45 UTC, messages stopped being written to Storage for publishes in our US-East PoP, so those messages were not available in history during the incident. The delay was caused by slow database writes, which resulted from a problem with the way deleted records are handled in the distributed data store in some scenarios. We restarted the process, and the issue was resolved at 22:23 UTC. During the incident, Storage reads were successful for any data that persisted prior to the incident, and no data was lost; all messages published during the incident were queued until the writers were able to catch up.

Mitigation Steps and Recommended Future Preventative Measures 

We have updated the code to prevent the errors caused by deleted records in the distributed data store.

Posted Jan 26, 2021 - 18:59 UTC

Resolved
This incident has been resolved.
Posted Jan 11, 2021 - 23:43 UTC
Monitoring
A fix was implemented at 10:23 PM UTC, and we are monitoring the results for the next hour.
Posted Jan 11, 2021 - 22:30 UTC
Update
Latencies have recovered in US West, and we are also catching up on writing published messages to storage.
Posted Jan 11, 2021 - 22:16 UTC
Update
We are continuing to investigate this issue and are seeing elevated latencies in US West and EU Central.
Posted Jan 11, 2021 - 22:03 UTC
Identified
The issue has been identified and we are working on a fix.
Posted Jan 11, 2021 - 21:29 UTC
Investigating
Starting around 20:44 UTC (12:44 PST), customers in our US-East PoP may have experienced delays between the time messages were published and the time they were written to storage.
Posted Jan 11, 2021 - 21:25 UTC
This incident affected: Points of Presence (North America Points of Presence) and Realtime Network (Storage and Playback Service).