Southern Asia PoP may have experienced delays with messages being written to Storage
Incident Report for PubNub
Postmortem

Problem Description, Impact, and Resolution 

The incident started at about 21:29 UTC (13:29 PST). Due to extremely high CPU utilization, messages published through our Mumbai PoP were not being written to Storage. However, Storage reads still succeeded, with some added latency, for any data persisted before the incident. No data was lost; instead, writes were queued until the writers were able to catch up.

The issue was resolved when we restarted the Storage processes. The incident concluded at about 21:58 UTC (13:58 PST).

Mitigation Steps and Recommended Future Preventative Measures 

We have updated our code to prevent the errors caused by deleted records in the distributed data storage.

Posted Feb 11, 2021 - 23:18 UTC

Resolved
This incident has been resolved.
Posted Jan 05, 2021 - 23:05 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 05, 2021 - 22:10 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Jan 05, 2021 - 22:10 UTC
Update
We are continuing to investigate this issue.
Posted Jan 05, 2021 - 22:09 UTC
Investigating
Around 21:29 UTC (13:29 PST), customers in our Southern Asia PoP may have experienced delays between the time messages were published and the time they were written to storage. There were also some delays in read requests for data that was already persisted. All messages were eventually stored by 21:56 UTC (13:56 PST) and all read latencies were recovered by 21:58 UTC (13:58 PST).
Posted Jan 05, 2021 - 22:09 UTC
This incident affected: Realtime Network (Storage and Playback Service) and Points of Presence (Southern Asia Points of Presence).