Elevated latency for History and Push in the US-East region
Incident Report for PubNub
Postmortem

Problem Description, Impact, and Resolution 

Beginning at 14:20 UTC on 2020-10-28, the PubNub History and Push services experienced a sharp increase in latency in the US-East region. A node failed and could not restart quickly, which delayed service restoration. Manual operational tools were used to fully restart the system by 15:05 UTC, bringing the incident to full resolution.

Mitigation Steps and Recommended Future Preventative Measures 

To prevent a similar issue from occurring in the future, we have introduced additional health checks to ensure that unhealthy nodes are removed from service. In addition, we are making longer-term changes to service discovery and load balancing for this service so that a single failed node does not affect latency.
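PubNub has not published how these health checks work. The following is a minimal sketch of the general technique, assuming a simple consecutive-threshold policy: a node leaves the serving pool after a set number of consecutive failed checks and rejoins only after a set number of consecutive passing checks. The names `NodePool`, `check`, `failure_threshold`, and `recovery_threshold` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NodePool:
    """Hypothetical pool that routes traffic only to nodes passing a health check.

    A node is ejected after `failure_threshold` consecutive failed checks and
    re-admitted after `recovery_threshold` consecutive passing checks.
    """
    nodes: List[str]
    check: Callable[[str], bool]  # returns True if the node looks healthy
    failure_threshold: int = 3
    recovery_threshold: int = 2
    _fails: Dict[str, int] = field(default_factory=dict, init=False)
    _passes: Dict[str, int] = field(default_factory=dict, init=False)
    _healthy: Dict[str, bool] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        # Every node starts in service with clean counters.
        for n in self.nodes:
            self._fails[n] = 0
            self._passes[n] = 0
            self._healthy[n] = True

    def run_checks(self) -> None:
        """Run one round of health checks and update pool membership."""
        for n in self.nodes:
            if self.check(n):
                self._fails[n] = 0
                self._passes[n] += 1
                # Re-admit only after enough consecutive successes.
                if not self._healthy[n] and self._passes[n] >= self.recovery_threshold:
                    self._healthy[n] = True
            else:
                self._passes[n] = 0
                self._fails[n] += 1
                # Eject after enough consecutive failures.
                if self._healthy[n] and self._fails[n] >= self.failure_threshold:
                    self._healthy[n] = False

    def in_service(self) -> List[str]:
        """Nodes that a load balancer would currently send traffic to."""
        return [n for n in self.nodes if self._healthy[n]]


# Demo: node "b" is failing its checks.
status = {"a": True, "b": False, "c": True}
pool = NodePool(["a", "b", "c"], check=lambda n: status[n])
for _ in range(3):
    pool.run_checks()
print(pool.in_service())  # ['a', 'c']
```

Requiring several consecutive results in each direction keeps a single slow or flapping check from repeatedly pulling a node in and out of rotation, at the cost of a short delay before a failed node is removed.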

Posted Mar 22, 2021 - 15:39 UTC

Resolved
Services have been stable for 60 minutes. This incident is now resolved.
Posted Oct 28, 2020 - 16:05 UTC
Monitoring
We have identified and mitigated the issue, and we are now monitoring the health of the services.
Posted Oct 28, 2020 - 15:28 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Oct 28, 2020 - 15:14 UTC
Investigating
Beginning at approximately 7:20am PDT (14:20 UTC), the History and Push services began experiencing elevated latencies in the US-East region.
Posted Oct 28, 2020 - 15:08 UTC
This incident affected: Points of Presence (North America Points of Presence) and Realtime Network (Storage and Playback Service, Mobile Push Gateway).