Functions are failing to execute in the US-West PoP
Incident Report for PubNub
Postmortem

Problem Description, Impact, and Resolution 

Starting at around 21:25 UTC on 2021-01-26, messages published in our US-West PoP failed to trigger Functions. The incident also affected the ability to change Vault key values via Portal. It was triggered when a database used to register Functions reached a size that unexpectedly degraded performance, causing cascading effects in the systems used to trigger Functions.

To mitigate the impact, we routed publishes from the affected PoP to US-East, so that all Functions were being triggered as of 21:49, though Vault was still not functioning properly. At 22:28 Vault began to work again, so all services were restored, though running out of the US-East PoP. After restarting processes in US-West, all services were restored and the issue was fully resolved at 22:48 UTC.

Mitigation Steps and Recommended Future Preventative Measures 

To prevent a similar issue from occurring in the future, we are proactively managing the size of the databases that could be impacted by the size threshold uncovered by the incident. We have also added items to our backlog to alter the dependencies on the existing data storage approach for registering Functions.
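
As an illustration of what proactively managing database size against a known threshold might look like, here is a minimal sketch. It is not the tooling used in this incident: the `SizeCheck` type, the `classify` function, the database name, and the 70% and 90% cut-offs are all hypothetical, and the actual size threshold is not stated in this report.

```python
from dataclasses import dataclass


@dataclass
class SizeCheck:
    """One size measurement for a database (all values are hypothetical)."""
    name: str
    size_bytes: int
    limit_bytes: int  # size at which performance is known to degrade

    @property
    def utilization(self) -> float:
        # Fraction of the known-bad size that is currently in use.
        return self.size_bytes / self.limit_bytes


def classify(check: SizeCheck, warn_at: float = 0.7, page_at: float = 0.9) -> str:
    """Return 'ok', 'warn', or 'page' based on how close the database is to its limit.

    The warn_at and page_at fractions are assumed values chosen for illustration.
    """
    if check.limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if check.utilization >= page_at:
        return "page"
    if check.utilization >= warn_at:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Hypothetical reading: 7.5 GB used against a 10 GB degradation threshold.
    reading = SizeCheck("functions-registry", 7_500_000_000, 10_000_000_000)
    print(reading.name, classify(reading))
```

Alerting well below the degradation point, rather than at it, leaves time to reduce the database size before performance is affected.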

Posted Feb 22, 2021 - 17:01 UTC

Resolved
This incident has been resolved.
Posted Jan 27, 2021 - 00:12 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 26, 2021 - 22:44 UTC
Update
We are still working on fixing the underlying issue, but the service has been restored since 21:49 UTC. Function executions remain operational, and customers are able to update Vault keys via Portal.
Posted Jan 26, 2021 - 21:43 UTC
Update
We are still working on fixing the underlying issue but service has been restored since 21:49 UTC. Function executions remain operational and customers are able to update Vault keys via Portal.
Posted Jan 26, 2021 - 21:33 UTC
Identified
Starting at around 21:25 UTC (13:25 PST), many customers' publishes made in our US-West PoP were failing to trigger Functions. This affected all Function types. It also affected the ability to change Vault key values via Portal.

We have routed calls around the issue and, as of 21:49, Functions were executing as normal. Customers may still see problems changing Vault keys.
Posted Jan 26, 2021 - 21:07 UTC
Investigating
At around 21:25 UTC (13:25 PST), Functions started failing to execute in the US-West PoP.
Posted Jan 26, 2021 - 21:00 UTC
This incident affected: Functions (Functions Service) and Points of Presence (North America Points of Presence).