The incident started at about 21:29 UTC (13:29 PST). Because of extremely high CPU usage, messages published in our Mumbai PoP were not being written to Storage. Storage reads still succeeded, with some added latency, for any data persisted before the incident. No data was lost; instead, messages queued up until the writers were able to catch up.
We resolved the incident by restarting the Storage processes. The incident concluded at about 21:58 UTC (13:58 PST).
We have updated our code to prevent the errors caused by deleted records in the distributed data storage.
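For readers curious how messages can queue up without loss while writers are unavailable, here is a minimal sketch of the general pattern. It is illustrative only and is not the actual Storage implementation: the `BufferedWriter` class, its `write_fn` callback, and the in-memory queue are all hypothetical names and assumptions made for this example.

```python
from collections import deque


class BufferedWriter:
    """Hypothetical write buffer: holds messages until the backend accepts them."""

    def __init__(self, write_fn):
        # write_fn is an assumed callable that persists one message
        # and raises an exception when the backend cannot accept it.
        self.write_fn = write_fn
        self.pending = deque()  # messages not yet persisted, oldest first

    def publish(self, message):
        # Enqueue first so that a failed write never drops the message.
        self.pending.append(message)
        self.drain()

    def drain(self):
        # Persist in publish order; stop at the first failure and
        # leave the remaining messages queued for a later attempt.
        while self.pending:
            try:
                self.write_fn(self.pending[0])
            except Exception:
                return False
            self.pending.popleft()
        return True
```

In this sketch a message is removed from the queue only after its write succeeds, so once the backend recovers, calling `drain()` again persists the backlog in the original order.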