The unplanned node restarts have stopped and everything has been stable for 2 hours. We are therefore marking this incident resolved and will continue working with our hosting provider to understand the root cause.
Posted Aug 11, 2020 - 20:45 UTC
Update
The underlying situation is still ongoing; however, the clusters have been reconfigured and additional resources added to improve stability.
We are still working with our hosting provider to better understand the node reboots. At this point, no reboots have been observed for an hour and the clusters are stable.
Posted Aug 11, 2020 - 13:02 UTC
Update
The underlying cause has been traced to unexpected node reboots in the cluster. The source of this problem is under investigation. We are still seeing intermittent node reboots, which continue to cause higher latency on some requests.
Posted Aug 11, 2020 - 11:46 UTC
Monitoring
The source of the latency has been traced to a network interruption within the us-central1 region. The interruption has cleared and all operations have returned to normal. We will continue to monitor.