High throughput after deployment
Incident Report for Onfleet
Postmortem

Overview:

At around 5:41PM PDT, a change to the WebSocket server code was released which caused connections to abruptly close. The release was immediately rolled back. Operations returned to normal for most customers no later than 6:02PM PDT.

What Happened:

A change to WebSocket processing caused WebSocket connections to abruptly close. This triggered a thundering herd like effect where clients attempt to reopen a new web socket connection and continue to place additional request load on the application servers driving up throughput. This additional pressure degraded performance for all clients.

What we have implemented and will do in the future:

The root cause has been addressed and we are also exploring back-off mechanisms to prevent the thundering herd like effect.

As always, if you require further detail, please do not hesitate to email us at support@onfleet.com.

Posted Oct 13, 2021 - 11:11 PDT

Resolved
This incident has been resolved.
Posted Oct 07, 2021 - 19:13 PDT
Monitoring
The rollback is now in full effect and we are closely monitoring all metrics to confirm that they are returning to their standard ranges.
Posted Oct 07, 2021 - 18:08 PDT
Identified
We are rolling back the changes and are starting to see some improvements as the old build finds its way into our cloud resources.
Posted Oct 07, 2021 - 18:00 PDT
Investigating
We are currently investigating this issue.
Posted Oct 07, 2021 - 17:55 PDT
This incident affected: Dashboard and API.