Geocoding failures
Incident Report for Onfleet
Postmortem

Overview:

At 07:11 PDT on Sep 29, we began rate limiting geocoding requests to our upstream provider because of an excessive number of requests from a customer testing their integration. Due to a bug in our code, these requests had been allowed to continue for several hours before they triggered rate limiting. We deployed a fix at 09:05 PDT and processing returned to normal.

What Happened:

Around 00:40 PDT, a customer began testing their API integration. Because of an incorrect billing status, their testing created invalid tasks, and a bug in the code that verifies billing authorization did not fully block these API requests. Each task creation attempt caused a geocoding request to our upstream provider. At 07:11 PDT, after the same request pattern had continued for several hours, this activity caused our system to exceed the internal rate limits we apply to requests to our upstream geocoding provider. These rate limits had not been updated recently and were considerably lower than the levels in our contract with this provider.
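
For illustration only, the intended ordering of checks can be sketched as follows: billing authorization is verified first, so an unauthorized account never consumes a geocoding request, and the internal rate limit is applied only to requests that pass. This is a simplified sketch and not our production code; `SlidingWindowLimiter`, `create_task`, the `billing_ok` field, and the `geocode` callable are all hypothetical names chosen for the example.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span (hypothetical helper)."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self.calls = deque()        # timestamps of recently allowed calls

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True


def create_task(account, address, limiter, geocode):
    """Reject unauthorized accounts before spending an upstream geocoding request."""
    # Assumption: billing state is a simple boolean on the account record.
    if not account.get("billing_ok", False):
        return {"status": "rejected", "reason": "billing"}
    if not limiter.allow():
        return {"status": "rejected", "reason": "rate_limited"}
    return {"status": "created", "location": geocode(address)}
```

In this sketch, the value passed as `limit` would be derived from the contracted quota rather than hard-coded, so the internal limit cannot silently drift below what the provider allows.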

Under normal circumstances, automated monitors in our systems would have detected this customer’s activity and allowed us to mitigate the issue before it became critical. However, due to a bug in our monitoring, these requests were not tracked correctly, so no alert was ever raised.
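
As a rough illustration of the kind of per-customer tracking described above, the sketch below counts every upstream geocoding request against the customer that caused it, including requests from tasks that are later found to be invalid, and signals once when a threshold is crossed. It is a minimal example under stated assumptions, not our actual monitoring; `GeocodeUsageMonitor`, `record`, and `alert_threshold` are hypothetical names.

```python
from collections import Counter


class GeocodeUsageMonitor:
    """Track upstream geocoding requests per customer (hypothetical sketch)."""

    def __init__(self, alert_threshold):
        self.alert_threshold = alert_threshold
        self.counts = Counter()   # customer_id -> requests seen so far
        self.alerted = set()      # customers we have already alerted on

    def record(self, customer_id):
        """Count one request; return True the first time a customer exceeds the threshold."""
        self.counts[customer_id] += 1
        over = self.counts[customer_id] > self.alert_threshold
        if over and customer_id not in self.alerted:
            self.alerted.add(customer_id)
            return True
        return False
```

The important design point is that `record` is called for every request sent upstream, regardless of whether the task that triggered it is ultimately accepted, so activity like that in this incident would still be counted.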

What we have implemented and will do in the future:

We have adjusted the external geocoding rate limits to match our current capabilities. We are now testing our fixes to the underlying bug and expect to deploy them in the coming days. We are reviewing all of our geocoding monitoring to make sure that alerts are set on the appropriate data and conditions. We apologize for this geocoding interruption and will continue to enhance our monitoring and service configuration to reduce the likelihood of edge cases such as this one in the future.

As always, if you require further detail, please do not hesitate to email us at support@onfleet.com.

Posted Sep 30, 2021 - 19:11 PDT

Resolved
We have now fully verified that the underlying issue is no longer in effect. We will follow up tomorrow with a detailed postmortem. Thank you for your patience while our team worked on bringing this situation to full resolution.
Posted Sep 29, 2021 - 10:58 PDT
Monitoring
We have deployed a fix and are seeing positive results.
Posted Sep 29, 2021 - 09:13 PDT
Identified
We have identified issues with our rate limiting mechanisms that sit in front of Google Maps Platform APIs and are actively troubleshooting to determine the source of the issue.
Posted Sep 29, 2021 - 08:19 PDT
Investigating
We are currently investigating an issue with upstream geocoding failures.
Posted Sep 29, 2021 - 07:45 PDT
This incident affected: Maps.