From 2022-11-18 22:12 UTC until 2022-11-18 22:42 UTC (30 minutes total), the Eleos Mobile Platform experienced a partial outage caused by a sudden increase in latency and error rates for writes to one of our primary data stores. During the incident, some drivers experienced intermittent failures when using the Eleos mobile app, similar to the behavior seen when the app is offline. Some Platform Dashboard users also experienced slowness and failures when attempting to view or change app settings and content. API clients, such as integrations attempting to send outbound messages to drivers, would have experienced higher-than-normal error rates. Because of the nature of the underlying issue and our high-availability architecture, not all users would have experienced or noticed errors during the 30-minute period.
Although our monitoring immediately detected the issue and the on-call engineer responded quickly, it took 27 minutes for the first customer-facing update to be posted to the status page. This delay undermines the value of the status page, and we’re revising our incident handling procedures to emphasize earlier communication.
This initial incident resulted in an additional data consistency issue affecting a small number of users, which persisted over the weekend until a server fix was deployed at 2022-11-21 17:44 UTC.
Drivers affected by this additional data consistency issue were unable to receive updated app data after they modified (e.g., viewed or deleted) any of a subset of messages sent during the incident on 2022-11-18. The server fix resolved this error without additional driver or customer action.
A more detailed narrative and root cause analysis are available from your account executive upon request.