Elevated Error Rates for Drive Axle, Document Hub, and Platform Document Delivery
Incident Report for Eleos Technologies
Postmortem

The Eleos Platform had two outages on May 2, 2024: from 18:15 UTC to 19:46 UTC and from 20:25 UTC to 20:45 UTC, for a total of 1 hour and 51 minutes.  During these outages, users could not log in to Drive Axle, Document Hub, or App Manager.

The outages also had the following effects:

  • Workflow actions and messages that included telematics data submitted by drivers were delayed until after the outage.  Workflow actions submitted during the outages fell back to an offline state if offline workflows were configured.  The apps synchronized those actions and messages once the outage ended.
  • Drivers with the manage_shipments flag enabled may have failed to retrieve updated load data.
  • Drivers would have experienced delays when they attempted to upload scanned documents.  If drivers logged out while documents were still queued for upload, those documents were lost.
  • Drivers would have experienced delays when they attempted to retrieve their list of previously scanned documents.
  • Users who were already logged in to App Manager would have had difficulty editing document types and editing forms that include document types.

Because a telematics partner had an outage at the same time, Platform features that rely on that partner's services, such as telematics-enabled messages and workflows, would have fallen back to their offline functionality if configured.

Users could not log in to Drive Axle, Document Hub, and App Manager because certain authentication calls inadvertently depended on telematics integration logic.  Because of the simultaneous telematics partner outage, these authentication calls timed out, and the resulting resource exhaustion cascaded to other, non-authentication requests.  These requests should be independent of one another, and we are making changes to decouple them.
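As an illustration of the failure mode described above, the following is a minimal Python sketch of one common way to keep a slow optional dependency from exhausting shared resources: a short timeout combined with a cap on concurrent calls (often called a bulkhead).  It is not the change being made to the Eleos Platform.  Every name in it (lookup_with_bulkhead, authenticate, MAX_IN_FLIGHT, LOOKUP_TIMEOUT_S) and every value is a hypothetical chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# All names and values below are hypothetical, chosen only for illustration.
MAX_IN_FLIGHT = 2        # assumed cap on concurrent telematics lookups
LOOKUP_TIMEOUT_S = 0.2   # assumed time budget for the optional lookup

# A small dedicated pool, so stuck lookups cannot occupy threads that
# serve other requests.
_pool = ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT)
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def lookup_with_bulkhead(lookup, user_id):
    """Call an optional dependency with a timeout and a concurrency cap.

    Returns the lookup result, or None when the dependency is slow,
    failing, or already saturated.
    """
    if not _slots.acquire(blocking=False):
        return None  # every slot is busy: skip the call instead of queueing
    try:
        future = _pool.submit(lookup, user_id)
    except BaseException:
        _slots.release()
        raise
    # Free the slot only when the call really finishes, even if the caller
    # has already stopped waiting for it.
    future.add_done_callback(lambda _f: _slots.release())
    try:
        return future.result(timeout=LOOKUP_TIMEOUT_S)
    except FutureTimeout:
        return None  # too slow: carry on without telematics data
    except Exception:
        return None  # the dependency failed: carry on without it


def authenticate(user_id, password_ok, telematics_lookup):
    """Decide authentication first; treat telematics data as optional."""
    if not password_ok:
        return {"authenticated": False, "telematics": None}
    extra = lookup_with_bulkhead(telematics_lookup, user_id)
    return {"authenticated": True, "telematics": extra}
```

With this shape, a hanging telematics call delays a login by at most the timeout, and at most MAX_IN_FLIGHT threads can be stuck waiting on the partner at once.  Further logins skip the lookup immediately until a slot frees up.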

We are deeply sorry for the interruptions, delays, and distraction this incident caused for you and your drivers.  Compounding the problem, we did not promptly communicate that there was a known incident.  We are reviewing and adjusting our on-call procedures and training to correct this.

Posted May 10, 2024 - 18:36 UTC

Resolved
Error rates have returned to normal. We apologize for the interruption of service.
Posted May 02, 2024 - 20:22 UTC
Monitoring
Error rates have returned to normal, and we are continuing to monitor the system.
Posted May 02, 2024 - 20:02 UTC
Update
Error rates have dropped dramatically. We are still investigating the cause.
Posted May 02, 2024 - 19:54 UTC
Update
We are actively investigating these issues.

Logging in to App Manager and the Document Hub is also affected at this time. Drive Axle users are experiencing difficulties logging in.
Posted May 02, 2024 - 19:48 UTC
Update
Logging in to App Manager is also affected at this time.
Posted May 02, 2024 - 19:26 UTC
Investigating
We are currently investigating elevated error rates for Drive Axle and the Document Hub.

Scanning and retrieval of sent documents are also affected for Eleos Platform customers. Scanned documents will not be lost during this time; uploads will be retried.
Posted May 02, 2024 - 19:16 UTC
This incident affected: Eleos Platform (App Manager, Mobile Apps, Document Delivery), Document Hub, and Drive Axle.