RMM - Pinotage - 503 errors displayed in the agent browser - Monitoring
Incident Report for Datto
Postmortem

On 5-October-2021 at 08:22 UTC, Datto RMM Partners on the Pinotage platform experienced a service interruption that caused intermittent 503 errors when using the Agent Browser and an increased error rate in the agent logs.

The issue was mitigated but resurfaced during periods of high load. Such periods are expected and are transparent to users during normal operation.

Investigation revealed several issues whose combination, compounded by periods of high load, exhausted pool resources and connections.

During this incident, while the Engineering team was identifying and addressing the issues, we put temporary workarounds in place across the Load Balancer infrastructure to alleviate the issues and prioritise Agent Browser traffic.

Hotfixes have been created, tested and released for the identified issues on the affected platforms; other platforms have received these changes in the 10.0 release.

The issue was considered resolved once the 10.0 release was live on all platforms, which was the case by 19-October-2021.

Posted Oct 26, 2021 - 10:40 UTC

Resolved
This incident has been resolved.
Posted Oct 19, 2021 - 12:34 UTC
Update
Our Engineering team has taken preventative measures and the service is now fully operational.

We will continue to actively monitor the health of the service until the end of the RMM 10.0 release cycle.
Posted Oct 12, 2021 - 09:21 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 12, 2021 - 08:12 UTC
Identified
Our teams are currently investigating RTO Agent Browser 503 errors for Datto RMM on Pinotage. An update will be posted here as the investigation progresses.

Thank you for your patience!
Posted Oct 12, 2021 - 07:50 UTC
This incident affected: Datto RMM (Pinotage (EU1)).