On 5-October-2021 at 08:22 UTC, Datto RMM Partners on the Pinotage platform experienced a service interruption that caused intermittent 503 errors in the Agent Browser and an increased error rate in the agent logs.
The issue was mitigated but resurfaced during periods of high load. Such periods are expected and are transparent to users during normal operation.
Investigation revealed several issues which, in combination and compounded by periods of high load, caused an exhaustion of pool resources and connections.
During this incident, while the Engineering team was identifying and addressing the issues, we put temporary workarounds in place across the Load Balancer infrastructure to alleviate the impact and prioritise Agent Browser traffic.
Hotfixes have been created, tested and released for the identified issues on the affected platforms; other platforms have received these changes in the 10.0 release.
The issue was considered resolved once the 10.0 release was live on all platforms, which was completed by 19-October-2021.