app.lightdash.cloud showing degraded performance
Incident Report for Lightdash
Postmortem

Earlier today we received an automated alert that app.lightdash.cloud was unavailable and returning 502 errors. The cause was that Lightdash was responding slowly because of the amount of usage in Lightdash Cloud at the time. The slower response times triggered an automated process that restarts the Lightdash servers. This process should normally trigger only when a server has already crashed. In this incident the restart was a mistake: the server was simply running more slowly than expected.

To resolve the issue, we've added significantly more resources to our Lightdash Cloud servers to prevent slow response times. We've also raised the threshold for automatically restarting the servers in the case of very slow response times.
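
As an illustration only, the kind of restart rule described above can be sketched as follows. This is not the actual Lightdash Cloud configuration: the names `RestartPolicy` and `should_restart`, and all of the numbers, are assumptions chosen to show how a tight threshold can restart a server that is merely slow, while a looser one restarts only a server that has stopped responding.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RestartPolicy:
    # Hypothetical settings; the real thresholds are not stated in this report.
    timeout_seconds: float   # a probe slower than this counts as a failure
    failure_threshold: int   # consecutive failures required before restarting


def should_restart(probe_latencies: Sequence[Optional[float]],
                   policy: RestartPolicy) -> bool:
    """Decide whether to restart, given probe latencies (oldest first).

    A latency of None means the server did not respond at all.
    """
    consecutive_failures = 0
    # Walk backwards from the most recent probe and count the failure streak.
    for latency in reversed(probe_latencies):
        failed = latency is None or latency > policy.timeout_seconds
        if not failed:
            break
        consecutive_failures += 1
    return consecutive_failures >= policy.failure_threshold


# Assumed example values, for illustration only.
strict = RestartPolicy(timeout_seconds=1.0, failure_threshold=1)
lenient = RestartPolicy(timeout_seconds=10.0, failure_threshold=3)

slow_but_alive = [0.4, 2.5, 3.0, 2.8]    # responding, just slowly
crashed = [0.4, None, None, None]        # no response at all

print(should_restart(slow_but_alive, strict))   # strict policy restarts a slow server
print(should_restart(slow_but_alive, lenient))  # lenient policy leaves it running
print(should_restart(crashed, lenient))         # lenient policy still restarts a crashed server
```

Requiring several consecutive failures against a longer timeout means that slow responses alone do not trigger a restart, while a server that has stopped responding is still restarted.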

Posted Jan 26, 2023 - 20:27 UTC

Resolved
This incident is resolved.
Posted Jan 26, 2023 - 20:26 UTC
Monitoring
A fix has been implemented and we are monitoring the deployment.
Posted Jan 26, 2023 - 19:58 UTC
Identified
We've identified the root cause and it's being fixed.
Posted Jan 26, 2023 - 19:49 UTC
Update
We have identified the failing component but are still investigating the root cause. Services appear operational again.
Posted Jan 26, 2023 - 19:03 UTC
Investigating
We are currently investigating the root cause of the issue.
Posted Jan 26, 2023 - 16:49 UTC
This incident affected: Lightdash Cloud (Lightdash Cloud (US)).