Intermittent build timeouts
Incident Report for Percy
Resolved
This incident has been resolved.

We made additional significant adjustments to our rendering infrastructure's resourcing and memory management systems to mitigate build timeouts, and have since seen a drop in error rates and timeouts. Although the vast majority of screenshots rendered without issue, very large "poison pages" processed during certain periods of high load caused significant memory pressure, which cascaded across our rendering clusters.

We will complete an internal root cause analysis and post-mortem to improve our systems and our handling of this type of issue in the future, and we will continue monitoring for any similar issues. Thank you for your patience and support during this incident.
Posted May 21, 2021 - 12:48 PDT
Update
We have made significant adjustments to our rendering infrastructure's resourcing and memory management systems to mitigate build timeouts. Though the vast majority of screenshots are rendering without issue, certain very large "poison pages" may require significant resources to process and may result in timeouts.

We are monitoring this closely and will post updates as we learn more until this is fully resolved. Thank you for your continued patience.
Posted May 20, 2021 - 14:08 PDT
Update
We are continuing to investigate this issue.
Posted May 20, 2021 - 06:03 PDT
Update
We are continuing to investigate this issue.
Posted May 20, 2021 - 03:15 PDT
Monitoring
We have identified what we believe to be the root cause of memory pressure issues in our rendering infrastructure. Certain very large "poison pages" caused a new type of memory pressure in Firefox which was not correctly handled and resulted in timeouts and failed builds. We have updated our systems to better handle this edge case and are now verifying the fix.

We will leave this incident open for the next 24-48 hours while we verify that the fix works under normal load.
Posted May 19, 2021 - 10:58 PDT
Investigating
We are currently investigating an incident affecting multiple customers, in which intermittent build timeouts began last night at around 8:30 pm PT and have since stabilized. The problem is not currently occurring, but we are proactively declaring an incident because we believe it may recur under certain load conditions.

We are working to identify and remediate the problem and will update this incident as we determine the root cause. Thank you for your patience.
Posted May 19, 2021 - 06:42 PDT
This incident affected: Rendering infrastructure.