Slow build finishing
Incident Report for Percy
Resolved
This incident has been resolved as of 11:33 am PDT.

Details: After extensive debugging with our managed Redis provider, we traced this issue to a configuration change they made during a recent Redis database upgrade. The upgrade changed how SSL errors are handled and caused rare corruption of SSL connections, which affected our Sidekiq connections. For the last 12 hours, a temporary patch has hidden the user-visible effects of these job failures while we continued to look for the root cause. As of 11:33 am PDT, our Redis provider identified the root cause and updated configurations to fix the issue. We believe the issue is now fully resolved and will continue monitoring.

We apologize for any inconvenience this may have caused. We will wait for the results of our Redis provider's root cause analysis and conduct our own internal post-mortem to identify ways to further improve the resiliency of our job processing infrastructure.
Posted Oct 01, 2020 - 13:13 PDT
Monitoring
A temporary fix is in place, and we will continue to monitor and investigate the root cause. We are leaving this incident open until we are sure the issue is fully resolved and no customers are impacted. Thanks for your patience.
Posted Oct 01, 2020 - 02:41 PDT
Identified
We have narrowed the issue to Sidekiq jobs being orphaned, but we have not yet identified the root cause. We have put a temporary fix in place that manually retries orphaned jobs and are continuing to investigate.
Posted Oct 01, 2020 - 02:07 PDT
Investigating
We are currently investigating an issue affecting multiple customers where builds appear to get stuck processing the last few screenshots and do not finish.
Posted Oct 01, 2020 - 00:16 PDT
This incident affected: Rendering infrastructure.