[Minor] Issues With Box Drive
Incident Report for Box
Postmortem

We recently addressed issues affecting the availability of Box Drive and the All Files page. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between the afternoon of April 27, 2021 and the evening of April 28, 2021, some users may have experienced difficulties while working in Box. During this time, Box Sync and Box Drive requests would have failed intermittently. Additionally, on April 27, 2021, between 2:26 PM PDT and 2:31 PM PDT, the Box All Files page would have sporadically failed to load for some users.

Analysis 

The issue occurred as a result of a recent code change made as part of our ongoing effort to improve the performance and stability of our database and caching systems. Central to this incident was the persistent corruption of two hot Memcache key-value pairs. The bug responsible kept writing the corrupted value back into Memcache, so manual attempts to fix the corruption did not hold. These bad cache values were treated as cache misses, so each request for them had to fetch the uncached record from the database (DB), which increased DB load. The increased load pushed the DB into a degraded state, visible as increased database read latency and high CPU utilization.
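
The failure mode described above can be illustrated with a small read-through cache. This is only a sketch under stated assumptions and is not Box's code: the names `ReadThroughCache`, `is_valid`, `load_from_db`, and `count_db_reads` are hypothetical, an in-memory dictionary stands in for Memcache, and the "bug" is modeled as an encoder that always writes back an invalid value.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def is_valid(value: Any) -> bool:
    """Hypothetical validity check: a good cache value is a dict with an 'id'."""
    return isinstance(value, dict) and "id" in value


class ReadThroughCache:
    """In-memory stand-in for a cache in front of a database."""

    def __init__(self, load_from_db: Callable[[str], Record],
                 encode: Callable[[Record], Any]) -> None:
        self._store: Dict[str, Any] = {}
        self._load_from_db = load_from_db
        self._encode = encode  # step that prepares a record for caching
        self.db_reads = 0

    def get(self, key: str) -> Record:
        cached = self._store.get(key)
        if is_valid(cached):
            return cached
        # A missing OR corrupted value is treated as a cache miss.
        self.db_reads += 1
        record = self._load_from_db(key)
        # If encode is buggy, the corrupted value is persisted again here,
        # so the next read for this key misses as well.
        self._store[key] = self._encode(record)
        return record


def load_from_db(key: str) -> Record:
    """Hypothetical database lookup."""
    return {"id": key, "name": "example"}


def count_db_reads(encode: Callable[[Record], Any], requests: int = 100) -> int:
    """Read one hot key repeatedly and report how many reads reached the DB."""
    cache = ReadThroughCache(load_from_db, encode)
    for _ in range(requests):
        cache.get("hot-key")
    return cache.db_reads
```

With a healthy encoder (`lambda record: record`), only the first of 100 reads reaches the database. With an encoder that always stores an invalid value, all 100 do, which is why a small number of hot keys can translate into a large increase in database load.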

We shed load on the DB by temporarily hardwiring DB responses, which helped mitigate the impact to the site. We were then able to resolve the issue permanently by pushing a bug fix to our production codebase. Once the fix was deployed, we reverted the temporary hardwiring of the DB responses.
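
As a rough illustration of what temporarily hardwiring responses can look like, the sketch below serves fixed answers for selected keys without touching the database, and can be reverted once a permanent fix is in place. The report does not describe how the hardwiring was actually done, so the class `DataAccess` and its methods are hypothetical.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


class DataAccess:
    """Hypothetical data access layer with a temporary hardwiring switch."""

    def __init__(self, load_from_db: Callable[[str], Record]) -> None:
        self._load_from_db = load_from_db
        self._hardwired: Dict[str, Record] = {}
        self.db_reads = 0

    def hardwire(self, key: str, response: Record) -> None:
        """Serve a fixed response for this key instead of querying the DB."""
        self._hardwired[key] = response

    def revert_hardwiring(self) -> None:
        """Remove all hardwired responses, restoring normal lookups."""
        self._hardwired.clear()

    def fetch(self, key: str) -> Record:
        if key in self._hardwired:
            return self._hardwired[key]  # no database load for this key
        self.db_reads += 1
        return self._load_from_db(key)
```

A hardwired response is only as fresh as the moment it was pinned, which is one reason such a measure would be treated as temporary and reverted after the underlying bug is fixed.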

Corrective Actions

The following corrective actions have been completed or are planned:

  • Add robust logging for unexpected values returned by the data access layer

  • Add more metrics to alert internal teams quickly when seeing an unexpected increase in bad values returned from the data access layer
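
The two corrective actions above could be combined along the following lines. This is a minimal sketch, not a description of Box's tooling: the class `BadValueMonitor`, the logger name, the window size, and the threshold are all assumptions chosen for illustration. It logs each unexpected value and tracks the share of bad values over a sliding window of recent reads.

```python
import logging
from collections import deque
from typing import Any, Callable, Deque

logger = logging.getLogger("data_access")  # hypothetical logger name


class BadValueMonitor:
    """Logs unexpected values and tracks their rate over recent reads."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # True for a bad value, False for a good one; oldest entries drop off.
        self._recent: Deque[bool] = deque(maxlen=window)
        self._threshold = threshold

    def observe(self, key: str, value: Any,
                is_valid: Callable[[Any], bool]) -> bool:
        """Record one value returned by the data access layer."""
        bad = not is_valid(value)
        self._recent.append(bad)
        if bad:
            logger.warning(
                "unexpected value from data access layer: key=%s type=%s",
                key, type(value).__name__,
            )
        return bad

    def bad_rate(self) -> float:
        """Fraction of bad values among the reads currently in the window."""
        if not self._recent:
            return 0.0
        return sum(self._recent) / len(self._recent)

    def should_alert(self) -> bool:
        """Alert only once the window is full and the rate exceeds the threshold."""
        window_full = len(self._recent) == self._recent.maxlen
        return window_full and self.bad_rate() > self._threshold
```

Waiting for a full window before alerting avoids paging on the first few reads after startup; a real deployment would more likely export the rate to an existing metrics system and alert from there.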

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,

The Box Team

Posted May 20, 2021 - 09:46 PDT

Resolved
No additional impact has been observed and we are considering this issue to be resolved. If you are seeing any new issues, please let us know at https://support.box.com.
Posted Apr 28, 2021 - 19:22 PDT
Monitoring
Our team has identified the root cause of this issue and has taken corrective action. Impact should be fully remediated at this time. We are continuing to take steps to ensure the behavior does not reoccur and will continue our monitoring window while that work is completed. Additional updates will be provided as they become available.
Posted Apr 28, 2021 - 14:20 PDT
Update
We are continuing to investigate this issue and taking preventative action to mitigate impact. The issue continues to be intermittent and users will likely see variable improvements in performance. We will provide additional updates as they become available.
Posted Apr 28, 2021 - 11:47 PDT
Investigating
We are investigating an issue that is intermittently impacting Box Drive. This may cause some customers to experience intermittent failures when logging in to Box Drive or when using the service to access or manage content.

This behavior is isolated to Box Drive. Other services are still functioning as expected and can be utilized as an interim workaround. We will provide additional updates as they become available.
Posted Apr 28, 2021 - 09:36 PDT
This incident affected: Desktop Applications (Box Drive).