For more information about our Incident Response and Communications please read this support article.

We also maintain a list of Known Product Issues separate from this site here.

[Minor] Issue With Uploads, Downloads and API
Incident Report for Box
Postmortem

We recently addressed issues affecting the availability of Box Drive and the All Files page. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between the afternoon of April 27, 2021 and the evening of April 28, 2021, some users may have experienced difficulties while working in Box. During this time, Box Sync and Box Drive requests would have failed intermittently. Additionally, on April 27, 2021 between 2:26 PM PDT and 2:31 PM PDT, the Box All Files page would have sporadically failed to load for some users.

Analysis 

The issue occurred as a result of a recent code change made as part of our ongoing effort to improve the performance and stability of our database and caching systems. Central to this incident was the persistent corruption of two hot Memcache key-value pairs. A bug kept writing the corrupted cache value back into Memcache, so manual attempts to fix the corruption did not hold. These bad cache values were treated as cache misses, so each request for those keys had to fetch the uncached record from the database, which increased database load. The increased load pushed the database into a degraded state, visible as increased database read latency and high CPU utilization.
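
The failure mode described above can be illustrated with a minimal sketch. This is not Box's code: the `ReadThroughCache` class, the `corrupt_on_write` switch, the `fake_db` function, and the validity rule are all hypothetical, and a plain dictionary stands in for Memcache. It only shows why a value that is repeatedly written back in a corrupted form turns every read of a hot key into a database read.

```python
from typing import Any, Callable, Dict


class ReadThroughCache:
    """Toy read-through cache in which an invalid cached value counts as a miss."""

    def __init__(self, load_from_db: Callable[[str], Dict[str, Any]],
                 corrupt_on_write: bool = False) -> None:
        self._store: Dict[str, Any] = {}  # stand-in for Memcache
        self._load_from_db = load_from_db
        # Hypothetical switch that simulates a bug writing bad values back.
        self._corrupt_on_write = corrupt_on_write
        self.db_reads = 0

    @staticmethod
    def _is_valid(value: Any) -> bool:
        # Assumed validity rule for this sketch: records are dicts with an "id".
        return isinstance(value, dict) and "id" in value

    def get(self, key: str) -> Dict[str, Any]:
        cached = self._store.get(key)
        if self._is_valid(cached):
            return cached  # healthy cache hit, no database work
        # Missing or corrupted entry: fall through to the database.
        self.db_reads += 1
        record = self._load_from_db(key)
        self._store[key] = "corrupted" if self._corrupt_on_write else record
        return record


def fake_db(key: str) -> Dict[str, Any]:
    """Stand-in for a database read."""
    return {"id": key}


healthy = ReadThroughCache(fake_db)
buggy = ReadThroughCache(fake_db, corrupt_on_write=True)
for _ in range(100):
    healthy.get("hot-key")
    buggy.get("hot-key")

print(healthy.db_reads)  # 1
print(buggy.db_reads)    # 100
```

In this sketch, clearing the bad entry by hand would not help the buggy cache, because the next read writes the corrupted value back again.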

We shed load on the database by temporarily hardwiring database responses, which helped mitigate the impact to the site. We were then able to resolve the issue permanently by pushing a bug fix to our production codebase. Once the fix was in place, we reverted the temporary hardwiring of the database responses.

Corrective Actions

The following corrective actions have been completed or are planned:

  • Add robust logging for unexpected values returned by the data access layer

  • Add more metrics to quickly alert internal teams to an unexpected increase in bad values returned from the data access layer
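
As a rough illustration of the two corrective actions above, the following sketch logs each unexpected value and keeps a per-key count that can drive an alert. It is an assumption-laden example, not Box's implementation: the `BadValueMonitor` class, the logger name, and the threshold are hypothetical, and a real system would more likely alert on a rate over time than on a raw count.

```python
import logging
from collections import Counter
from typing import Any

logger = logging.getLogger("data_access")  # hypothetical logger name


class BadValueMonitor:
    """Logs unexpected values and counts them per key."""

    def __init__(self, alert_threshold: int) -> None:
        self.alert_threshold = alert_threshold  # hypothetical alerting threshold
        self.counts: Counter = Counter()

    def record(self, key: str, value: Any) -> bool:
        """Log one unexpected value; return True once the key's count reaches the threshold."""
        self.counts[key] += 1
        logger.warning(
            "unexpected value from data access layer: key=%s type=%s count=%d",
            key, type(value).__name__, self.counts[key],
        )
        return self.counts[key] >= self.alert_threshold


monitor = BadValueMonitor(alert_threshold=3)
alerts = [monitor.record("hot-key", "corrupted") for _ in range(4)]
print(alerts)  # [False, False, True, True]
```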

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,

The Box Team

Posted May 11, 2021 - 07:34 PDT

Resolved
No additional impact has been observed and we are considering this issue to be resolved. If you are seeing any new issues, please let us know at https://support.box.com.
Posted Apr 27, 2021 - 18:23 PDT
Monitoring
Our team has taken action to remediate impact and is seeing recovery across metrics. We are continuing to monitor for any additional impact.
Posted Apr 27, 2021 - 15:26 PDT
Update
We are currently investigating an issue where customers may experience errors when attempting to Upload or Download, or use the Box API, Box Drive, or Box Sync to manage content. We are continuing to investigate this issue.
Posted Apr 27, 2021 - 14:34 PDT
Investigating
We are currently investigating an issue where customers may experience errors when attempting to Upload, Download, or use the Box API. We will provide additional information as it becomes available.
Posted Apr 27, 2021 - 14:08 PDT
This incident affected: Desktop Applications (Box Sync, Box Drive), Box Web Application (Uploads/Downloads), and Box Platform / API (Uploads/Downloads).