[CRITICAL] Issue with Box Services
Incident Report for Box
Postmortem

We recently addressed issues affecting Box activity, including Box Metadata, file uploads and downloads, Box Notes, and the public API. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

On July 7th, 2022, between 6:02 PM PDT and 7:10 PM PDT, some users may have experienced difficulties while working in Box, including higher than normal latency, timeouts, and/or errors when accessing Box through one of our applications or our public API.

Analysis

This issue was caused by instability in one of our HBase clusters. To increase the capacity of the Box Metadata service, our engineering teams deployed an expanded data storage cluster on the afternoon of July 7th. The cluster was tested before deployment and performed as expected in production immediately afterward.

Approximately 3 hours after the deployment, a single server within the cluster began exhibiting dramatically increased latency caused by a defective memory module. Our automated server failure detection did not identify this server as failed, so traffic continued to be directed to it. Requests to the server backed up, which increased latency for Box Metadata as a whole and in turn caused timeout errors and slow performance in uploads, downloads, Box Notes, and other related services.

We resolved the issue by diverting traffic to a healthy passive cluster. In addition, we are working on process and tooling improvements to prevent similar issues from occurring in the future.
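
The failure mode described above, where a server is slow but not down, is the kind of case a latency-based check can catch alongside a liveness check. The following is a minimal Python sketch of that general idea and is not a description of Box's tooling. The function name `find_slow_servers`, the `factor` and `min_samples` parameters, and the server names and latency figures are all hypothetical.

```python
from statistics import median


def find_slow_servers(latencies_ms, factor=5.0, min_samples=5):
    """Return servers whose median latency exceeds `factor` times the
    median of all per-server medians.

    latencies_ms maps a server name to a list of recent request latencies
    in milliseconds. Servers with fewer than `min_samples` samples are
    skipped because their median is not yet meaningful.
    """
    medians = {
        server: median(samples)
        for server, samples in latencies_ms.items()
        if len(samples) >= min_samples
    }
    # With fewer than two servers there is no peer group to compare against.
    if len(medians) < 2:
        return []
    cluster_median = median(medians.values())
    return sorted(
        server for server, value in medians.items()
        if value > factor * cluster_median
    )


if __name__ == "__main__":
    # Hypothetical data: one server answers, but far more slowly than its peers.
    recent = {
        "hbase-node-1": [10, 12, 11, 9, 10],
        "hbase-node-2": [11, 10, 12, 10, 11],
        "hbase-node-3": [900, 1200, 1000, 950, 1100],
    }
    print(find_slow_servers(recent))  # ['hbase-node-3']
```

Comparing each server against its peers, rather than against a fixed threshold, is one possible design choice. It assumes that most servers in the cluster are healthy at any given time.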

Corrective Actions

The following corrective actions have been completed or are planned:

  • Removed the defective server from service and completed repairs.
  • Revised our datastore cluster deployment, onboarding, and validation processes to prevent this issue from recurring and to improve detection of problematic deployments.
  • Improving the resilience of Box services to high latency when accessing datastores, to minimize the impact of any similar issues in the future.
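
One common pattern for making a service resilient to a slow dependency is a circuit breaker, which stops sending requests to a datastore after repeated failures and retries only after a cool-down period. The sketch below illustrates that general pattern in Python under stated assumptions. It is not Box's implementation, and the class name `CircuitBreaker`, the exception `CircuitOpenError`, and all parameters are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so the logic can be tested
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = None       # time the circuit opened, or None if closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Fail fast instead of queueing behind a slow dependency.
                raise CircuitOpenError("dependency temporarily disabled")
            # Cool-down elapsed: let this one call through as a trial.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In this sketch the wrapped function is expected to raise an exception (for example, a timeout) when the datastore is too slow. Callers that catch `CircuitOpenError` can return a degraded response rather than waiting on a backlog.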

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,

The Box Team

Posted Jul 12, 2022 - 10:32 PDT

Resolved
No additional impact has been observed during monitoring and all services are confirmed fully functional. We are considering the issue to be fully resolved. If you are encountering any issues, please contact Box Support at https://support.box.com.
Posted Jul 07, 2022 - 20:45 PDT
Update
We are continuing to monitor for any further issues.
Posted Jul 07, 2022 - 20:07 PDT
Monitoring
Our teams have validated that all affected services are recovered at this time. We are continuing to monitor for any additional impact.
Posted Jul 07, 2022 - 19:35 PDT
Update
We are continuing to investigate this issue.
Posted Jul 07, 2022 - 19:00 PDT
Investigating
We are currently investigating an issue with Box services. We will provide more information as it becomes available.
Posted Jul 07, 2022 - 19:00 PDT
This incident affected: Box Web Application (Login/SSO, Uploads/Downloads, Preview, Workflows and Automations), Desktop Applications (Login/SSO, Box Drive), Box Notes (Web Application), Box Platform / API (Content Preview, Search, Uploads/Downloads), Mobile Applications (Preview, Search), and Box Website.