We recently addressed issues affecting the Metadata Query API. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.
Between 9am and 2pm PT on May 5th, 2023, some users may have experienced difficulties while working in Box. During this time, some requests to the Metadata Query API returned stale or outdated results. The issue was caused by a delay in replicating instance create and update events to the query datastore. We resolved the issue by filtering out some events that were triggering retries and dead-lettering, which in turn delayed replication. In addition, we have identified the underlying pattern of events that caused the delays in the replication pipeline and patched our replication processing code to handle that pattern gracefully, to prevent similar issues from occurring in the future.
Analysis
When Metadata instances are mutated using the Box API or one of our applications, events containing the details of these mutations are emitted to a replication stream, which a pipeline then processes without delay. This ensures the latest state is consistently available to query through the Metadata Query API. We employ multiple optimizations to maintain the near-real-time characteristics of this query functionality, while ensuring all data is replicated consistently.
During this incident, one of the optimizations we leverage encountered an unexpected pattern of data, causing it to return errors and trigger retries. These retries slowed down processing of the replication stream, causing lag to grow and resulting in queries returning outdated information for some customers. After identifying and filtering out the events causing these errors, we were able to get the replication pipeline to process events successfully again without adding delay. We also increased the rate at which replication events are processed, so the pipeline could work through the lag that had built up and fully catch up.
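As a rough illustration of the general pattern described above, and not a description of Box's actual pipeline, the sketch below shows a stream consumer that retries a failing event a bounded number of times and then sets it aside in a dead-letter list, so that one problematic event cannot hold up the events behind it. All names here (ReplicationEvent, apply_to_query_store, drain, the "malformed" flag) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class ReplicationEvent:
    """Hypothetical mutation event; not Box's real event schema."""
    instance_id: str
    payload: dict
    attempts: int = 0


class UnprocessableEventError(Exception):
    """Raised when an event cannot be applied to the query store."""


def apply_to_query_store(store: dict, event: ReplicationEvent) -> None:
    # Assumption: a "malformed" flag stands in for the unexpected data
    # pattern that made processing fail during the incident.
    if event.payload.get("malformed"):
        raise UnprocessableEventError(event.instance_id)
    store[event.instance_id] = event.payload


def drain(stream, store: dict, max_attempts: int = 3) -> list:
    """Apply events in order; dead-letter any event that keeps failing."""
    dead_letters = []
    for event in stream:
        for attempt in range(1, max_attempts + 1):
            try:
                apply_to_query_store(store, event)
                break  # applied successfully, move on to the next event
            except UnprocessableEventError:
                event.attempts = attempt
        else:
            # Every attempt failed: set the event aside instead of
            # retrying forever and stalling the rest of the stream.
            dead_letters.append(event)
    return dead_letters


if __name__ == "__main__":
    events = [
        ReplicationEvent("a", {"v": 1}),
        ReplicationEvent("b", {"malformed": True}),
        ReplicationEvent("c", {"v": 2}),
    ]
    query_store = {}
    failed = drain(events, query_store)
    print(sorted(query_store), [e.instance_id for e in failed])
```

In this sketch the cost of a bad event is capped at max_attempts tries. Without such a cap, or with many failing events in a row, time spent on retries accumulates as replication lag, which is the kind of effect described above.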
Corrective Actions
The following corrective actions have been completed or are planned:
We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.
Sincerely,
The Box Team