Public Share and Zip Downloads on Mounted Folders
Incident Report for Files.com
Postmortem

In the early morning hours of May 17th, 2022, two customers notified us that public shares on their mounts were not working. Further testing showed that the problem affected both public shares and zip downloads from mounts.

While troubleshooting this issue, we added more detailed logging and pushed it out through our automated testing/deployment system. At the time, we were unaware of a bug in the deployment process for the public share system that forced the relevant servers to restart a process, which in turn stopped FTP connections from functioning correctly. This follow-up issue occurred on May 17th, 2022, at 3:39 PM PST. We immediately started our Incident Management process and had FTP back online and functioning on May 17th, 2022, at 4:11 PM PST.

While troubleshooting the original public share/zip download issue on certain mounts, on May 18th, 2022, at 1:22 PM PST, we identified a performance bottleneck in our systems that manage folder/file mounts and syncs. We implemented thread prioritization and resolved that bottleneck on May 18th, 2022, at 2:50 PM PST.
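This report does not describe how thread prioritization was implemented. Purely as an illustration of the general technique, the sketch below uses a priority queue so that worker threads pick up interactive work before background work. The class name, the two priority levels, and the task labels are all hypothetical and are not taken from the Files.com platform.

```python
import itertools
import queue
import threading

# Hypothetical priority levels; lower numbers are served first.
INTERACTIVE = 0   # e.g. a public share or zip download request
BACKGROUND = 10   # e.g. a scheduled sync


class PriorityWorkerPool:
    """Worker threads that always take the highest-priority queued task next."""

    def __init__(self, num_workers=2):
        self._queue = queue.PriorityQueue()
        # Monotonic counter: keeps FIFO order within one priority level and
        # ensures the callables themselves are never compared.
        self._counter = itertools.count()
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]

    def start(self):
        for thread in self._threads:
            thread.start()

    def submit(self, priority, func, *args):
        self._queue.put((priority, next(self._counter), func, args))

    def join(self):
        # Block until every submitted task has been processed.
        self._queue.join()

    def _run(self):
        while True:
            _priority, _seq, func, args = self._queue.get()
            try:
                func(*args)
            finally:
                self._queue.task_done()
```

With a pool like this, a backlog of sync jobs would no longer delay a share request that arrives later, because the request is dequeued ahead of any background task that has not yet started.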

Also on May 18th, we started receiving additional reports from customers outside of the United States that their public shares were not functioning. These public shares did not originate from mounts. We found that a service was not running on the systems that handle public share processing and corrected that on May 18th, 2022, at 9:58 PM PST. As part of this service fix, we identified and implemented additional monitoring.

On May 19th, 2022, at 12:54 AM PST, we received two reports from customers who were having response issues with our API and FTP connections. We investigated and found that a service added while troubleshooting the original public share/zip download issue on mounts was causing large memory spikes, which slowed responses for API calls and FTP connections. We turned off and removed that service, which restored API and FTP functionality on May 19th, 2022, at 2:54 AM PST.

Troubleshooting continued on the original public share/zip download issue on mounts. The bug in the automated deployment process that caused FTP on the servers to crash was identified and fixed on May 19th, 2022, at 11:25 AM PST. Correcting this bug stopped FTP from crashing on the servers and allowed troubleshooting to focus solely on the root cause of the public share/zip download issue.

On May 19th, 2022, at 2:01 PM PST, we were notified by a customer in Europe that their public share was working in the United States but not in Germany. Troubleshooting showed that the public share was not working anywhere outside the United States. We investigated, and that issue was corrected on May 19th, 2022, at 3:14 PM PST. That same day, we identified and deployed a monitoring enhancement to provide faster and better feedback. We identified this public sharing issue as something new, possibly related to the original issue.
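The monitoring enhancement itself is not described in this report. As a rough sketch of the kind of check that would surface a share that works in one region but not in others, the function below probes one share URL from several regions and returns the regions where it failed. The region names, the URL, and the `probe` callable (which would have to issue the request from inside each region) are assumptions made for illustration.

```python
def failing_regions(share_url, regions, probe):
    """Probe share_url from each region and return the regions where it failed.

    `probe` is a hypothetical callable taking (region, url) and returning an
    HTTP status code. Any exception it raises is counted as a failure.
    """
    failures = []
    for region in regions:
        try:
            status = probe(region, share_url)
        except Exception:
            status = None  # unreachable from this region
        if status != 200:
            failures.append(region)
    return sorted(failures)
```

An alert that fires whenever this list is non-empty would flag a region-specific outage without waiting for a customer report.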

Troubleshooting continued on the original public share/zip download issue on mounts, and the root cause was found: a bug in a content-length-header change on May 16th. Once the root cause was identified, we determined that the issue affected multiple mounts, not just the one specific mount identified in the customer support request. We escalated the priority of deploying the fix and pushed out the permanent fix on May 19th, 2022, at 7:04 PM PST.
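The exact nature of the content-length-header bug is not given here. One generic regression check for this class of bug is to compare the Content-Length a response declares against the number of bytes actually delivered. The sketch below assumes the headers are available as a plain dictionary and the body as bytes; the function name is hypothetical.

```python
def check_content_length(headers, body):
    """Return None if the declared Content-Length matches the body,
    otherwise a short description of the mismatch."""
    declared = None
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-length":
            declared = value
            break
    if declared is None:
        # No length declared (for example, chunked transfer): nothing to compare.
        return None
    try:
        expected = int(declared)
    except ValueError:
        return f"Content-Length is not an integer: {declared!r}"
    if expected != len(body):
        return f"Content-Length says {expected} bytes but body has {len(body)}"
    return None
```

Running a check like this against public share and zip download responses from a sample of mounts in deployment testing would catch a declared length that disagrees with the delivered body.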

While testing the deployment of this fix, we identified a bug in the routing on all data centers outside of the United States, which was the root cause of the second public share issue, the one not related to mounts. That bug fix was deployed on May 19th, 2022, at 7:33 PM PST. Extensive internal testing was done on public share functionality, including customer validation that their issues were resolved, before declaring this incident resolved.

During our internal Postmortem meeting we identified multiple areas for process, testing, monitoring and platform improvements. The monitoring improvements have been deployed, the platform improvements have been identified and added to our development process, and the testing and process improvements are being added to our Incident Management Program policies and procedures.

We greatly appreciate your patience and understanding as we resolved these multiple issues. If you need additional assistance or continue to experience issues, please contact our Customer Support team.

Posted Jun 07, 2022 - 17:06 PDT

Resolved
Issues affecting access to public shares not on mounted folders from IPs outside North America have been resolved. As of 1:20 a.m. EDT, a fix has been implemented for the public share access, and we will continue to monitor the results.

If you need additional support, please do not hesitate to contact our Customer Success team by email or phone. Thanks for your support while we resolved this issue.
Posted May 18, 2022 - 22:24 PDT
Update
Issues affecting access to public shares not on mounted folders from IPs outside North America have been resolved. As of 1:20 a.m. EDT, a fix has been implemented for the public share access, and we will continue to monitor the results.

If you need additional support, please do not hesitate to contact our Customer Success team by email or phone. Thanks for your support while we resolved this issue.
Posted May 18, 2022 - 22:23 PDT
Update
We are continuing to investigate some issues with accessing public shares on mounted folders. There are also some issues accessing public shares not on mounted folders when accessed from IPs outside North America. Affected users may not be able to access public shares, and downloads of multiple files at once may fail. We are actively investigating this issue and will provide additional updates as they become available. Customers with urgent questions are encouraged to contact our Customer Success team by email. Thank you for your patience.
Posted May 18, 2022 - 20:52 PDT
Update
We are continuing to investigate this issue.
Posted May 18, 2022 - 20:06 PDT
Investigating
We are currently experiencing some issues with public shares and zip downloads on mounted folders. Customers may not be able to access these shares. Downloads of multiple files that are automatically zipped may fail. We are actively investigating this issue and will provide additional updates as they become available. Customers with urgent questions are encouraged to contact our Customer Success team by email. Thank you for your patience.
Posted May 18, 2022 - 19:23 PDT
This incident affected: Web Interface and Remote Server Integrations (Sync and Mount).