Welcome to the Umbraco status page. On this page, you can see the current operational status as well as plans for scheduled maintenance and automatic upgrades for all our cloud offerings: Umbraco Cloud and Heartcore.
Subscribe to updates above to get the latest status sent straight to your inbox.
If you’re experiencing issues with your cloud project that do not seem to relate to the current operational status, please go to Our Umbraco and search for the issue or reach out to Umbraco Support in the portal chat.
On Tuesday, June 28, 2022, a scheduled upgrade for Umbraco ID, the authentication service on Umbraco Cloud, was rolled out. This, unfortunately, resulted in sporadic instances where customers were unable to access the backoffice on Umbraco Cloud.
The affected component was a central internal API that orchestrates procedures between Umbraco ID and Azure B2C. In the two-week period between the beginning of the incident and its resolution, we counted 12 outages ranging from 4 minutes to a maximum of 20 minutes.
We apologize for the inconvenience that this has caused our customers, and assure you that we have taken steps to address this specific issue and are working on initiatives to ensure that this type of issue will not happen again.
In preparation for the upcoming regional hosting options for Umbraco Cloud, an update was rolled out to Umbraco ID including updates to the underlying infrastructure and a series of DNS changes.
After the release, a series of post-deployment tests were completed successfully.
The Umbraco Cloud support team reported an increase in tickets raised, specifically concerning backoffice authentication (login).
At the same time, we saw that the execution count (requests processed) on Umbraco ID would drop to 0 and, a couple of minutes later, recover and start processing requests as normal. The Umbraco Cloud status page was updated to notify our customers that there was an issue regarding authentication.
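The pattern described above, where the execution count falls to 0 and recovers a few minutes later, can be spotted in per-minute request counts. The following is a minimal sketch only: the function name and the sample data are hypothetical and do not come from the actual telemetry.

```python
from typing import List, Tuple


def find_zero_windows(counts: List[int]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive zero-count minutes."""
    windows = []
    start = None
    for i, count in enumerate(counts):
        if count == 0 and start is None:
            start = i  # a run of zero-request minutes begins here
        elif count != 0 and start is not None:
            windows.append((start, i - start))  # the run ended at the previous minute
            start = None
    if start is not None:
        # the series ends while still at zero
        windows.append((start, len(counts) - start))
    return windows


# Hypothetical per-minute request counts, not real measurements.
sample = [250, 300, 0, 0, 0, 0, 280, 260, 0, 0, 310]
print(find_zero_windows(sample))
```

Each tuple gives the minute at which processing stopped and how many minutes passed before requests were processed again.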
Though we have a lot of telemetry on the Cloud services, we were unable to identify the cause of the issue, so we contacted our partners at Microsoft to help mitigate the problem. We investigated the following issues with support professionals at Microsoft to help identify the problems:
While the investigation was ongoing, we replaced the infrastructure on which Umbraco ID was running and immediately saw traffic being ingested normally again. Together with support professionals at Microsoft, we decided to lower the severity of the support case to reflect that there were no longer any business-critical components affected.
Late on Thursday, July 7, the issue reappeared, and this time we responded immediately by reaching out to Microsoft. The intermittent nature of the issue still made it hard to pinpoint the exact cause, but we started by investigating further:
Our underlying component that handles all communication between Umbraco ID and Azure AD is an Azure Function. We see an average of 200 to 300 requests per minute and up to 2000 requests per minute at peak times. When the resource detects that functions are executing slowly, it will auto-heal and recycle the application, which is what we see in the figure above. These recycles corresponded well with the timing of the incidents.
So why did the response time slowly deteriorate?
From here we did a memory allocation analysis and discovered that the memory cleanup (garbage collection) was running very frequently and causing slow response times. That in turn triggered the auto-heal functionality and recycled the underlying App Service Plan. Additionally, we found that the default was set to 32-bit processing, which would explain why the application was struggling. A 32-bit process can only access a limited amount of memory, and given the high volume of traffic, the garbage collector simply could not keep up within those memory restrictions.
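To put the 32-bit limitation in numbers, the theoretical address space of a process follows directly from its pointer width. The sketch below only computes that upper bound; the memory actually usable by a 32-bit process is lower and depends on the operating system and hosting plan, and the function name is chosen here for illustration.

```python
def address_space_gib(pointer_bits: int) -> float:
    """Theoretical addressable memory in GiB for a given pointer width."""
    return 2 ** pointer_bits / 2 ** 30


print(address_space_gib(32))  # 4.0 GiB is the hard ceiling for 32-bit addressing
print(address_space_gib(64) > address_space_gib(32))  # True
```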
Actions based on the root cause analysis
First and foremost, we identified that the default processing architecture on Azure Functions was a limiting factor and have changed it to 64-bit. This immediately allows for more efficient memory management. Additionally, we have applied the same change across our services to ensure we do not encounter the same challenges elsewhere.
To avoid similar issues in the future, we are implementing policies for Azure Functions to ensure that 64-bit processing is the default for any new infrastructure added to the platform.
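A policy of this kind boils down to checking each app's configured worker bitness. The sketch below is an illustration under stated assumptions, not the policy that was implemented: the inventory format and app names are hypothetical, the `use32BitWorkerProcess` key is assumed to mirror the Azure site configuration setting of that name, and an app with no recorded value is treated as 32-bit.

```python
from typing import Dict, List


def find_32bit_apps(apps: List[Dict]) -> List[str]:
    """Return the names of apps that would run as 32-bit processes."""
    # Assumption: a missing setting counts as 32-bit, so it gets flagged too.
    return [app["name"] for app in apps if app.get("use32BitWorkerProcess", True)]


# Hypothetical inventory; names and values are made up for illustration.
inventory = [
    {"name": "auth-orchestration", "use32BitWorkerProcess": False},
    {"name": "legacy-worker", "use32BitWorkerProcess": True},
    {"name": "new-function-app"},  # setting not recorded
]
print(find_32bit_apps(inventory))
```

Any name in the output would need to be switched to 64-bit before it passes the check.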
To ensure that we always have resources available, we have increased the replication and fallback resources as a contingency should our primary compute unit fail.
If you have any questions related to the above please feel free to contact your partner manager, reach out through our support channels or the Umbraco Cloud issue tracker on GitHub: https://github.com/umbraco/Umbraco.Cloud.Issues/issues