Alert! - INC141186 - Forge 2.0 Public Website Latency
Incident Report for Central 1
Postmortem

OpenText’s postmortem is below:


Incident Summary:

On 7 March 2022, Central 1’s external websites became inaccessible due to an increased load on the LiveSite application. OpenText support teams restored service approximately 75 minutes after receiving alerts of the incident.

How was the client impacted?

Customer websites were not available to visitors.

What services were impacted?

LiveSite

Why did the incident occur?

Inefficient custom code (recursive rendering logic) generated a large increase in requests, overloading the LiveSite nodes.

How was the incident resolved?

The support team restarted the system multiple times, but services went down again immediately after each restart. The team then increased the number of LiveSite nodes from 4 to 8. With these additional resources, the LiveSite application was restored.

Preventative Actions

Implement auto-scaling for LiveSite nodes. This will automatically add resources to handle increased loads. Target: April 29, 2022

Eliminate recursive logic in custom code. (Completed, waiting to be deployed to PROD.) Target: April 1, 2022
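
To illustrate the auto-scaling action above, here is a minimal sketch of a scaling rule in Python. It is not OpenText's implementation: the function name, the load percentages, the 60% target, and the minimum and maximum node counts are all assumptions chosen for illustration.

```python
def desired_nodes(current_nodes, avg_load_pct, target_load_pct=60,
                  min_nodes=4, max_nodes=16):
    """Return a node count that brings average per-node load back to the target.

    All thresholds here are hypothetical; they are not taken from the postmortem.
    Loads are integer percentages of one node's capacity.
    """
    if current_nodes <= 0:
        raise ValueError("current_nodes must be positive")
    # Total demand expressed in "percent of one node".
    total_load_pct = current_nodes * avg_load_pct
    # Ceiling division: smallest node count that keeps each node at or below target.
    needed = -(-total_load_pct // target_load_pct)
    # Never scale below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # 4 nodes each at 120% of capacity: the rule asks for 8 nodes.
    print(desired_nodes(4, 120))
```

Under these assumed numbers, four overloaded nodes scale out to eight, which matches the manual change made during the incident. A real policy would also need a cooldown period and a metric source, which this sketch leaves out.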

Notes

There was a similar incident affecting LiveSite availability on Feb. 1, 2022. After that event, we modified custom code to limit the depth of recursive rendering requests, but this limit was not adequate for the 4 LiveSite nodes in use at the time of the March 7 incident. We believe the 8 nodes now in place will support any recursive requests, because the limit is still in place and we observed at most 6 simultaneous recursive requests, not infinite recursion. This recursion will be eliminated from the code in the next deployment.
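
The effect of a depth limit on recursive rendering can be sketched as follows. This is a hypothetical model, not the actual LiveSite custom code: the function name, the `children_of` mapping, and the example page that references itself twice are assumptions made for illustration.

```python
def count_render_requests(component, children_of, max_depth, depth=0):
    """Count the requests issued to render `component`.

    Each nested component triggers its own rendering request. Recursion stops
    once `max_depth` levels have been followed. Hypothetical model only.
    """
    if depth >= max_depth:
        return 0  # depth limit reached: this request is never issued
    requests = 1  # the request for this component itself
    for child in children_of.get(component, []):
        requests += count_render_requests(child, children_of, max_depth, depth + 1)
    return requests


if __name__ == "__main__":
    # Hypothetical page whose template references itself twice.
    children_of = {"page": ["page", "page"]}
    for limit in (1, 3, 6):
        print(limit, count_render_requests("page", children_of, limit))
```

In this model a depth limit guarantees that the recursion terminates, but a single visitor request can still fan out into many rendering requests. That is one way a limit can prevent infinite recursion and still leave a fixed pool of nodes overloaded.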

~OpenText

Posted Mar 24, 2022 - 14:14 PDT

Resolved
Services have remained stable throughout the day.

OpenText is working to complete their root cause analysis and we will update this notice with their postmortem when available.
Posted Mar 07, 2022 - 13:43 PST
Monitoring
Public website latency has improved, and access to the content management system was restored at approximately 10:10 a.m. PT (1:10 p.m. ET). Central 1 is continuing to monitor services and will work with OpenText on the root cause analysis.

We will provide our next update by 1:30 p.m. PT (4:30 p.m. ET).
Posted Mar 07, 2022 - 10:35 PST
Investigating
Central 1 is investigating latency with eStudio | OpenText that is affecting the loading times of public websites and the content management system.

The latency began at 9:05 a.m. PT (12:05 p.m. ET). eStudio has been engaged and is working with Central 1. We will provide an update by 10:30 a.m. PT (1:30 p.m. ET).

Central 1 - DigitalBanking_Support@Central1.com - 1.888.889.7878, option 2
Posted Mar 07, 2022 - 09:44 PST
This incident affected: Incident Alerting.