Flexera One - NA - Customers may be unable to access modules within Flexera One
Incident Report for Flexera System Status Dashboard
Postmortem

Description: Flexera One – NA – ITAM and other Modules Inaccessible

Timeframe: September 2nd, 2:18 PM to September 2nd, 4:00 PM PDT

Incident Summary:

On September 2nd, at 2:18 PM PDT, during scheduled maintenance to replace an old server used for service discovery, technical staff observed several 503 errors in the internal logs. Health checks also indicated that multiple modules within Flexera One may have experienced a service disruption.

As a result, some customers may have observed an error while accessing ITAM UI views within Flexera One. During the incident, technical staff also received multiple alerts for Cloud Cost Optimization and Automation services. Staff were able to reproduce the issue using the demo account and an internal test account for ITAM; however, no customers reported the issue during the outage.

At 2:22 PM PDT, technical staff attempted to unseal the server vault in order to access it and resume the stopped operations, but observed several errors indicating that the whole server cluster was in an unhealthy state.

After further investigation, technical staff found that a step had been missed during the server replacement. This caused problems with the server cluster electing a new leader and brought the whole cluster down. At 3:33 PM PDT, technical staff attempted to redeploy and restart the impacted server but encountered errors for some dependent services. At 3:48 PM PDT, the dependent services were redeployed as well.

The internal load balancer logs then indicated successful connections, and the 503 errors were no longer observed. After further health checks and monitoring, the incident was declared resolved at 4:00 PM PDT.

Root Cause:

A step was missed during the server replacement. This caused problems with the server cluster electing a new leader and brought the whole cluster down.
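
The report does not name the service discovery software or its consensus protocol. Purely as an illustration, and assuming a majority-quorum (Raft-style) cluster, the sketch below shows why taking a server offline at the wrong point in a replacement can leave too few voters to elect a leader. The function names and the three-server cluster size are hypothetical and do not come from the incident.

```python
# Illustrative only: assumes a majority-quorum (Raft-style) cluster.
# The actual software and cluster size are not stated in the report.


def quorum_size(voters: int) -> int:
    """Votes needed to elect a leader when a strict majority is required."""
    if voters < 1:
        raise ValueError("cluster needs at least one voter")
    return voters // 2 + 1


def can_elect_leader(voters: int, healthy: int) -> bool:
    """True when enough healthy voters remain to form a majority."""
    return healthy >= quorum_size(voters)


def safe_to_remove(voters: int, healthy: int) -> bool:
    """Pre-check for a replacement step.

    Asks whether the cluster would still have quorum if one healthy
    voter were taken offline while still counted as a registered voter.
    """
    return can_elect_leader(voters, healthy - 1)


if __name__ == "__main__":
    # Hypothetical 3-server cluster with all servers healthy.
    print(safe_to_remove(voters=3, healthy=3))
    # Same cluster with one server already down.
    print(safe_to_remove(voters=3, healthy=2))
```

A check of this kind could run before the removal step in a replacement runbook, so that the procedure stops when taking another server offline would cost the cluster its majority.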

Corrective Actions:

• Technical staff redeployed the impacted services, following the correct procedure
• Technical staff updated the runbooks to simplify and correct the server replacement and recovery instructions

Posted Oct 14, 2022 - 09:15 PDT

Resolved
This incident has been resolved.
Posted Sep 02, 2022 - 16:00 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 02, 2022 - 15:50 PDT
Investigating
Incident Description:
Customers may be unable to access modules within Flexera One and may receive an authentication error message.

Priority: 1

Restoration activity:
Technical teams have been engaged and are currently investigating.
Posted Sep 02, 2022 - 15:18 PDT
This incident affected: Flexera One - IT Asset Management - North America (IT Asset Management - US Login Page).