Flexera One - ITAM - NA - Batch Processing has been Delayed
Incident Report for Flexera System Status Dashboard
Postmortem

Description: Flexera One - ITAM - NA - Batch Processing was Delayed

Timeframe: March 10th, 2:00 AM to March 10th, 9:00 AM CST

Incident Summary:

On March 10th at 2:00 AM CST, the batch scheduler system in the NA region experienced an outage. Batch jobs stopped processing, and a backlog built up in the environment. As a result, some customers may have seen outdated inventory data.

Technical staff investigated the issue but found no specific cause in the logs. During the investigation, the system attempted to auto-scale, but the newly added instance failed its health check because of a timeout.

To address this, our staff disabled the health checker and extended the timeout grace period. This allowed a healthy instance to be deployed, and batch job processing returned to normal. Following the completion of additional health checks and thorough monitoring, our staff concluded that the incident had been resolved.
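
As a rough illustration only, and not the actual Flexera configuration, the sketch below shows how a longer grace period can keep a slow-starting instance from being marked unhealthy. The `HealthCheckPolicy` and `evaluate_probe` names and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckPolicy:
    """Hypothetical policy; field names and values are assumptions."""
    grace_period_s: float  # time after launch during which failures are tolerated
    timeout_s: float       # longest a single probe may take before it counts as failed


def evaluate_probe(policy: HealthCheckPolicy,
                   seconds_since_launch: float,
                   probe_duration_s: float,
                   probe_ok: bool) -> str:
    """Classify one probe result as 'healthy', 'pending', or 'unhealthy'."""
    passed = probe_ok and probe_duration_s <= policy.timeout_s
    if passed:
        return "healthy"
    # A failed or timed-out probe inside the grace period does not
    # disqualify the instance; it is simply checked again later.
    if seconds_since_launch < policy.grace_period_s:
        return "pending"
    return "unhealthy"


# Illustrative values only: a short and an extended grace period.
short_grace = HealthCheckPolicy(grace_period_s=60, timeout_s=5)
extended_grace = HealthCheckPolicy(grace_period_s=300, timeout_s=5)

# The same slow probe (8 s, taken 90 s after launch) under both policies.
print(evaluate_probe(short_grace, 90, 8, True))
print(evaluate_probe(extended_grace, 90, 8, True))
```

Under these assumed numbers, the slow probe marks the instance unhealthy with the short grace period but leaves it pending with the extended one, which gives the instance time to finish starting before it is judged.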

Root Cause:

Staff could not identify a specific cause in the log files, but they believe the problem was a rare system issue. Additional precautions have been implemented to prevent such incidents in the future.

Corrective Actions:

  1. To resolve the issue, our staff disabled the health checker and extended the timeout grace period.
  2. This allowed a healthy instance to be deployed, and batch job processing returned to normal.
  3. Additional precautions have been implemented to prevent such incidents in the future.
  4. We have monitored the environment since the changes were implemented, and no recurrence of the issue has been detected. The environment remains stable.
  5. Staff will work on integrating improved logging so that more detailed records are captured if the issue recurs.
Posted Mar 30, 2023 - 10:02 PDT

Resolved
This incident has been resolved.
Posted Mar 10, 2023 - 12:15 PST
Monitoring
Our technical staff have identified and resolved the issue, and after conducting health checks, we can confirm that all services are now stable. Our technical team will continue to closely monitor the environment over the next few hours to confirm that all services are functioning without further issues.
Posted Mar 10, 2023 - 09:30 PST
Investigating
Incident Description: We are currently experiencing issues with the batch scheduler system, which may affect customers using ITAM in North America. As a result, batch jobs are not being processed, causing a backlog in the environment. This could lead to outdated inventory data for affected customers.

Priority: P2

Restoration Activity: Technical teams have been engaged and are investigating.
Posted Mar 10, 2023 - 08:19 PST
This incident affected: Flexera One - IT Asset Management - North America (IT Asset Management - US Batch Processing System).