Description: Flexera One - ITAM - NA - Batch Processing was Delayed
Timeframe: March 10th, 2:00 AM to March 10th, 9:00 AM CST
Incident Summary:
On March 10th at 2:00 AM CST, the batch scheduler system in the NA region experienced an outage. This created a backlog in the environment and delayed batch job processing. As a result, some customers may have seen outdated inventory data.
Technical staff investigated the root cause of the issue but could not find a specific reason in the logs. During the investigation, the system attempted to auto-scale, but the newly added instance failed its health check because the check timed out.
To address this, our staff disabled the health checker and extended the timeout grace period. This allowed a healthy instance to be deployed, and batch job processing returned to normal. After completing additional health checks and monitoring the environment, our staff confirmed that the incident was resolved.
Root Cause:
Technical staff could not identify a specific cause in the log files, but they believe the problem was a rare, intermittent system issue. Additional precautions have been implemented to help prevent similar incidents in the future.
Corrective Actions: