IT Visibility - NA - Data Processing has been Delayed
Incident Report for Flexera System Status Dashboard
Postmortem

Description: IT Visibility - NA - Data Processing was Delayed

Timeframe: September 2nd, 4:58 PM to September 5th, 4:05 PM PDT

Incident Summary:

On Monday, September 5th, at 7:11 AM PDT, technical staff identified that the consumer database cluster was experiencing high latency while processing data from the data streaming service. As a result, customers may have experienced delays retrieving inventory data due to a processing backlog. Data may have been up to 3 days out of date. This issue impacted US customers only.

Technical staff found that the database cluster was experiencing connectivity issues and that the authentication service inside the database was in an unhealthy state due to memory issues. Further investigation revealed that a large volume of data was waiting to be processed in the environment, which required manual intervention from technical staff to resolve the temporary memory issues.

After further analysis, at 9:04 AM PDT, technical staff doubled the database storage IOPS (input/output operations per second) to enable faster data processing. Technical staff also reconfigured the resource distribution in the environment to provide further relief.

Health checks and monitoring showed that data latency had dropped significantly and was within the normal threshold range. At 9:30 AM PDT, the impacted service returned to a stable state, after which connectivity was restored. Technical staff continued to monitor the environment while the backlog cleared.

At 4:05 PM PDT, backlog processing was completed. After additional monitoring, technical staff declared the incident resolved.

Root Cause:

Analysis showed that a large volume of data was received in the environment, which caused temporary memory issues. Manual intervention by technical staff was required to fine-tune the resources and allocate more memory to process the incoming requests.

Corrective Action:

  1. IOPS (input/output operations per second) were temporarily increased to enable faster data processing
  2. Technical staff also reconfigured the resource distribution in the environment to provide further relief
  3. The alerting system has been updated to directly page a dedicated team, ensuring a faster response from technical staff going forward
Posted Sep 26, 2022 - 07:48 PDT

Resolved
We are processing data in real time now. This incident has been resolved.
Posted Sep 05, 2022 - 18:25 PDT
Monitoring
Technical teams have deployed optimizations in the environment to enable faster data processing. The impacted service has also returned to its stable state. We are monitoring the environment for the backlog to clear.
Posted Sep 05, 2022 - 10:14 PDT
Identified
Technical teams have identified that one of the services was experiencing performance degradation due to a memory issue. We are currently working on making optimizations in the environment to bring the service back to its healthy state.
Posted Sep 05, 2022 - 09:11 PDT
Investigating
Incident Description:
Due to a processing backlog, customers may experience delays before inventory data is visible. Data may be up to 3 days out of date. No impact on EU customers.

Priority: 2

Restoration activity:
Technical teams have been engaged and are currently investigating.
Posted Sep 05, 2022 - 08:54 PDT
This incident affected: Flexera One - IT Visibility - North America (IT Visibility US).