IT Visibility - NA - Data Processing is currently delayed
Incident Report for Flexera System Status Dashboard
Postmortem

Description: IT Visibility - NA - Data Processing was Delayed

Timeframe: May 26th, 1:51 AM to June 9th, 5:56 AM PDT

Incident Summary:

There have recently been multiple occurrences of the IT Visibility Dashboard not refreshing periodically, impacting customers' ability to retrieve the most recent data. On Thursday, May 26th, at 1:51 AM PDT, technical staff identified a recurrence of the IT Visibility data currency issue. This specific incident impacted US customers, and data may have been up to 20 hours behind.

To alleviate the issue, technical staff deployed multiple optimizations in production. On May 27th, at 2:49 AM PDT, after successful testing in the lower environment, database services were re-allocated, and additional instances were added to enable faster data retrieval.

On May 31st, after additional monitoring over the weekend, technical staff observed that backlog processing was still behind. As a short-term solution, staff disabled non-critical workloads on the database and engaged SMEs from other areas to assist with a long-term solution.

Meanwhile, technical staff continued to test multiple optimizations in the lower environments. After internal discussions and further investigation, technical staff manually removed obsolete data to free up the space it occupied in the database.

After overnight monitoring, on June 1st, technical staff observed a significant improvement in data processing speed and then automated the clean-up of redundant data in the database. In addition, technical staff deployed additional instances to the remaining services in the database.

Over the next few days, staff continued to analyze, isolate, and eliminate any contributing factors causing the performance degradation and observed significant improvement in the data processing speed.

On June 9th, at 5:56 AM PDT, following additional health checks and monitoring, this incident was declared resolved.

Root Cause:

The flow of data was impaired due to insufficient allocation of memory resources in the environment.

Contributing Cause:

There was a large amount of obsolete data in the database, which required manual intervention to run a cleanup and free up space for incoming requests.

Corrective Action:

• Database services were re-allocated, and additional instances were added to enable faster data retrieval
• Redundant data was removed from the database and additional optimizations were deployed to enhance data processing
• As a long-term fix, technical staff will continue to work on a roadmap for Q3 to transition to a more viable solution

Posted Sep 27, 2022 - 13:26 PDT

Resolved
This incident has been resolved and has transitioned to a problem management investigation to research and implement a long-term fix to avoid future recurrences. We have been making progress, and the processing speed has increased significantly since the beginning of this incident. Our technical teams will continue to work on a more viable solution, and in the meantime, we will continue our efforts to make additional optimizations in the environment.
Posted Jun 07, 2022 - 08:09 PDT
Update
We have deployed additional optimizations to production. The Dashboard updates are currently in progress, and we are continuing to monitor the environment.
Posted Jun 06, 2022 - 10:46 PDT
Update
The investigation is ongoing. We are currently testing a new set of optimizations to be deployed in production later today.
Posted Jun 06, 2022 - 05:32 PDT
Update
We are still observing a slow response in the application. Our staff deployed a new set of optimizations in production, but that did not yield the desired results. Discussions are ongoing, and we continue to mitigate problems while working on a long-term solution.
Posted Jun 02, 2022 - 10:44 PDT
Update
We have deployed additional optimizations to production, following which new queries have been running normally, and no new anomalies have been observed so far. The investigation is still ongoing. We are continuing to isolate problems and work on a long-term solution.
Posted Jun 01, 2022 - 11:31 PDT
Update
Performance has significantly improved following some of the remediation actions. Technical teams are continuing to make optimizations in the environment and investigate a long-term fix.
Posted Jun 01, 2022 - 08:00 PDT
Update
During the investigation, technical teams discovered that some of the queries that export data from IT Visibility to ServiceNow are in a hung state. As a result, some US customers with the IT Visibility ServiceNow integration may experience issues while completing their export activities.
Posted Jun 01, 2022 - 07:04 PDT
Update
We have deployed additional changes to provide relief in the environment. Technical teams are continuing to work on a fix and long-term solution for this issue.
Posted May 31, 2022 - 13:44 PDT
Update
The investigations are still ongoing. Technical teams are working on isolating any performance issues in the environment and making enhancements.
Posted May 31, 2022 - 06:56 PDT
Update
Technical teams continue to eliminate any contributing factors to mitigate the problem and simultaneously investigate a long-term solution.
Posted May 30, 2022 - 10:58 PDT
Update
Performance has improved since the changes were deployed on Friday; however, we are still observing delays in processing. Technical teams have identified some of the contributing factors and continue to work towards a resolution.
Posted May 30, 2022 - 07:54 PDT
Update
The backlog processing has been ongoing and we are continuing to make progress. Technical staff will be monitoring the environment overnight, and the next update will be provided tomorrow morning.
Posted May 29, 2022 - 13:31 PDT
Update
Production deployment is still ongoing, and we are continuing to make progress.
Posted May 27, 2022 - 13:05 PDT
Update
The configuration change in Production is progressing as planned. We have processed about 1/3 of the backlog and validated that IT Visibility Dashboards are being updated.
Posted May 27, 2022 - 08:56 PDT
Update
We have successfully tested the configuration change and are on track to deploy it in production.
Posted May 27, 2022 - 04:51 PDT
Identified
We are continuing to test a fix to re-organize the configuration to enable faster data processing, following which it will be deployed in the production environment.
Posted May 26, 2022 - 19:01 PDT
Update
We are currently testing a fix to re-organize the configuration to enable faster data processing, following which it will be deployed in the production environment.
Posted May 26, 2022 - 07:00 PDT
Investigating
Incident Description:
IT Visibility Data Processing in North America is currently delayed by up to 20 hours, resulting in some data being out of date.

As a result of this incident, some US customers with the IT Visibility ServiceNow integration may also experience issues while completing their exports.

Priority: 2

Restoration activity:
Technical teams have been engaged and are currently investigating.
Posted May 26, 2022 - 02:05 PDT
This incident affected: Flexera One - IT Visibility - North America (IT Visibility US).