IT Visibility - NA - Data Processing has been Delayed
Incident Report for Flexera System Status Dashboard
Postmortem

Description: IT Visibility - NA - Data Processing was Delayed

Timeframe: January 22nd, 6:23 PM to January 23rd, 4:40 AM PST

Incident Summary

On Sunday, January 22nd, at 6:23 PM PST, the IT Visibility data streaming service experienced an interruption that caused a processing backlog in the North America environment. As a result, some customers may have experienced delays in accessing the most recent inventory data from IT Visibility.

During routine health checks at 1:30 AM PST, technical staff found that one of the pods for the data streaming service was in an unhealthy state, failing and restarting repeatedly. After further investigation, staff identified unknown data sent by one of the orgs as the cause of resource contention within the pod. This in turn delayed data processing for the other orgs using the same pod.
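For illustration only, a health check that catches a repeatedly restarting pod can be sketched as a comparison of restart counts between two samples. This is a minimal sketch, not the check used in this incident: the `PodSample` type, the pod names, and the threshold of 3 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodSample:
    """One observation of a pod (hypothetical shape, not a real API)."""
    pod: str
    restart_count: int  # cumulative restarts reported for the pod


def find_restarting_pods(previous, current, threshold=3):
    """Return the names of pods whose restart count grew by at least
    `threshold` between two samples, sorted alphabetically."""
    before = {s.pod: s.restart_count for s in previous}
    flagged = []
    for sample in current:
        # A pod missing from the earlier sample is treated as starting at 0.
        delta = sample.restart_count - before.get(sample.pod, 0)
        if delta >= threshold:
            flagged.append(sample.pod)
    return sorted(flagged)


# Example with made-up pod names and counts.
earlier = [PodSample("stream-0", 1), PodSample("stream-1", 0)]
later = [PodSample("stream-0", 9), PodSample("stream-1", 0)]
print(find_restarting_pods(earlier, later))  # ['stream-0']
```

Comparing the change between samples, rather than the absolute count, avoids flagging a pod that restarted several times long ago and has been stable since.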

At 2:40 AM PST, technical staff rebalanced the incoming traffic across the other pods in the cluster to reduce the load on the impacted pod. Then, at 4:39 AM PST, technical staff temporarily moved the unknown data in question to an alternate pod, after which data processing resumed as usual.
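The two mitigation steps, spreading traffic over the remaining pods and pinning one org's data to an alternate pod, can be sketched as a routing function. This is a hypothetical sketch under stated assumptions: the report does not describe how orgs are actually mapped to pods, so the hash-based assignment, the `overrides` map, and all pod and org names below are invented for illustration.

```python
import hashlib


def assign_pod(org_id, pods, overrides=None):
    """Pick a pod for an org (hypothetical routing scheme).

    An explicit override wins, which models moving one org's data to an
    alternate pod. Otherwise the org is hashed onto the list of pods
    that currently accept traffic.
    """
    overrides = overrides or {}
    if org_id in overrides:
        return overrides[org_id]
    digest = hashlib.sha256(org_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pods)
    return pods[index]


# Rebalancing: the impacted pod is left out of the active list, so no
# org is hashed onto it while it recovers.
active_pods = ["pod-b", "pod-c"]

# Isolation: the org sending the problematic data goes to its own pod.
overrides = {"org-problem": "pod-alternate"}

print(assign_pod("org-problem", active_pods, overrides))  # pod-alternate
```

Hashing keeps the assignment deterministic for a given list of pods, so an org's data lands on the same pod every time. The trade-off is that a simple modulo scheme reshuffles many orgs whenever the pod list changes.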

At 4:40 AM PST, backlog processing was completed. After additional monitoring, the incident was declared resolved.
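As a quick check of the timeline, the elapsed times can be worked out from the timestamps above. This sketch assumes the year 2023, taken from the posting dates of this report, and treats all times as PST.

```python
from datetime import datetime

# Timestamps from the report. All are PST, so naive datetimes suffice.
START = datetime(2023, 1, 22, 18, 23)            # interruption began
DETECTED = datetime(2023, 1, 23, 1, 30)          # routine health check
BACKLOG_CLEARED = datetime(2023, 1, 23, 4, 40)   # backlog processed


def hours_minutes(start, end):
    """Return the elapsed time between two datetimes as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


print(hours_minutes(START, DETECTED))         # (7, 7)
print(hours_minutes(START, BACKLOG_CLEARED))  # (10, 17)
```

The interruption therefore ran for about 7 hours before it was detected and 10 hours 17 minutes in total, which is roughly in line with the "up to 7 hours out of date" figure given in the initial status update.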

Root Cause

Based on the preliminary investigation, unknown data sent by one of the orgs caused resource contention within one of the data streaming service pods. This in turn delayed data processing for the other orgs using the same pod.

Corrective Action

  1. To reduce the load on the impacted pod, we rebalanced the incoming traffic across the other pods in the cluster.
  2. In addition, we temporarily moved the data in question to an alternate pod so that it would no longer stream into and crash the impacted pod.
  3. Technical staff have been tasked with investigating the root cause of the instability triggered by the data streamed from the specific org.
Posted Jan 27, 2023 - 11:05 PST

Resolved
This incident has been resolved.
Posted Jan 23, 2023 - 07:08 PST
Investigating
Incident Description: Customers may experience delays before inventory data is visible due to a processing backlog. As of 2:00 AM PST, data may have been up to 7 hours out of date. No impact on EU customers.

Priority: 2

Restoration activity:

Technical teams have been engaged and are investigating.
Posted Jan 23, 2023 - 01:58 PST
This incident affected: Flexera One - IT Visibility - North America (IT Visibility US).