Description: IT Visibility - NA - Data Processing Delayed
Timeframe: July 8th, 3:00 AM to July 10th, 5:25 PM PDT
Incident Summary:
On Friday, July 8th, at 3:00 AM PDT, technical staff found that data in IT Visibility dashboards and exports was stale and approximately 5 hours out of date. Because of a processing backlog, US customers may have experienced delays in accessing up-to-date inventory data. EMEA customers were not impacted.
Technical staff found that the consumer storage cluster was unable to retrieve data from the streaming service, and subject matter experts (SMEs) from other areas were engaged to assist with the investigation. Further investigation revealed that one of the database servers had gone down at around 3:00 AM PDT, halting the data stream.
Technical staff rebooted the impacted server at 8:45 AM PDT, after which the downstream services came back online. Technical staff also verified that the streaming service was back online, and the streaming backlog was cleared at 9:20 AM PDT.
At 9:32 AM PDT, technical staff found that one of the services was experiencing memory contention, which had a downstream impact on other services in the environment. To remediate the issue, infrastructure services were scaled up and additional resources were allocated to speed up data processing.
On July 10th, at 5:25 PM PDT, the remaining backlog was cleared. Monitoring confirmed that data was again being processed in real time, and the incident was declared resolved.
Root Cause:
Investigation found that one of the database servers went down, which halted the data streaming service.
Contributing Cause:
One of the services experienced memory contention, which had a downstream impact on other services in the environment.
Corrective Actions:
• Technical staff rebooted the impacted database server, after which data streaming restarted successfully
• Infrastructure services were scaled up and additional resources were allocated to speed up data processing