Description: IT Visibility - NA - Data Processing was Delayed
Timeframe: September 2nd, 4:58 PM to September 5th, 4:05 PM PDT
Incident Summary:
On Monday, September 5th, at 7:11 AM PDT, technical staff identified high latency in the consumer database cluster as it processed data from the data streaming service. The resulting processing backlog may have delayed customers' retrieval of inventory data, and data may have been up to 3 days out of date. Only US customers were impacted.
Technical staff found that the database cluster was experiencing connectivity issues and that the authentication service within the database was unhealthy due to memory issues. Further investigation revealed a large volume of data awaiting processing in the environment. Technical staff had to intervene manually to resolve the resulting temporary memory issues.
After further analysis, at 9:04 AM PDT, technical staff doubled the database storage IOPS (input/output operations per second) to speed up data processing. Technical staff also redistributed resources in the environment to provide further relief.
Health checks and monitoring showed that data latency decreased significantly and returned to within normal thresholds. At 9:30 AM PDT, the impacted service returned to a stable state, after which connectivity was restored. Technical staff continued to monitor the environment while the backlog cleared.
At 4:05 PM PDT, backlog processing completed. After additional monitoring, technical staff declared the incident resolved.
Root Cause:
Analysis showed that the environment received a large volume of data, which caused temporary memory issues. Technical staff had to intervene manually to fine-tune resources and allocate more memory to process the incoming requests.
Corrective Action: