Description: Flexera One – Cloud Cost Optimization – NA, EU & APAC – Azure Data Import Failure
Timeframe: May 2nd, 12:00 AM to May 9th, 12:00 PM PDT
Incident Summary
We encountered a service disruption on May 2nd, which impacted the availability of Azure bills and policies. Consequently, customers may have faced difficulties when accessing new Azure Policy imports through the Cloud Cost Optimization Platform, leading to the unavailability of precise billing data for our customers.
Upon conducting a thorough examination, we reached out to our service provider and discovered that they had implemented changes to their API configuration on May 2nd. These alterations directly impacted the connectivity of Azure policies, resulting in the emergence of subsequent issues. The issue remained undetected initially because of insufficient monitoring measures in place. However, on May 8th, we began receiving multiple reports from the impacted customers, which alerted us to the situation.
To address the issue and minimize its impact on our external customers, we implemented a workaround on May 8th at 9 PM PDT. This involved refining our data retrieval process by adopting a segmented approach, making separate API calls in smaller segments instead of retrieving all the data at once. This adjustment significantly enhanced the efficiency of data handling and processing, leading to improved overall performance and reduced strain on the system.
After implementing the workaround, data processing operations resumed their normal functioning. However, due to the substantial volume of data, the system experienced a backlog, resulting in a delay. It took several hours to fully process and catch up on the data, with the successful backlog completion occurring on May 9th at 12:00 PM PDT.
On May 14th, our service provider successfully deployed a hotfix to mitigate the issues stemming from the API configuration. This intervention enabled the retrieval of a larger segment of data from the API without encountering any further complications.
Root Cause
Upon investigation, it was determined that the root cause of the service disruption was attributed to changes made by our service provider to their API configuration on May 2nd. These changes had unintended consequences, impacting the functionality and proper integration of Azure bills and policies.
Remediation Steps
Future Preventative Measures