Line item editing fails intermittently
Incident Report for Xandr
Postmortem

Incident Summary

From approximately 19:00 UTC on Wednesday, Dec 07 to 16:05 UTC on Friday, Dec 09, 2022, the volume of requests made to the backend API caused timeouts, which surfaced as "Failed to fetch LineItemReferencedItems" errors in UI clients. The increase in requests followed the GA release of Budget Rollover (12/7).

Incident Impact

Nature of Impact: Data transport issues impacting API/UI loads
Timeframe: ~45.08 hours, from 19:00 UTC on Wednesday, Dec 07 to 16:05 UTC on Friday, Dec 09, 2022
Scope: Global
Magnitude: All customers were impacted

Timeline (UTC)
2022-12-07 19:00: Budget Rollover GA enabled
2022-12-09 10:00: Incident Started
2022-12-09 11:20: IM Ticket Created
2022-12-09 13:00: Escalated to Engineer
2022-12-09 16:05: Incident Resolved

Cause Analysis

The GA release of Budget Rollover (12/7) increased the number of requests made to the backend API. The application responsible for the backend API could not keep up with the added load after the GA activation, and requests began to time out. These timeouts caused the "Failed to fetch LineItemReferencedItems" errors in UI clients and the failures when saving on the Line Item edit page.

Resolution Steps

  • Removed Budget Rollover from the GA release, which reduced the number of requests made to the backend API.
  • Increased the resources for the application responsible for the backend API in the AMS3 and NYM2 data centres.

Follow-Up Items

  • Roll out GA releases such as Budget Rollover more incrementally.
  • Before enabling a GA release, estimate the resource and load increase from open beta to GA and confirm that the application can handle it.
  • Add capacity alerts on the API.
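
As an illustration of the incremental rollout item above, the following is a minimal sketch of percentage-based feature gating. It is not the mechanism used in production; the function names, the `member_id` argument, and the feature string are hypothetical and chosen only for this example. Each member is hashed into a stable bucket from 0 to 99, so raising the percentage only ever adds members and never removes ones already enabled.

```python
import hashlib


def rollout_bucket(member_id: str, feature: str) -> int:
    """Map a member to a stable bucket in the range [0, 100).

    Hashing the feature name together with the member ID (both
    hypothetical identifiers) gives each feature its own ordering.
    """
    digest = hashlib.sha256(f"{feature}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(member_id: str, feature: str, percent: int) -> bool:
    """Return True if the member falls inside the current rollout percentage."""
    return rollout_bucket(member_id, feature) < percent
```

With a gate like this, a release could be widened in steps (for example 5%, 25%, 50%, 100%) while backend API load is observed at each step, and set back to 0% to turn the feature off for everyone.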
Posted Jan 22, 2023 - 18:25 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Dec 09, 2022 - 17:23 UTC
Monitoring

We have patched the issue and are monitoring our systems closely.

  • Component(s): Buy-side pages
  • Impact(s):
    • Page load failures and errors in user interface
    • Unable to save/edit objects
  • Not Impacted:
    • Ad Serving
    • Bidding
    • API
  • Geolocation(s): Global

We will provide an update as soon as the issue has been fully resolved.

Posted Dec 09, 2022 - 12:19 UTC