Budget Pacing Controller Downtime
Incident Report for Xandr
Postmortem

Incident Summary

From approximately 08:00 UTC on Thursday, January 28, 2021 to 01:48 UTC on Friday, January 29, 2021, a connection loss in the NYM2 datacenter caused network saturation, resulting in delays to the real-time data that the budget streaming service relies on.

Scope of Impact

As a result of this incident, some customers may have experienced overspend or underspend during the incident window.

Timeline (UTC)

2021-01-28 08:00: Incident Started: Automated alert system notified engineers of a hard reboot of the servers
2021-01-28 09:30: Automated alert system notified engineers of data delays affecting the budget streaming service
2021-01-28 10:37: Fallback budget system enabled by engineers
2021-01-28 10:56: Internal incident created and escalated
2021-01-29 01:45: Fallback budget system disabled by engineers
2021-01-29 01:48: Incident Resolved: Servers back up and running

Cause Analysis

The root cause was a hard reboot of our servers in NYM2.

Resolution Steps

Our engineers mitigated the effects by enabling the fallback budget system and adding data processing hardware.

Next Steps

  • Enable more precise automated alerts to detect budget fallback.
  • Assess impact of network traffic levels on data pipeline capacity.
Posted Feb 18, 2021 - 14:24 UTC

Resolved
This incident has been resolved.
Posted Jan 29, 2021 - 02:05 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Jan 29, 2021 - 00:53 UTC
Investigating

We are currently investigating the following issue:

  • Component(s): Bidding
  • Impact(s):
    • Some objects may spend under budgets
    • Some objects may spend over budgets
  • Severity: Major Outage
  • Datacenter(s): Global

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Jan 28, 2021 - 23:12 UTC