Ad serving capacity reduced in NYM2
Incident Report for Xandr
Postmortem

Incident Summary
From approximately 16:15 to 17:27 UTC on Tuesday, April 27, 2021, Xandr experienced a reduction in ad serving capacity and an increase in bidder timeouts in the NYM2 datacenter.

Scope of Impact
During the incident, ad serving out of the NYM2 datacenter ran at reduced capacity and bidder timeouts were between 40% and 50%, which resulted in reduced ad serving for some clients.

Timeline (UTC)
2021-04-27 16:15: Incident Started: Bidders started to crash in NYM2
2021-04-27 16:35: Ad serving capacity reduced to 17% in NYM2
2021-04-27 16:52: Incident ticket created
2021-04-27 17:27: Incident Resolved: NYM2 was back to 100% capacity

Cause Analysis
The root cause of the incident was a code defect triggered by a single user id. Whenever that user id appeared in a bid request, the bidder handling the request crashed.

Resolution Steps
The offending user id was banned, after which the bidder instances stabilized and ad serving capacity returned to 100%.

Next Step(s)
• Implement more resilience in code to prevent one user id from causing a widespread crash.
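
As an illustration only, the kind of resilience described above could be sketched as follows. This is not Xandr's bidder code: the `ResilientBidder` class, the `compute_bid` callback, the `user_id` request field, and the `max_failures` threshold are all hypothetical names chosen for the example. It also assumes the failure surfaces as a catchable exception; a crash at the process level would need isolation outside the request handler.

```python
from collections import Counter
from typing import Callable, Optional


class ResilientBidder:
    """Hypothetical wrapper that keeps one bad user id from taking down a bidder."""

    def __init__(self, compute_bid: Callable[[dict], float], max_failures: int = 3):
        self.compute_bid = compute_bid    # the (possibly fragile) bidding logic
        self.max_failures = max_failures  # failures allowed before quarantine
        self.failures = Counter()         # failure count per user id
        self.blocked = set()              # quarantined user ids

    def handle(self, request: dict) -> Optional[float]:
        user_id = request.get("user_id")
        if user_id in self.blocked:
            return None  # no-bid for quarantined user ids
        try:
            return self.compute_bid(request)
        except Exception:
            # Contain the failure to this request and answer with a no-bid.
            self.failures[user_id] += 1
            if self.failures[user_id] >= self.max_failures:
                self.blocked.add(user_id)
            return None


def fragile_bid(request: dict) -> float:
    """Stand-in for bidding logic that fails on one specific user id."""
    if request["user_id"] == "bad-user":
        raise ValueError("unexpected data for this user id")
    return 1.0


bidder = ResilientBidder(fragile_bid, max_failures=2)
user_ids = ["ok", "bad-user", "bad-user", "bad-user", "ok"]
results = [bidder.handle({"user_id": uid}) for uid in user_ids]
# results == [1.0, None, None, None, 1.0]; "bad-user" is quarantined after two failures
```

In this sketch the failing user id produces a no-bid instead of a crash, and repeated failures quarantine it automatically, which mirrors the manual ban used to resolve the incident.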

Posted Apr 28, 2021 - 16:54 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Apr 27, 2021 - 17:29 UTC

Investigating

We are currently investigating the following issue:

  • Component(s): Ad Serving
  • Impact(s):
    • Ad serving capacity severely reduced for NYM2
  • Severity: Major Outage
  • Datacenter(s): NYM2

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Apr 27, 2021 - 17:05 UTC