Service Degradation in Live API
Incident Report for Yext
Postmortem

Summary

On 9/24, starting at 9:10 a.m. ET, Yext engineers began tracking elevated latencies and error rates in Live API. These issues caused slower response times and some errors in Answers and Pages. The root cause was identified by 3:04 p.m. ET, and latencies returned to normal by 5:40 p.m. ET.

The following day, on 9/25, beginning at 12:10 p.m. ET, Yext engineers observed similar symptoms and immediately applied mitigations. Service was restored by 12:26 p.m. ET.

Root Cause

Changes intended to improve the performance of the service introduced a bug that increased latencies and error rates when combined with internal operations.

Remediation

We have reverted the change and will conduct additional testing to ensure that internal operations do not interfere with such changes.

Posted Oct 05, 2020 - 10:39 EDT

Resolved
This incident has been resolved.
Posted Sep 25, 2020 - 00:45 EDT
Update
We have implemented mitigations and error rates have returned to normal. We are continuing to mitigate query slowness.
Posted Sep 24, 2020 - 20:37 EDT
Identified
We have identified the root cause and are working on remediation.
Posted Sep 24, 2020 - 15:03 EDT
Investigating
We are experiencing a delay in updates and query times to Live API, which impacts Answers Serving and Pages. We will update as soon as we have more information.
Posted Sep 24, 2020 - 14:29 EDT
This incident affected: Content (Content API), Search (Search Serving), and Customer Portal Login.