Increased error rate in Customer Portal
Incident Report for Yext
Postmortem

Summary

Starting at 2:20 PM ET on Sept 29, 2020, our monitoring indicated an increased rate of errors in the Customer Portal. By 3:00 PM ET much of the issue had been mitigated, and by 4:15 PM ET it was fully resolved.

Root Cause

After a routine operation, the system that provides credentials and other secrets to the application servers experienced errors, leaving some application servers unable to function. This incident had no security implications, but it did cause an availability problem. Once we fixed the issue in that system, we were able to restore service.

Remediation

We are:

  1. Improving our monitoring and alerting for this system so that we can identify similar issues more quickly in the future.
  2. Adjusting our runbooks for the operation in question to reduce the likelihood of future issues.
  3. Making the system more resilient to the type of error that occurred after the operation.
Posted Oct 05, 2020 - 10:45 EDT

Resolved
This incident has been resolved.
Posted Sep 29, 2020 - 18:00 EDT
Monitoring
All services have been restored and we are monitoring.
Posted Sep 29, 2020 - 16:40 EDT
Identified
We have identified the root of the issue and are working on remediation. We anticipate full restoration of services shortly.
Posted Sep 29, 2020 - 15:56 EDT
Investigating
We are seeing an increase in errors in the Customer Portal. We are actively working on remediation.
Posted Sep 29, 2020 - 14:48 EDT
This incident affected: Customer Portal Login.