Acquia has detected a temporary service delay in Acquia Personalization and Content Hub services
Incident Report for Acquia, Inc.
Postmortem

Purpose of This Report

This is a summary and analysis of an issue that occurred with the delivery of an Acquia product or service. The purpose of this document is to share details about what happened and why, so that there is a common understanding of what is required to prevent a recurrence where possible. Any remaining issues or risks are identified, as are recommended or pending actions.

Executive Summary

Between 1 and 3 June 2021, the Acquia Content Hub and Personalization services experienced a degradation of service. This caused syndication requests to queue for extended periods and affected some Drupal application functions that attempt to contact the Content Hub service. The degradation occurred when a large volume of requests exceeded the ability of the API/database to process them. To mitigate the risk of recurrence, Acquia R&D has identified a number of remediations, both optimizing the service and preventing congestion from any single application from impacting performance for other customers.

Event Summary

Between 1 and 3 June 2021, Acquia Content Hub and Personalization experienced a degradation of service for customers in the US East product region. This degradation caused significant delays in content syndication and in operations that depend on interactions with the Content Hub service. During this event, content syndication requests queued; all were eventually processed as Acquia took actions to increase capacity and reduce load on the service.

Acquia Actions

  • 1 June - Acquia identified a small number of customers generating large volumes of syndication requests and worked to establish separate queues for these customers.
  • 2 June - Acquia continued working with a small number of individual customers to pause some in-progress actions so that other queued items could be processed. Acquia R&D monitored the service throughout the day.
  • 3 June - Acquia R&D continued to monitor service health as Content Hub caught up with all queued requests. As the backlog was processed, paused actions were resumed and carefully monitored until 22:30 UTC, with the service continuing to operate as expected.

Identified Root Cause

The root cause of this service degradation was a large influx of entity node revisions (nearing 500,000 per hour). This exceeded the capacity of the API/database to process incoming requests, so requests were queued and took a significant time to process.

Corrective Actions

  1. Acquia R&D has identified areas where redundant data can be removed to prevent the performance degradation associated with that data.
  2. Acquia R&D has identified database optimizations to further mitigate issues of this type.
  3. Acquia R&D will further decouple Content Hub actions from editorial workflows to prevent Content Hub service degradations from affecting Drupal application functions.
  4. Acquia R&D will implement congestion control strategies to prevent large numbers of requests from specific applications from affecting service for the entire region.
Posted Jun 10, 2021 - 20:14 UTC

Resolved
The underlying cause of this service interruption has been addressed. All affected Acquia Personalization and Content Hub services have been restored. All services are operational at this time.
Posted Jun 03, 2021 - 22:20 UTC
Monitoring
Acquia Personalization and Content Hub customers in the US East product region should be seeing improvements in processing with queue times returning to normal. Acquia is closely monitoring the situation as we move to resolution.
Posted Jun 03, 2021 - 14:08 UTC
Update
Acquia Personalization and Content Hub customers in the US East product region may still be experiencing slow data processing. Acquia is continuing to investigate options to mitigate and resolve this issue.
Posted Jun 03, 2021 - 10:16 UTC
Update
Acquia Personalization and Content Hub customers in the US East product region may still be experiencing slow data processing. Acquia is continuing to investigate options to mitigate and resolve this issue.
Posted Jun 03, 2021 - 05:42 UTC
Investigating
Acquia Personalization and Content Hub customers in the US East product region may still be experiencing slow data processing. Acquia is continuing to investigate options to mitigate and resolve this issue.
Posted Jun 03, 2021 - 02:03 UTC
Update
Acquia Personalization and Content Hub customers in the US East product region may still be experiencing slow data processing. Acquia is continuing to investigate options to mitigate and resolve this issue.
Posted Jun 02, 2021 - 22:00 UTC
Update
Acquia Personalization and Content Hub customers in the US East product region may still be experiencing slow data processing. Acquia is continuing to investigate options to mitigate and resolve this issue.
Posted Jun 02, 2021 - 19:05 UTC
Update
Acquia Personalization and Content Hub customers in the US East product region may still be experiencing slow data processing. Acquia is continuing to investigate options to mitigate and resolve this issue.
Posted Jun 02, 2021 - 17:03 UTC
Identified
Acquia Personalization and Content Hub customers in the US East product region may still be experiencing slow data processing. Acquia is continuing to investigate options to mitigate and resolve this issue.
Posted Jun 02, 2021 - 13:30 UTC
Update
We have focused the scope of the investigation and are continuing to look into the cause of the service degradation. We will provide additional information as it becomes available.
Posted Jun 02, 2021 - 03:36 UTC
Update
We are still working to resolve the service degradation affecting data processing for Acquia Personalization (formerly known as Acquia Lift) and Content Hub services. We will provide additional information as it becomes available.
Posted Jun 02, 2021 - 00:14 UTC
Update
We are still working to resolve the service degradation affecting data processing for Acquia Personalization (formerly known as Acquia Lift) and Content Hub services. We will provide additional information as it becomes available.
Posted Jun 01, 2021 - 22:17 UTC
Investigating
We are currently investigating a delay in data processing for Acquia Personalization (formerly known as Acquia Lift) and Content Hub services. This is impacting customers in the US East product region. We will provide additional information as it becomes available.
Posted Jun 01, 2021 - 19:23 UTC
This incident affected: Acquia Content Hub and Acquia Personalization.