Delay in rules execution impacting sending/receiving messages
Incident Report for Gorgias
Postmortem

Last week, on Aug 12, we had an incident from 7:20 AM to 2:30 PM PST (when it was fully resolved) that delayed the processing of Rules in the helpdesk in the us-east1 cluster. The underlying cause of this delay was a performance issue in our API triggered by an edge case. The issue has since been fixed and the fix deployed to production.

Since all new incoming and outgoing messages must pass through our Rule system, the result was long delays in receiving and sending ticket messages. These delays were particularly noticeable for chat and Facebook Messenger tickets, where fast response times are critical.

What are our future mitigation plans?

  • We’re improving our monitoring system to identify this type of issue faster in the future.
  • We’ll be adding more performance tests for our Rules system. This should also result in faster delivery times in general for all customers.

In conclusion: we understand that not being able to send and receive emails and chats in Gorgias for an extended period of time makes our platform unusable, and we’re taking all the necessary steps to improve its reliability.

We’re also willing to provide subscription credits to affected customers. Please reach out to our support team at support@gorgias.com.

Please accept my sincerest apologies for this incident. We’ll be working hard to reduce the number of incidents in the future.

Alex - CTO and cofounder of Gorgias.

Posted Aug 17, 2020 - 11:18 PDT

Resolved
This incident has been resolved.
Posted Aug 12, 2020 - 13:41 PDT
Update
We have processed all incoming and outgoing messages in our backlog. New incoming and outgoing messages are now processed in real time. No messages were lost during this incident. We are continuing to monitor for any further issues.
Posted Aug 12, 2020 - 13:33 PDT
Monitoring
We have identified the cause of the problem and have already implemented a fix. We are now processing all incoming and outgoing messages as fast as we can. No messages have been lost during this incident. Latency should be back to normal in about 2 hours.
Posted Aug 12, 2020 - 12:16 PDT
Update
We are continuing to investigate this issue.
Posted Aug 12, 2020 - 09:43 PDT
Update
We are continuing to investigate this issue.
Posted Aug 12, 2020 - 09:18 PDT
Investigating
We are currently investigating this issue.
Posted Aug 12, 2020 - 08:02 PDT
This incident affected: Helpdesk Integrations (Email, Mailgun SMTP, Mailgun inbound email, Mailgun outbound email, Mailgun API, Gmail, Outlook, Live Chat, Smooch Core API, Facebook Posts & Comments, Instagram comments, Shopify integration, Shopify API & Mobile, Aircall Public API, Aircall Apps, Stripe API, Facebook Messenger).