Delayed Webhook Processing
Incident Report for Xendit
Postmortem

Incident Date: 2023-06-07

Affected Products: API (eWallets, Virtual Accounts, Retail Outlet, Recurring Payments, Payouts, xenDisburse, xenPlatform Accounts, Direct Debit, Invoice, PayLater, QR Codes, Reports, Third-party Integrations, Payments API).

Report URL: https://status.xendit.co/incidents/td1fcnt8n1nf

What happened? 

At 2023-06-07 15:01 GMT+7, we detected an increased number of error messages from our payment services to the webhook system. This triggered an alert to the team and Incident Response was activated.

During the incident, we identified a network connectivity issue that caused some services to fail to send requests to the webhook service. Our investigation revealed that the root cause was an internal misconfiguration of network settings, which prevented services from delivering webhooks to customers. This delayed the delivery of webhooks to your systems, even though the underlying payments had already been processed and completed. Payment processing was not affected and continued to operate normally.

The fix was applied at 2023-06-07 15:53 GMT+7 and all Xendit webhooks resumed normal operation at 16:01 GMT+7. We immediately focused on reconciliation and all affected webhooks were reconciled by 18:55 GMT+7.

What measures have we taken to prevent this issue in future?

We are taking this issue seriously. We are taking the following actions to prevent similar issues from happening again:

  1. Implement stricter safeguards for infrastructure-related changes.

    1. Continue to automate manual processes that are prone to human error
    2. Implement a more comprehensive impact analysis for every release
    3. Implement automatic detection and notification when high-risk changes are performed
  2. Extend our monitoring capabilities to identify potential risks earlier.

    1. Implement more comprehensive network monitoring capabilities
  3. Improve our reconciliation process.

    1. Implement a separate queue to hold undelivered webhook messages for automatic retry
    2. Continue to automate the reconciliation process for faster reconciliation
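
To illustrate the idea of a separate queue for undelivered webhooks, here is a minimal sketch in Python. It does not describe Xendit's actual implementation: the names `RetryQueue`, `PendingWebhook`, and `send`, as well as the exponential backoff schedule and the attempt limit, are assumptions chosen for the example. Undelivered messages are held with a next-attempt time, retried when due, and moved to a dead-letter list for manual reconciliation once the attempt limit is reached.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class PendingWebhook:
    # Only next_attempt_at takes part in ordering, so the heap pops the
    # message that is due soonest.
    next_attempt_at: float
    webhook_id: str = field(compare=False)
    url: str = field(compare=False)
    payload: dict = field(compare=False)
    attempts: int = field(default=0, compare=False)


class RetryQueue:
    """Hypothetical holding queue for webhooks that could not be delivered."""

    def __init__(self, send: Callable[[str, dict], bool],
                 base_delay: float = 1.0, max_delay: float = 300.0,
                 max_attempts: int = 5) -> None:
        self._send = send  # assumed to return True on successful delivery
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._max_attempts = max_attempts
        self._heap: List[PendingWebhook] = []
        self.dead_letters: List[PendingWebhook] = []

    def enqueue(self, webhook_id: str, url: str, payload: dict, now: float) -> None:
        heapq.heappush(self._heap, PendingWebhook(now, webhook_id, url, payload))

    def pending(self) -> int:
        return len(self._heap)

    def process_due(self, now: float) -> int:
        """Retry every message that is due at `now`; return how many were delivered."""
        due = []
        while self._heap and self._heap[0].next_attempt_at <= now:
            due.append(heapq.heappop(self._heap))

        delivered = 0
        for item in due:
            item.attempts += 1
            try:
                ok = self._send(item.url, item.payload)
            except Exception:
                ok = False  # treat any delivery error as a failed attempt
            if ok:
                delivered += 1
            elif item.attempts >= self._max_attempts:
                # Out of attempts: keep the message for manual reconciliation.
                self.dead_letters.append(item)
            else:
                # Exponential backoff: base, 2x base, 4x base, ... capped at max_delay.
                delay = min(self._base_delay * 2 ** (item.attempts - 1), self._max_delay)
                item.next_attempt_at = now + delay
                heapq.heappush(self._heap, item)
        return delivered
```

In this sketch the caller passes the current time explicitly, which keeps the retry schedule easy to test; a real worker would typically call `process_due(time.time())` on a timer and persist the queue so that pending messages survive a restart.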

We understand that you count on our reliability for the smooth operation of your business. We sincerely regret any inconvenience this may have caused you and your customers. We are committed to doing better by applying what we learned from this event to continuously improve our services.

If you require any assistance or have further questions, please contact us at help@xendit.co or through live chat at https://www.xendit.co/.

Thank you for your trust in using Xendit to power your business.

Posted Jun 12, 2023 - 09:24 WIB

Resolved
This incident has been resolved.
Posted Jun 07, 2023 - 17:29 WIB
Update
Dear customers,

The following affected webhooks have been reprocessed: Virtual Account, Retail Outlet, eWallets, Direct Debit, Invoices/Checkout, QR, and PayLater
We are still working to reconcile affected webhooks
We will share an update when we have one
Posted Jun 07, 2023 - 16:44 WIB
Monitoring
Dear customers,

A fix has been implemented and we are seeing healthy traffic since 4:01pm GMT+7
We are monitoring closely and working on reprocessing the affected webhooks
Posted Jun 07, 2023 - 16:10 WIB
Update
Dear customers,

Core teams are still all hands on deck investigating the issue
We apologize for the inconvenience and will share another update when we have one
Posted Jun 07, 2023 - 15:51 WIB
Investigating
Dear customers,

We are detecting issues delivering webhooks to your URLs
Payments are completed, but webhook delivery to your URLs might be delayed
Our team is all hands on deck investigating the issue
Posted Jun 07, 2023 - 15:11 WIB
This incident affected: Callback (eWallets, Virtual Accounts, Retail Outlets, Invoices, Disbursements, xenPlatform, Direct Debit, PayLater, QR Codes).