Mews connection to SiteMinder down.
Incident Report for Mews
Postmortem

Problem

On 2021-01-28 at 12:00 UTC, several clients reported that not all inventory updates were being synchronized to the SiteMinder channel manager, which caused discrepancies between Mews and the channel manager extranets. Over 500 SiteMinder integrations were impacted and were not fully synchronized for 2 days because of a slowdown of the synchronization job.

Action

We immediately started looking for the integrations that were causing the delay. We identified 50 failing integrations and switched them off. On 2021-01-28 at 16:00 UTC we deployed an optimization of the restriction calculation algorithm and ran a job to schedule resynchronization of SiteMinder integrations. Recovery took a few days, because all SiteMinder integrations had to be completely resynchronized without overloading the SiteMinder side with traffic.

Additionally, on 2021-01-29 we identified 76 integrations with an increased number of price updates, and we disabled synchronization of price updates for them. On 2021-01-29 at 16:16 UTC we deployed a solution that processes updates partially and splits the load into multiple sequential pushes, so that no single push exceeds the critical size and throughput stays stable. On 2021-01-29 at 16:43 UTC we resynchronized all 76 hotels to ensure that all price updates were in place.
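
As an illustration only, splitting one large update into sequential pushes could look roughly like the sketch below. It is not the Mews implementation: the function name `split_interval`, the inclusive date-range representation, and the 31-day limit are all assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def split_interval(start: date, end: date, max_days: int) -> Iterator[Tuple[date, date]]:
    """Yield consecutive inclusive sub-intervals covering start..end,
    each spanning at most max_days days (hypothetical helper)."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    current = start
    while current <= end:
        # Cap each chunk at max_days, but never run past the requested end.
        chunk_end = min(current + timedelta(days=max_days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


# A full-year update becomes a series of smaller pushes sent one after another.
# The 31-day limit is an assumed value, not the real critical size.
chunks = list(split_interval(date(2021, 1, 1), date(2021, 12, 31), 31))
```

Because the chunks are contiguous and do not overlap, they can be pushed sequentially, and a failure in one chunk only requires retrying that chunk rather than the whole year.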

Causes

  1. Mews created unnecessary updates as the result of a bug. In some cases, intervals adjacent to the updated ones were sent as well.
  2. Channel manager integrations were not isolated from each other. If one integration was failing due to large updates, other integrations would time out.
  3. Restriction synchronization degraded system performance.
  4. Unnecessary updates were calculated regardless of which channel manager operations were enabled.
  5. The absence of an action log for the channel manager integration slowed down the investigation.
  6. Large updates were sent at once, including full-year updates, which were processed as a single update. This degraded performance and caused timeouts.
  7. Duplicate price updates with the same value were generated and sent to channel managers.
  8. Users could trigger manual updates an unlimited number of times. This produced duplicate updates that were processed at the same moment. It also affected the performance of third-party integrations.

Solutions

  1. Mews no longer sends unnecessary updates. It sends only the updated interval, which decreases the overall workload and speeds up synchronization.
  2. Integrations run separately and independently of each other. Mews splits the calculation of large updates into smaller batches to guarantee a maximum execution time and avoid timeouts.
  3. We've improved the performance of the restriction calculation algorithm.
  4. Updates are calculated only for the enabled channel manager operations.
  5. The action log will show events when operations on an integration are enabled or disabled.
  6. All updates are split into chunks with a maximum size, so a single large update is broken down into smaller ones and does not affect system performance.
  7. If price values are the same, Mews won't generate an update.
  8. Multiple manual updates will be merged into a single one to avoid processing duplicate updates.
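
Solutions 7 and 8 can be pictured with the minimal sketch below. It is a hedged illustration, not the actual Mews code: the names `changed_prices` and `merge_intervals`, the `(rate id, ISO date)` key, and prices stored as integer minor units are all assumptions.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

PriceKey = Tuple[str, str]    # (rate id, ISO date); hypothetical key shape
Interval = Tuple[date, date]  # inclusive start and end dates


def changed_prices(last_sent: Dict[PriceKey, int],
                   candidates: Dict[PriceKey, int]) -> Dict[PriceKey, int]:
    """Keep only prices that differ from the last value sent (solution 7)."""
    return {key: value for key, value in candidates.items()
            if last_sent.get(key) != value}


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent manual update requests (solution 8)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Example data (made up): one price is unchanged, one changed, one is new.
last_sent = {("BAR", "2021-02-01"): 10000, ("BAR", "2021-02-02"): 11000}
candidates = {("BAR", "2021-02-01"): 10000,
              ("BAR", "2021-02-02"): 12000,
              ("BAR", "2021-02-03"): 9000}
to_send = changed_prices(last_sent, candidates)

# Three manual requests, two of them identical, collapse into one interval.
requests = [(date(2021, 2, 1), date(2021, 2, 10)),
            (date(2021, 2, 1), date(2021, 2, 10)),
            (date(2021, 2, 5), date(2021, 2, 20)),
            (date(2021, 3, 1), date(2021, 3, 3))]
pending = merge_intervals(requests)
```

Filtering before sending and merging before processing both reduce the number of updates that reach the channel manager without changing the final state it ends up with.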
Posted Mar 29, 2021 - 12:15 CEST

Resolved
This incident has been resolved.
Posted Jan 28, 2021 - 21:32 CET
Monitoring
The Mews connection to SiteMinder has been fully enabled. All new updates will go through.
Past updates will be manually resynchronized within approximately the next 24 hours.
Posted Jan 28, 2021 - 17:23 CET
Identified
We identified overloaded connections caused by a large number of updates.
Posted Jan 28, 2021 - 16:15 CET
Update
We are recovering connections between Mews and SiteMinder.
Posted Jan 28, 2021 - 15:08 CET
Investigating
We are currently investigating this issue.
Posted Jan 28, 2021 - 14:46 CET
This incident affected: Open API.