Outbound messages delayed
Incident Report for Mailprotector
Postmortem

Nearly everyone expects email to simply work. We understand that delays and disruptions are frustrating, and we apologize for the inconvenience caused by yesterday's outbound mail flow problem.

We believe the root cause was an update that prevented some services from restarting on the outbound gateway servers. The team has since repaired the servers.

Despite restarting services and performing typical remediation steps, the processing time grew to just below 30 seconds per message. The delay created a chain reaction that led to a significant backlog of messages in the queues.

  • The backlog also caused the gateway to stop accepting new connections once the system’s resources were at capacity.

    • Servers trying to relay through Mailprotector’s smarthost were holding onto messages for up to several hours.
    • Until the email servers could hand messages off to the Mailprotector smarthost relay, the email servers continued to retry the connections.
    • Multi-function devices and scanners do not have intelligent email processing and experienced errors when the relay would not accept the devices' connection requests.
  • The team took steps to process the backlog as quickly as possible and identified a way to reduce message processing time to the typical 1 second or less.

    • As the backlog drained from the queues, the smarthost relay began accepting new connections from email servers.
    • The backlog of messages on email servers caused temporary delays as new surges of messages came through.
    • At approximately 10:30 PM ET, the outbound mail flow returned to normal operation.
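
The chain reaction described above can be illustrated with a simple fluid queue model. This is a minimal sketch, not a description of Mailprotector's systems: the arrival rate and worker count are hypothetical numbers chosen for the example, and the function names are made up here. Only the 30 second and 1 second per-message processing times come from this report.

```python
def backlog_after(arrival_per_s, workers, service_time_s, duration_s, start_backlog=0.0):
    """Queue depth (messages) after duration_s seconds in a simple fluid model.

    Capacity is workers / service_time_s messages per second. The backlog
    changes at (arrival rate - capacity) and never drops below zero.
    """
    capacity_per_s = workers / service_time_s
    net_per_s = arrival_per_s - capacity_per_s
    return max(0.0, start_backlog + net_per_s * duration_s)


def drain_time_s(backlog, arrival_per_s, workers, service_time_s):
    """Seconds needed to clear a backlog while new mail keeps arriving.

    Returns infinity when capacity does not exceed the arrival rate,
    because the queue can never empty in that case.
    """
    capacity_per_s = workers / service_time_s
    surplus_per_s = capacity_per_s - arrival_per_s
    if surplus_per_s <= 0:
        return float("inf")
    return backlog / surplus_per_s


if __name__ == "__main__":
    # Hypothetical load, NOT Mailprotector's real traffic or architecture.
    ARRIVALS = 50   # messages per second arriving at the gateway (assumed)
    WORKERS = 100   # messages processed in parallel (assumed)

    healthy = backlog_after(ARRIVALS, WORKERS, service_time_s=1.0, duration_s=3600)
    degraded = backlog_after(ARRIVALS, WORKERS, service_time_s=30.0, duration_s=3600)
    recovery = drain_time_s(degraded, ARRIVALS, WORKERS, service_time_s=1.0)

    print(f"backlog after 1 h at 1 s/message:  {healthy:,.0f}")
    print(f"backlog after 1 h at 30 s/message: {degraded:,.0f}")
    print(f"minutes to drain once back at 1 s: {recovery / 60:.0f}")
```

Under these assumed numbers, a 30x increase in processing time cuts capacity well below the arrival rate, so the queue grows for as long as the slowdown lasts. Once processing time returns to 1 second, the backlog drains only at the rate by which capacity exceeds incoming mail, which is why recovery takes time even after the fix is in place.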

Additional processes are being implemented to mitigate possible future delays at the outbound gateway.

Mailprotector aims to provide near 100% uptime because we know how important email is. As with any technology, especially an actively developed solution, unforeseen problems can occur. Our team learns from every incident and works to prevent similar events from recurring.

Thank you for your continued partnership with Mailprotector.

Posted Mar 24, 2021 - 11:02 EDT

Resolved
The outbound queues have been emptied. Performance has been restored to normal.

For the next couple of hours, there may be additional brief delays as servers connect to Mailprotector's smarthost relay to empty their outbound queues and those surges of mail are processed.

The issue is considered resolved.
Posted Mar 23, 2021 - 22:03 EDT
Monitoring
A fix has been implemented that has significantly improved the backlog processing. The queues are beginning to empty quickly. Some delay is still expected as the outbound gateway systems normalize. The team is monitoring the process.
Posted Mar 23, 2021 - 21:31 EDT
Update
We are continuing to work on a fix for this issue.
Posted Mar 23, 2021 - 19:51 EDT
Update
The outbound queue backlog remains heavy. The delays of several hours that users are reporting are due to outgoing emails still waiting to be accepted by Mailprotector's smarthost relay. Those messages are being retried from Microsoft 365, Google Workspace, and other email servers using the Mailprotector relay. The behavior will continue until the queue backlog is cleared and all connection requests are accepted again.

The team is continuing to work on the issue.
Posted Mar 23, 2021 - 19:33 EDT
Update
The team is continuing efforts to clear the backlog of outgoing messages from the queue.

Clients using Exchange Online (Microsoft 365) or Google Workspace may consider temporarily disabling the outbound connector or outbound mail route. This should allow the queued emails on those platforms to be sent directly from their systems without passing through Mailprotector's smarthost relay. This suggestion assumes the retry of those platforms' queues will use the updated routing information.
Posted Mar 23, 2021 - 18:24 EDT
Identified
Delays in the SMTP queue were peaking at 30 seconds per message, causing the queue to stack up and degrade performance. The team has managed to bring SMTP queue processing time closer to the typical 1 second. The delivery queue still has a significant backlog, and continued delays are to be expected.

The team is working to return outbound email delivery to normal processing speeds.
Posted Mar 23, 2021 - 16:12 EDT
Investigating
We are receiving reports of outbound email delays and have confirmed them. The team is investigating the issue.

Outbound email is being delivered with a delay of approximately 10 minutes. Attempts to relay through Mailprotector's smarthost may fail because services appear unavailable. This affects SMTP Authenticated relay from devices and applications.
Posted Mar 23, 2021 - 15:09 EDT
This incident affected: CloudFilter and SafeSend Email Security.