Cluster A - Hosted Email Delays (receiving)
Incident Report for OpenSRS
Postmortem

Incident Date: June 3, 2021
Incident Number: PR-2050

On June 3, 2021, at 7:00 PM ET, Tucows’ hosted email platform experienced a service interruption impacting inbound email for a subset of users in Prod A.

The service interruption was caused by high load, which resulted in packet loss on the network device.

On June 5, 2021, at 3:33 AM ET (07:33 UTC), the engineering team recovered the impacted services by migrating the systems and adding resources.

Tucows will rebuild the systems and improve redundancy for the network device.

Tucows will also enhance monitoring for better visibility.

Thank you,

Tucows Engineering Team

Posted Jun 10, 2021 - 13:05 UTC

Resolved
Thank you for your patience. Although a small amount of mail is still being deferred, the vast majority of mail is being delivered quickly. This issue has been resolved, as we are no longer seeing issues with new inbound emails.

Incident Start Time: 06-03-2021 23:00:00 UTC
Incident End Time: 06-05-2021 07:33:00 UTC
Total Duration: 1 day 8 hrs 33 mins
Posted Jun 05, 2021 - 09:29 UTC
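As a cross-check of the figures above, the following minimal Python sketch recomputes the total duration from the reported UTC start and end times and converts both timestamps to Eastern Time. It assumes that "ET" in the postmortem refers to Eastern Daylight Time (UTC-4), which was in effect in June 2021; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Timestamps copied from the report (UTC).
start = datetime(2021, 6, 3, 23, 0, tzinfo=timezone.utc)
end = datetime(2021, 6, 5, 7, 33, tzinfo=timezone.utc)

# Break the elapsed time into days, hours, and minutes.
duration = end - start
hours, remainder = divmod(duration.seconds, 3600)
minutes = remainder // 60
duration_text = f"{duration.days} day {hours} hrs {minutes} mins"

# Assumption: "ET" means Eastern Daylight Time, a fixed UTC-4 offset in June.
eastern = timezone(timedelta(hours=-4), "EDT")
start_et = start.astimezone(eastern).strftime("%Y-%m-%d %I:%M %p")
end_et = end.astimezone(eastern).strftime("%Y-%m-%d %I:%M %p")

print(duration_text)  # 1 day 8 hrs 33 mins
print(start_et)       # 2021-06-03 07:00 PM
print(end_et)         # 2021-06-05 03:33 AM
```

The computed duration matches the reported total, and the start time matches the 7:00 PM ET stated in the postmortem.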
Monitoring
Our Infrastructure team has fixed the issue, and we are no longer seeing the packet loss. Our engineering team is seeing issues with the rspam, which can delay inbound emails, and is working on a fix. Email queues are dropping and we are monitoring the issue.
We will provide further updates as they become available. Again, we thank you for your patience.
Posted Jun 05, 2021 - 04:11 UTC
Identified
Our Infrastructure team has identified the issue and is working to resolve the problem as quickly as possible. We appreciate your patience and will continue to provide updates on our progress.
Posted Jun 05, 2021 - 02:29 UTC
Update
We continue working to resolve the problems in our Cluster A hosted email service and have engaged our network operations team to assist with our hosted email operations investigation.

We'll continue to provide updates as they come.

11:31 p.m. UTC
Posted Jun 04, 2021 - 23:31 UTC
Update
We continue working to resolve the problems in our Cluster A hosted email service and have engaged our network operations team to assist with our hosted email operations investigation.

We'll continue to provide updates as they come.
Posted Jun 04, 2021 - 21:12 UTC
Investigating
Our hosted email engineers are still working to resolve the issues causing delays. We appreciate your continued patience. We will provide an update once new information is available.
Posted Jun 04, 2021 - 20:37 UTC
Identified
Our engineering team has identified the issues causing delays and is taking measures to resolve these delays and deferrals. Users may still experience some delays or deferrals.
Posted Jun 04, 2021 - 19:13 UTC
Update
Our engineers are investigating reports of delays and deferrals for our Cluster A hosted email service. We will provide updates every 60 minutes, or sooner if new information becomes available.

We appreciate your patience while we look into this issue.
Posted Jun 04, 2021 - 17:43 UTC
Investigating
We are experiencing email delays. We will provide an update shortly with additional information.
Posted Jun 04, 2021 - 17:34 UTC
This incident affected: Hosted Email (Cluster A).