Redtail CRM Latency / Slow Response Times
Incident Report for Redtail Technology
Postmortem

Summary

On Wednesday, January 27th, Redtail experienced an issue that impacted users' access to most of our products. One of our central database servers experienced issues. Once the database issue was resolved, simultaneous login attempts from users across multiple time zones exacerbated the problem, resulting in a longer recovery time.

What Happened?

On the evening of January 26th, our infrastructure team migrated some data tables from a legacy database server to its AWS replacement. In the hours after the migration, the RDS instance began to experience performance issues. A secondary issue, unrelated to the data migration, occurred when the user table of the legacy database server became inaccessible. As a result, connections could not authenticate. Unauthenticated connections built up while the table was unavailable, and once access to it was restored they hit our services all at once. After access to the table was restored, our engineers added infrastructure to mitigate the impact of the connection load flooding our servers.
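The connection buildup described above is a common pattern in which many clients retry at the same moment once a dependency recovers. As an illustration only, and not a description of the remediation Redtail applied (which was additional infrastructure), the sketch below shows one widely used client-side technique: exponential backoff with random jitter, which spreads retries out over time. The function names, the `connect` callable, and the default values are hypothetical.

```python
import random
import time


def jittered_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds.

    Picking a random point below the exponential ceiling keeps clients
    from retrying in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def connect_with_backoff(connect, max_attempts=6, sleep=time.sleep, rng=random):
    """Call connect() until it succeeds or the attempts run out.

    `connect` is a hypothetical callable that raises ConnectionError
    when authentication is unavailable.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            # Give up after the final attempt; otherwise wait and retry.
            if attempt == max_attempts - 1:
                raise
            sleep(jittered_delay(attempt, rng=rng))
```

With a scheme like this, clients that all failed at the same time would tend to reconnect at different moments rather than simultaneously when the table becomes available again.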

Another complicating factor was the weather in the Sacramento area. The inclement weather caused multiple power outages through the night and left several technicians without power. This directly affected response times and monitoring, delaying the initial response during triage of the incident.

Root Cause(s):

The root cause for the incident was related to a data table becoming inaccessible within a central database.

We at Redtail would like to extend our humble and sincere apology for any negative impact the issue(s) outlined above had on you and your business. We understand how critical it is that we deliver maximum uptime to support your daily operations, and we will increase our efforts to meet and exceed your availability expectations. Please rest assured that we will do everything we can to learn from this event and use it to improve all of our services.

Posted Jan 28, 2021 - 13:32 PST

Resolved
This incident has been resolved.
Posted Jan 27, 2021 - 10:55 PST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 27, 2021 - 08:57 PST
Update
Our tenacious issue-sniffing tech hounds have tracked down the root cause and are working quickly to resolve it. Some users may continue to see slow response times and errors when logging in to and navigating within Redtail CRM. We sincerely apologize for any inconvenience the issue may cause to your daily operations. Thank you for your understanding and patience while we work to remedy these issues.
Posted Jan 27, 2021 - 07:46 PST
Identified
The issue has been identified and a fix is being implemented.
Posted Jan 27, 2021 - 05:55 PST
Investigating
Some users are reporting slow response times when logging in to and navigating within Redtail CRM. We have unleashed the issue-sniffing tech hounds to track down the root cause. We sincerely apologize for any inconvenience the issue may cause to your daily operations.
Posted Jan 27, 2021 - 05:26 PST
This incident affected: Redtail API (REST), Redtail CRM, Redtail Imaging, and Retriever Cloud.