On Sep 14, 2022, between 03:36 PM and 04:26 PM UTC, Atlassian customers using the Opsgenie product received notifications delayed by up to 50 minutes. The event was triggered by a code change that upgraded a common framework. The changes included in this framework update impacted customers in both the US and EU regions. The incident was detected by the on-call developer and mitigated by reverting the latest changes, which returned Opsgenie systems to a known good state. The total time to resolution was around 50 minutes.
The overall impact on Opsgenie products lasted from Sep 14, 2022, 03:36 PM UTC, to Sep 14, 2022, 04:26 PM UTC. The service disruption was limited to customers in the US and EU regions, who did not receive their notifications immediately but instead experienced delays of up to 50 minutes. In total, ~132K notifications in the US region and ~23.6K notifications in the EU region were sent late. Fewer than 0.6% of active customers were affected.
The issue was caused by an Atlassian-initiated change to upgrade a common framework. While most of the intended changes had been tested successfully, some changes that accompanied the framework upgrade caused the notification service to stop processing new notification requests. These notifications remained in the queues until the deployment was reverted, resulting in notification delays of up to 50 minutes for customers.
We know that outages impact your productivity.
We are prioritizing the following improvement actions to avoid repeating this type of incident:
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support