Service is disrupted in many regions
Incident Report for logz.io
Postmortem

Yesterday there was an AWS power outage in the us-east-1 region, which severely affected the operation of parts of our service. The service has been back to normal operation since 9pm UTC, and the entire data backlog has been processed. We are working on a detailed RCA and will post it once we have all the data.

Posted Dec 23, 2021 - 17:15 IST

Resolved
Only a small subset of customers is still experiencing delays in processing data. The Logz.io system continues to be fully operational.
Posted Dec 23, 2021 - 05:24 IST
Monitoring
Only a small subset of customers is still experiencing delays in processing data. The Logz.io system continues to be fully operational.
Posted Dec 23, 2021 - 04:37 IST
Update
The Logz.io system continues to be fully operational. Most of our customers in the affected region have recent data. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 23, 2021 - 03:52 IST
Update
The Logz.io system continues to be fully operational. Most of our customers in the affected region have recent data. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 23, 2021 - 02:48 IST
Update
The Logz.io system continues to be fully operational. Most of our customers in the affected region have recent data. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 23, 2021 - 02:10 IST
Update
The Logz.io system continues to be fully operational. Most of our customers in the affected region have recent data. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 23, 2021 - 01:27 IST
Update
The Logz.io system continues to be fully operational. Most of our customers in the affected region have recent data. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. We expect most data to be current in less than an hour. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 23, 2021 - 00:31 IST
Update
The Logz.io system continues to be fully operational. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. We expect most data to be current in less than an hour. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 23, 2021 - 00:03 IST
Update
The Logz.io system continues to be fully operational. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. We expect most data to be current within the next hour or less. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 22, 2021 - 23:34 IST
Update
The Logz.io system continues to be fully operational. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. We expect most data to be current within the next hour or less. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 22, 2021 - 23:03 IST
Update
The Logz.io system continues to be fully operational. We have multiple teams working on a subset of our US customers who are still experiencing delays in processing data. We expect most data to be current within the next hour or less. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 22, 2021 - 22:29 IST
Update
The Logz.io system continues to be fully operational. However, a subset of our US customers is still experiencing delays in processing data. We expect most data to be current within the next hour or less. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 22, 2021 - 22:02 IST
Update
The Logz.io system continues to be fully operational. However, a subset of our US customers is still experiencing delays in processing data. We expect most data to be current within 60 minutes. The delay is caused by reduced capacity due to the AWS outage as well as a steep increase of data shipped by customers as their instances recover from the AWS outage.
Posted Dec 22, 2021 - 21:25 IST
Update
All login methods for the application are now working. We are working to recover data gaps for a few of our customers in the US.
Posted Dec 22, 2021 - 19:29 IST
Update
We're starting to see the recovery of our authentication. Google login and user/password methods are now working for non-US regions when accessing the regional URL. We expect a full recovery shortly. Most customers should have complete data available, while a few in US-EAST may see limited data. We apologize for the inconvenience and are working to recover from the AWS power outage as soon as possible.
Posted Dec 22, 2021 - 18:45 IST
Update
We are seeing continued recovery on US-EAST-1. Most data is available and current. We expect recovery to continue and accelerate.
We're still experiencing issues with specific login methods, and all hands are on deck to resolve them.
Posted Dec 22, 2021 - 18:09 IST
Identified
Due to the recent AWS outage in US-EAST, we have lost some resources in US-EAST-1, which caused degraded performance in our API and issues with login to the application. We have isolated the root cause and are working on resolving it as soon as possible. We will update you shortly with progress and a resolution ETA.
Posted Dec 22, 2021 - 17:43 IST
Update
We are continuing to investigate this issue.
Posted Dec 22, 2021 - 14:44 IST
Investigating
We are currently investigating this issue.
Posted Dec 22, 2021 - 14:43 IST
This incident affected: AWS N. Virginia (us-east-1) (Alerts & Security rules, API, SIEM) and AWS Frankfurt (eu-central-1) (Logs Ingestion).