Problem with Log Collection
Incident Report for F5 Distributed Cloud
Postmortem

At 2020 hrs UTC, the F5 Operations team noticed issues with log collection across the Secure Mesh and App Stack services. This affected customers' ability to display or query new logs collected after this time (2020 hrs UTC) across our POPs and/or customer sites.

We noticed that the volume of logs had been increasing significantly over the preceding 12 hours, and this increase resulted in a memory-related crash of the logging process. We have since recovered the logging cluster, and no customer logs were lost or missed. The pipeline synced all logs from its buffers once we had fully recovered at 2100 hrs UTC.

We traced the root cause to configuration changes made during the maintenance window, which led to the same logs being ingested twice and, in turn, to the increase in log collection volume. This configuration change has been rectified, and the load on the cluster has decreased.
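As an illustration only, the kind of double ingestion described above can be surfaced by fingerprinting each record and measuring how many records repeat within a batch. The sketch below is not F5's pipeline: the field names (`source`, `timestamp`, `message`) and the helper functions are hypothetical, chosen just to show the idea.

```python
import hashlib
from collections import Counter

# Hypothetical fields assumed to identify a log line uniquely.
IDENTITY_FIELDS = ("source", "timestamp", "message")


def fingerprint(record: dict) -> str:
    """Hash the identifying fields of a record into a stable key."""
    key = "\x1f".join(str(record.get(f, "")) for f in IDENTITY_FIELDS)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def duplicate_ratio(records: list) -> float:
    """Return the fraction of records that repeat another record in the batch."""
    if not records:
        return 0.0
    counts = Counter(fingerprint(r) for r in records)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(records)


def deduplicate(records: list) -> list:
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for r in records:
        fp = fingerprint(r)
        if fp not in seen:
            seen.add(fp)
            unique.append(r)
    return unique


# Example batch in which every record was ingested twice.
a = {"source": "site-1", "timestamp": "2022-04-12T19:00:00Z", "message": "GET /"}
b = {"source": "site-2", "timestamp": "2022-04-12T19:00:01Z", "message": "POST /login"}
batch = [a, b, dict(a), dict(b)]
ratio = duplicate_ratio(batch)
unique = deduplicate(batch)
```

A duplicate ratio near 0.5 on sampled batches would be consistent with every record arriving twice, and alerting on that ratio is one way to flag such a configuration problem before volume grows enough to exhaust memory.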

Posted Apr 12, 2022 - 21:16 UTC

Resolved
This incident has been resolved.
Posted Apr 12, 2022 - 20:39 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Apr 12, 2022 - 20:25 UTC
Investigating
Log collection across Secure Mesh and App Stack services has been affected since 12:20 pm PST, and customers will not be able to query new logs on the F5XC Console (Portal). Our team is investigating the root cause, and we will provide regular updates until the issue is resolved. Customer applications and/or security services are NOT impacted at this time.
Posted Apr 12, 2022 - 20:10 UTC