At 2020 hrs UTC, the F5 Operations team noticed issues with log collection across Secure Mesh and App Stack services. This affected customers' ability to display or query new logs collected after that time (2020 hrs UTC) across our POPs and/or customer sites.
We noticed that the volume of logs had increased significantly over the preceding 12 hours, and this increase caused a memory-related crash of the logging process. We have since recovered the logging cluster, and no customer logs were lost or missed. The pipeline synced all logs from its buffers once we had fully recovered at 2100 hrs UTC.
We traced the root cause to the maintenance window: our configuration changes caused the same logs to be ingested twice, which increased the log collection volume. This configuration change has been rectified, and the load on the cluster has reduced.