Starting around 6:30 AM UTC on March 28th, we encountered a burst of Lambda invocation errors. These errors caused a backlog in one of our queues, so some uptime check results were not recorded in time to avoid alert conditions, and false alerts were sent out as a result. The backlog was cleared within a couple of hours.
We're looking at changes we can make to our job queues to prevent this kind of scenario from occurring again.