On August 30, 2022 at 6:12am PT, Lattice experienced an outage in its Reviews product that lasted ~40 minutes. At 6:15am PT, Lattice engineering was paged by automatic health monitoring triggered by a large number of GraphQL errors. A merge freeze was initiated on Lattice production and the engineering team began its investigation into the issue. Soon after, the engineering team found the root cause of the errors and began working on a fix.
By 6:29am PT, a fix had been merged into production and the engineering team began monitoring changes. By 6:57am PT, health monitoring was no longer showing errors and Lattice engineering deemed status to be fully operational.
For 40 minutes, customers were not able to use the Reviews product within the Lattice application due to an error on production.
Since faulty code was introduced into production minutes before Lattice engineering was paged via its automatic health monitoring, Lattice engineering was able to determine that this faulty code was the root cause of the GraphQL errors relatively quickly and provide a fix.
The engineering team will be doing a full post-mortem during the week of 9/2/22 investigating improvements to our GraphQL types and how we use resolvers to help prevent issues like this one.