Documents omitted from some indexes
Incident Report for Fauna
Postmortem

On June 1 at 14:50 UTC, a customer opened a support case reporting that specific documents were missing from an index where they were expected. Our customer support team quickly established that the documents were still present in the database and directly accessible by reference, and confirmed that they were omitted from the index. The support on-call then engaged the engineering on-call for the database team to investigate.

The engineering on-call suspected that our garbage collector, which cleans up documents that have been deleted or have outlived their configured time-to-live (TTL), might be causing the issue and disabled garbage collection at 15:54 as a precaution. Two more customers reported documents missing from indexes through additional support cases, and additional engineers were brought in to investigate each report.

At 17:20 the engineering team traced the issue to a code defect in the garbage collector: for some documents with a large number of versions, it wrote only a partial history. As a result, the indexing system missed adds and deletes for those documents and incorrectly included them in or excluded them from indexes. At 18:08 the engineering team initiated a repair of the documents known to be impacted, and the repair completed on June 2 at 04:01.
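
To illustrate the failure mode described above, here is a minimal toy model in Python. It is not Fauna's implementation: the event format, the `in_index` function, and the sample histories are all hypothetical. It assumes only that index membership is derived by replaying a document's add and delete events, so a history with a lost event can yield the wrong answer.

```python
from typing import List, Tuple

# Hypothetical event format: (timestamp, action), where action is "add" or "delete".
Event = Tuple[int, str]


def in_index(history: List[Event]) -> bool:
    """Replay events in timestamp order; the most recent event decides membership."""
    present = False
    for _, action in sorted(history):
        present = action == "add"
    return present


# A document that was added, deleted, then added again.
full_history = [(1, "add"), (2, "delete"), (3, "add")]

# Hypothetical partial history: the final "add" was lost when history was rewritten.
partial_history = full_history[:2]

print(in_index(full_history))     # True: the document belongs in the index
print(in_index(partial_history))  # False: the document is wrongly excluded
```

The reverse case follows the same logic: if a trailing "delete" is lost, the replay reports the document as present and it is wrongly included.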

We know that data inconsistencies are unacceptable, and we are prioritizing work to prevent a recurrence. Specifically, we’re taking the following steps:

  • Improving the coverage of our testing of the garbage collector process.
  • Augmenting our Jepsen tests to include a scenario where the garbage collector runs over documents with large histories that are in indexes.
  • Creating a new watchdog that will proactively validate document and index entries.
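
As a rough sketch of what validating documents against index entries can look like, the Python example below compares the set of documents that should be in an index with the entries actually present. It is an illustration under stated assumptions, not the watchdog itself: `validate_index`, the `belongs` predicate, and the sample data are hypothetical names introduced here.

```python
from typing import Callable, Dict, Set, Tuple


def validate_index(
    documents: Dict[str, dict],
    index_entries: Set[str],
    belongs: Callable[[dict], bool],
) -> Tuple[Set[str], Set[str]]:
    """Return (missing, stale) document references for one index.

    missing: documents that satisfy the index predicate but have no entry.
    stale:   entries whose document is gone or no longer satisfies the predicate.
    """
    expected = {ref for ref, doc in documents.items() if belongs(doc)}
    missing = expected - index_entries
    stale = index_entries - expected
    return missing, stale


# Hypothetical data: an index over documents whose "active" field is true.
documents = {
    "users/1": {"active": True},
    "users/2": {"active": True},
    "users/3": {"active": False},
}
index_entries = {"users/1", "users/3"}

missing, stale = validate_index(documents, index_entries, lambda d: d["active"])
print(missing)  # {'users/2'}
print(stale)    # {'users/3'}
```

A check of this shape reports both kinds of inconsistency seen in this incident: documents omitted from an index and documents incorrectly included in one.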

We prioritize the availability, security, performance, and correctness of our service above everything else, and we apologize for any inconvenience that this event may have caused you. If you have further questions or comments about the event, or require assistance with any remaining issues related to it, please reach out to support@fauna.com.

Posted Jun 14, 2021 - 15:28 PDT

Resolved
We have resolved the issue that was causing some documents to be omitted from indexes. The service is now operating normally.
Posted Jun 01, 2021 - 21:05 PDT
Identified
We have identified the issue that is causing documents to be omitted from some indexes and are in the process of repairing impacted indexes.
Posted Jun 01, 2021 - 10:36 PDT
Investigating
We are investigating an issue that is causing documents to be omitted from some indexes.
Posted Jun 01, 2021 - 09:07 PDT
This incident affected: Global Region Group (FQL API).