On June 17, 2021, at 05:00 UTC, we started scheduled database maintenance. As we restored services, our engineers noticed additional load on our main database cluster, which led us to declare an incident and extend the maintenance window.
What happened?
After we restored services following the database maintenance, we identified additional load on our main database cluster. As a result, the Hotjar App and data ingestion were offline while we worked to stabilize the cluster. Data tracking was offline from 05:00 UTC until 12:20 UTC.
Why did this issue occur?
When we tried to restore services after leaving maintenance mode, we ran into issues with data processing on our main database.
What will we do to prevent this from happening in the future?
We have added more performance-related tests to our database migration procedure and cleaned up data for discontinued features.