Issue Summary
Over the weekend, TechnologyOne observed instability in the spatial data layer following regular storage maintenance by the upstream vendor, and took measures to recover via a database failover on 16 February 2025.
On 17 February 2025, a subset of spatial customers in the ANZ region was unable to access ConfigManager, Intramaps and Nearmaps. Access was restored for a portion of these customers by 2pm AEST, and for the remainder by 4pm AEST.
At 7pm AEST, new errors in the logs indicated further error states. Action was taken to restore the affected databases from backup, and all were restored by 11pm AEST.
At 11.15pm AEST, replication errors were observed and found to be the result of replication commencing while restoration was still underway. Access was restored and the replications were re-run, with monitoring continuing into the morning.
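The replication conflict described above is an ordering problem: replication started on databases that were still being restored. The following is a minimal illustrative sketch of one way to guard against that ordering, not a description of TechnologyOne's tooling. The `RestoreGate` class, the `run_replication` function and the `replicate` callable are all hypothetical names introduced for this example.

```python
from typing import Callable, Iterable, List, Set


class RestoreGate:
    """Tracks which databases are currently being restored (hypothetical helper)."""

    def __init__(self) -> None:
        self._restoring: Set[str] = set()

    def begin_restore(self, db: str) -> None:
        self._restoring.add(db)

    def end_restore(self, db: str) -> None:
        self._restoring.discard(db)

    def may_replicate(self, db: str) -> bool:
        # Replication is allowed only when no restore is in flight for this database.
        return db not in self._restoring


def run_replication(
    databases: Iterable[str],
    gate: RestoreGate,
    replicate: Callable[[str], None],
) -> List[str]:
    """Replicate every database that is not mid-restore; return the deferred ones."""
    deferred: List[str] = []
    for db in databases:
        if gate.may_replicate(db):
            replicate(db)  # assumed to be supplied by the caller
        else:
            deferred.append(db)  # retry once the restore has finished
    return deferred
```

In this sketch, a restore job would call `begin_restore` before it starts and `end_restore` when it completes, and the deferred list would be retried on the next replication cycle.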
Root Cause
The database cluster became unstable because the Continuous Availability (CA) configuration was not enabled at the spatial storage layer. This configuration prevents database inconsistencies during storage maintenance.
Corrective Actions
A failover was conducted on the Sunday evening; however, many databases continued to enter a recovery state.
Corrected a storage permissions issue.
Brought databases online one by one, as each attempt to bring them online as a group failed.
Restored from backup the subset of databases found to be in a corrupted state.
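The one-by-one recovery approach can be sketched roughly as follows. This is an illustrative outline under stated assumptions rather than the procedure actually run: `bring_online` is a hypothetical callable standing in for whatever operation brings a single database online, and it is assumed to return `True` on success.

```python
from typing import Callable, Iterable, List, Tuple


def bring_online_sequentially(
    databases: Iterable[str],
    bring_online: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Attempt each database on its own so one failure does not block the rest.

    Returns (online, needs_restore): the databases that came online and the
    ones that did not, which are candidates for restoration from backup.
    """
    online: List[str] = []
    needs_restore: List[str] = []
    for name in databases:
        try:
            ok = bring_online(name)  # hypothetical per-database operation
        except Exception:
            # Treat any error as a failure for this database only.
            ok = False
        if ok:
            online.append(name)
        else:
            needs_restore.append(name)
    return online, needs_restore
```

Handling each database independently isolates failures: a single corrupted database ends up in `needs_restore` without preventing the others from coming online, which a grouped attempt does not allow.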
Preventative Actions
Enable CA on the database share on the primary storage for spatial. A maintenance window is planned from 8pm to 9pm AEST on 22 February 2025. Updates will be provided via the status page.