On Thursday, February 2, 2023, the Central 1 wires system experienced intermittent outages because the Money Transfer System (MTS) Backoffice (BO) application repeatedly dropped its connection to the database.
During each outage, the wires application was unavailable to internal and external users, who could not create or receive wires in PaymentStream Direct for brief periods (<15 min) until MTS BO was restarted.
Central 1 service monitoring (Zenoss) started sending MTS BO alerts to PagerDuty at 6:13 a.m. PT (9:15 a.m. ET) and sent three more afterward (9:33, 9:58, 10:23 a.m. PT), each reporting excessive CPU usage. Each time the database disconnected, the Payments Software team restarted the MTS BO server and wires in PaymentStream Direct came back online.
After further investigation, the Database Administrator determined that the incident was caused by blocking on the database between insert and select queries.
To reduce CPU usage on the server, PS Software shortened the date range of the search query from 180 days to 30 days. Additionally, the Database Administrator added indexes to the database.
Together, these two changes stabilized the MTS BO server.
Developers investigated and found no evidence of a clear root cause within the code. Reindexing the database improved performance and removed all locks.
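The two remediation steps described above (a shorter search window and added indexes) can be illustrated with a minimal sketch. It is not the MTS BO code: the report does not name the database engine, tables, or columns, so the `wires` table, the `created_on` column, the index name, and the use of SQLite here are all assumptions chosen only to make the idea runnable.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema for illustration only; the real MTS BO tables,
# columns, and database engine are not described in this report.

def build_demo_db(today: date) -> sqlite3.Connection:
    """Create an in-memory table holding one wire per day for 200 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wires ("
        "id INTEGER PRIMARY KEY, created_on TEXT NOT NULL, amount REAL NOT NULL)"
    )
    rows = [((today - timedelta(days=n)).isoformat(), 100.0 + n) for n in range(200)]
    conn.executemany("INSERT INTO wires (created_on, amount) VALUES (?, ?)", rows)
    conn.commit()
    return conn

def search_wires(conn: sqlite3.Connection, today: date, window_days: int) -> list:
    """Return wires created within the last `window_days` days."""
    cutoff = (today - timedelta(days=window_days)).isoformat()
    return conn.execute(
        "SELECT id, created_on, amount FROM wires WHERE created_on >= ?",
        (cutoff,),
    ).fetchall()

def add_index(conn: sqlite3.Connection) -> None:
    """Index the column used by the search filter."""
    conn.execute("CREATE INDEX idx_wires_created_on ON wires (created_on)")

def query_plan(conn: sqlite3.Connection, today: date, window_days: int) -> str:
    """Return SQLite's plan description for the date-filtered lookup."""
    cutoff = (today - timedelta(days=window_days)).isoformat()
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM wires WHERE created_on >= ?",
        (cutoff,),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[3] for row in rows)

if __name__ == "__main__":
    today = date(2023, 2, 2)
    conn = build_demo_db(today)
    print("rows in 180-day window:", len(search_wires(conn, today, 180)))
    print("rows in 30-day window:", len(search_wires(conn, today, 30)))
    print("plan before index:", query_plan(conn, today, 30))
    add_index(conn)
    print("plan after index:", query_plan(conn, today, 30))
```

In this sketch the narrower window returns fewer rows per search, and the index lets the engine seek to the cutoff date instead of scanning the whole table. Both effects would be expected to shorten the time a select holds resources that concurrent inserts need, although the actual behavior depends on the production database.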
Actions
CHG132384 - Increase the total CPU count of vahclp01mtsbo from 2 to 4.
Status: Completed
CHG132365 - Monitor the database and optimize the handling of blocking queries in code to prevent heap memory exhaustion.
Status: Completed