Large-scale incident with a number of services inaccessible
Incident Report for Altmetric
Resolved
We are pleased to report that the Major Incident is now closed, our Major Incident Team has disbanded and all services have returned to normal operation. If you see anything that you believe to be incorrect, please contact your support team.

Our teams shall be working to review the Major Incident for opportunities to improve our resilience and disaster recovery and we would like to take this opportunity to thank our customers for their patience throughout.

There will be no further updates to this incident.
Posted Apr 01, 2021 - 14:53 UTC
Update
We are continuing to finalise the resolution of our last outstanding piece of processing. This relates to a very small number of social mentions with negligible customer impact; however, we will continue until it is fully resolved.

Our next update will be before 1700 UTC Thursday, 1st April.
Posted Mar 31, 2021 - 15:20 UTC
Update
After the bulk processing was completed, we identified an edge case affecting a small number of social mentions. We are working on a manual fix for this, after which we hope to close the incident.

Our next update will be before 1700 UTC Wednesday, 31st March.
Posted Mar 30, 2021 - 16:04 UTC
Update
We have now completed our bulk processing of the missed attention that occurred as a result of the major incident earlier this month.

Tomorrow we will be performing our final checks to ensure everything is restored.

Our next update will be before 1700 UTC Tuesday, 30th March.
Posted Mar 29, 2021 - 16:17 UTC
Update
We are now in the final stages of processing the missed attention and we expect this to complete early next week.

Our next update will be before 1700 UTC Monday, 29th March.
Posted Mar 29, 2021 - 09:52 UTC
Update
We are now in the final stages of processing the missed attention and we expect this to complete early next week.

Our next update will be before 1000 UTC Monday, 29th March.
Posted Mar 26, 2021 - 16:46 UTC
Update
Our team are working to complete the processing of missed attention and this is expected to continue later into the week.

Our next update will be before 1700 UTC Friday, 26th March.
Posted Mar 26, 2021 - 10:03 UTC
Update
All customer-facing services are fully operational. Our team are working to complete the processing of missed attention and this is expected to continue later into the week.

Our next update will be before 1000 UTC Friday, 26th March.
Posted Mar 25, 2021 - 10:34 UTC
Update
We are continuing to process our backlog of mentions and we expect this to continue until later into the week.

Our next update will be before 1030 UTC Thursday, 25th March.
Posted Mar 24, 2021 - 16:46 UTC
Update
We are continuing to process our backlog of mentions and we expect this to continue until later into the week.

Our next update will be before 1700 UTC Wednesday, 24th March.
Posted Mar 24, 2021 - 09:58 UTC
Update
We are continuing to process our backlog of mentions, and so are continuously adding mentions to our badges, details pages and API. 

Unfortunately the rate at which we are processing is slower than we expected and proving harder to predict. This means our completion date for processing the backlog is likely to be pushed out into later this week.

Our next update will be before 1000 UTC Wednesday, 24th March.
Posted Mar 23, 2021 - 18:06 UTC
Update
We are continuing to process our backlog of mentions. We aim to provide a revised estimate for completion later today.

Our next update will be before 1700 UTC Tuesday, 23rd March.
Posted Mar 23, 2021 - 10:01 UTC
Update
We are continuing to process our backlog of mentions. We aim to provide a revised estimate for completion tomorrow.

Our next update will be before 1000 UTC Tuesday, 23rd March.
Posted Mar 22, 2021 - 16:55 UTC
Update
The backlog of reprocessing continues and, in order to provide the best service to our customers, we are prioritising the processing of recent mentions. In addition, we're taking a very cautious approach to the speed of reprocessing so that performance across our services remains consistent for customers.

Our next update will be before 1700 UTC today, Monday, 22nd March.
Posted Mar 22, 2021 - 09:57 UTC
Update
We are continuing to process our backlog of mentions and we expect this to continue over the weekend.

Our next update will be before 1000 UTC Monday, 22nd March.
Posted Mar 19, 2021 - 16:35 UTC
Update
We are continuing to process our backlog of mentions and we expect this to continue over the weekend.

Our next update will be before 1700 UTC today, Friday 19th March.
Posted Mar 19, 2021 - 10:01 UTC
Update
We are continuing to process our backlog of mentions and we expect this to continue over the weekend.

Our next update will be before 1000 UTC tomorrow, Friday 19th March.
Posted Mar 18, 2021 - 16:45 UTC
Update
The details pages, Explorer and APIs are all synchronised. We have started the process of correctly attributing the remaining mentions and processing new and queued mentions.
Our next update will be before 1700 UTC today, Thursday 18th March.
Posted Mar 18, 2021 - 09:47 UTC
Update
We have identified and removed 97% of misattributed mentions caused by our recent outage. This is now reflected in the Detail Pages and API. The Explorer will need some more time to process the changes, and we expect this to be available before the end of today.

Over the course of the next week, we will be focusing on correctly attributing the remaining mentions to more recent research outputs, re-enabling the processing of new and queued mentions and taking steps to improve resilience.

Our next update will be before 10am tomorrow.
Posted Mar 17, 2021 - 17:58 UTC
Update
Our teams continue to work on the incident and to progress our restoration plan. Our next update will be before 1800 UTC today, Wednesday 17th March.
Posted Mar 17, 2021 - 09:53 UTC
Update
In order to provide our customers with as much information as possible, we have created a blog post which holds some additional information for our customers.

https://www.altmetric.com/blog/customer-update-altmetric-technical-major-incident/

Our teams continue to work on the incident and our next update will be before 1000 UTC tomorrow, Wednesday 17th March.
Posted Mar 16, 2021 - 16:01 UTC
Update
Our Counts and Commercial API, including badges and details pages, are now fully restored.

Our work to resolve the issue concerning several mis-attributed mentions is ongoing. We will provide another update before 6pm.
Posted Mar 16, 2021 - 12:40 UTC
Update
We are continuing to work on a fix for this issue.
Posted Mar 16, 2021 - 10:38 UTC
Update
Our data centre provider has restored functionality to our hardware infrastructure and we are currently working to re-synchronise databases and bring services back online. We expect our services to start restoring within the next hour. Our next update will be at 12pm.
Posted Mar 16, 2021 - 10:01 UTC
Update
Unfortunately, one of our data centres is suffering from an outage which has taken down our Counts and Commercial API, including badges and details pages. We do not have an ETA for a fix on this, but we are investigating workarounds. Our next update will be at 10am.
Posted Mar 16, 2021 - 08:11 UTC
Update
We have identified the root cause of several mis-attributed mentions which we are in the process of resolving. This will require us to restore mentions data to a point in time (9th March) and to review all mentions processed since then to ensure they're attributed correctly. Our next update will be before 1700 UTC tomorrow, Tuesday 16th March.
Posted Mar 15, 2021 - 16:58 UTC
Identified
Following our response to the Major Incident declared by our hosting provider last week, we have identified several mis-attributed mentions which we are in the process of resolving. As a further measure, we are planning to review all mentions processed since the Major Incident and ensure they're attributed correctly. Our next update will be before 1700 UTC today, Monday 15th March.
Posted Mar 15, 2021 - 14:16 UTC
Update
Following a stable weekend, the Major Incident Team are regrouping this morning to assess outstanding actions with a view to resolving the incident and completing the post incident report. Our next update will be before 1700 UTC today, Monday 15th March.
Posted Mar 15, 2021 - 09:38 UTC
Monitoring
All Altmetric services have been restored and queues restarted. System resilience has been restored and data is being synchronised across our datacentres. The incident shall remain open at a reduced severity while we monitor performance and backup processes over the weekend. We continue to see no direct customer impact in accessing our services.
We will update again by Monday 15th March at 10am.
Posted Mar 12, 2021 - 21:04 UTC
Update
We are continuing to work on a fix for this issue.
Posted Mar 12, 2021 - 12:18 UTC
Update
We are continuing to work on a fix for this issue.
Posted Mar 11, 2021 - 22:27 UTC
Update
Our teams are processing the backlog of missed mentions and related data, and we are working across our infrastructure teams to bolster capacity and resilience for the weekend. This incident will remain in a monitoring state until we are confident that we have fully addressed all systems affected by the outage and returned them to their previous state. We will update again by Monday 15th March at 10am.
Posted Mar 11, 2021 - 16:23 UTC
Update
The stability of our critical systems is holding steady, and we are making our way through the backlog of missed mentions from the last 24 hours. You may still experience some degradation in performance within the Explorer, but this should continue to improve over the coming days. Our next update will be by 5pm today.
Posted Mar 11, 2021 - 09:10 UTC
Update
We have successfully stabilized all of our critical systems, and improved our capacity with temporary servers to mitigate performance issues while we fully recover and reprovision our lost infrastructure. We are receiving new mentions, but there will be a delay in these being visible in our platform as we work through the backlog and pull in mentions that were missed during the outage. We will update this page again by 9am tomorrow morning.
Posted Mar 10, 2021 - 16:05 UTC
Update
We have successfully restored all of our customer-facing services, though we continue to operate with reduced server capacity, so some services may be slower than normal. There is still a delay in populating the system with new mentions, so you may not see the mentions you expect attached to research outputs. Now that our core services are restored, we will begin working to retrieve any missed mentions and get the pipeline of new mentions back to a stable state. Our next update will be at 5pm.
Posted Mar 10, 2021 - 13:43 UTC
Update
Our team has begun bringing services back online. Our database capacity is still reduced, so we will be monitoring our server capacity carefully and this will lead to a delay in mentions being shown in the Explorer. Our next update will be by 2pm.
Posted Mar 10, 2021 - 11:57 UTC
Update
Our team is now working through a plan to restore necessary infrastructure, and we will continue to update as we restore our service capacity back to normal levels.

Our next update will be at 12pm.
Posted Mar 10, 2021 - 10:32 UTC
Identified
Good morning. The team has returned to assess the changes we need to make to get back into a healthier position. Overnight on March 10th 2021, our data centre provider suffered a major incident, during which one of the data centres was destroyed. This leaves us with less capacity than normal to provide our services with some noticeable impact.

You may notice lack of availability or degradation of the Explorer, Details Pages and Badges, and some Explorer API endpoints are not responding. The Details Page API should be unaffected, but will be up to date only from 1.30am. New mentions are not processing at the moment (including removing mentions which should be removed).

Our team is now focused on restoring key services, and then reclaiming missing data from our sources. We are not currently expecting any permanent data loss, however we will update this page immediately if that changes.

Our next update to this page will be at 11am.
Posted Mar 10, 2021 - 08:59 UTC
Monitoring
We've been made aware of a fire in one of our datacentres. This means that we're not running as many instances of our software as we would like. This could present itself as some services being slow, as usage is not spread across as many machines.

Our main data processing pipeline is also offline at the moment, so new mentions will take some time to appear on details pages and in the Explorer. As it's 3am here in England, and our services are largely stable, work to restore data processing will continue in the morning. We are hopeful that all data from this downtime can be recovered.
Posted Mar 10, 2021 - 03:30 UTC
Update
An issue was spotted with our API and its service has been restored.
Posted Mar 10, 2021 - 02:46 UTC
Identified
There's an outage on our provider's side which has caused disruption to one of our datacentres.

We're currently able to provide services only from one of our datacentres, which will cause disruption in some places.
Posted Mar 10, 2021 - 02:30 UTC
Investigating
We're currently attempting to track down the reason behind a number of our services being unresponsive. Signs at the moment point towards a data centre having issues.
Posted Mar 10, 2021 - 01:42 UTC
This incident affected: Altmetric API (Commercial API, Free API) and Altmetric Details Pages, Altmetric Explorer for Institutions, Altmetric Explorer for Publishers, altmetric.com website, Data Processing.