Panel has availability issues
Incident Report for Sendcloud
Postmortem

Over the past two days, users were not able to rely on the stability of the SendCloud service. On the morning of the 11th of January the platform was unavailable for a few hours. On the 12th of January we saw a few periods of poor availability as well.

On the morning of the 11th of January we deployed a new shipment overview for our users. After the release, the platform started to suffer from performance issues, which eventually stalled it. We had to put the platform into maintenance mode to be able to fix the issues more quickly. After we migrated the database to a new and faster database cluster, the platform was usable again.

The day after, we saw that the platform had minor performance issues as well. Exceptionally expensive requests put a lot of stress on the platform. During the afternoon we found the cause of this high volume of expensive requests and took measures to restore performance to a normal level.

Today the platform has returned to a normal state and is performing well again. We will continue working to ensure that this does not happen again. Furthermore, we have decided to provide more transparency about the performance of the platform so you can see what is happening at SendCloud.

We are sorry that users were not able to rely on SendCloud. Our entire organization is dedicated to preventing this kind of event in the future. We would also like to thank you for your understanding and support. This motivates us to keep improving the platform.

Therry van Neerven
CTO SendCloud

Posted Jan 13, 2017 - 10:31 CET

Resolved
This incident has been resolved.
Posted Jan 11, 2017 - 15:54 CET
Monitoring
The services have been restored and customers can ship again.
We are monitoring the situation.
Posted Jan 11, 2017 - 11:16 CET
Update
We've made some big steps in restoring the services. We expect to remove maintenance mode within the next 15 minutes.
Posted Jan 11, 2017 - 10:59 CET
Identified
We have found which specific components in our platform are not performing well. We expect that it may take about an hour before the platform is available again.

Since this is a major outage, we will write a postmortem once we have restored the services and finished the investigation.
Posted Jan 11, 2017 - 09:33 CET
Update
We have activated maintenance mode. We hope to restore the services soon.
Posted Jan 11, 2017 - 08:09 CET
Investigating
The panel has some availability issues. It may be unavailable, slow or unresponsive. We are working on a solution.
Posted Jan 11, 2017 - 08:03 CET