We'd like to keep Spreaker up 100% of the time. When that doesn't happen, we write about it here.
| Service | Status |
|---|---|
| Website | So far so good |
| API | So far so good |
| Streaming | So far so good |
| Mobile apps | So far so good |
Issue started at 01:20 UTC: we’re currently experiencing a major outage due to networking issues in the AWS eu-west-1 region, which hosts our main data center. We’re working to mitigate the issue. Updates will be posted here.
Update at 02:28 UTC: AWS has confirmed an internal DNS issue and is currently working to fix it. We’re trying to mitigate the issue by skipping DNS resolution as much as possible, but unfortunately we’re unable to mitigate access to the AWS services themselves (e.g. file storage on AWS S3).
Update at 02:56 UTC: we’ve mitigated most of the issues by skipping AWS DNS usage on most of our services. Minor outages could still occur on secondary services (e.g. RSS feed syncing). We’re working to fix secondary services as well.
Update at 03:05 UTC: AWS continues working through DNS resolution issues in the eu-west-1 region. DNS resolution for public endpoints from the Internet is operating normally, while AWS is still fixing internal DNS resolution. Spreaker is currently operating normally, thanks both to the ongoing resolution on the AWS side and to the mitigation we’ve applied.
Resolved at 03:31 UTC: AWS confirmed the fix and we’ve rolled back our mitigation. The full service should now operate normally. We’re really sorry for the trouble you experienced due to this outage.
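For readers curious about what "skipping DNS resolution" can look like in practice, here is a minimal sketch, not the actual mitigation we deployed. It assumes a hand-maintained table of pinned addresses; the name `PINNED_HOSTS`, the hostname and the IP address are hypothetical placeholders. Hosts in the table never reach the resolver, so a DNS outage cannot affect them, while everything else still goes through normal resolution.

```python
import socket
from typing import Callable, Mapping

# Hypothetical pinned addresses (placeholder hostname, documentation-range IP).
PINNED_HOSTS = {"db.internal.example": "192.0.2.10"}


def resolve(
    host: str,
    pinned: Mapping[str, str] = PINNED_HOSTS,
    lookup: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Return an IP address for host, bypassing DNS when the host is pinned."""
    ip = pinned.get(host)
    if ip is not None:
        return ip  # pinned: no DNS query is made at all
    return lookup(host)  # not pinned: fall back to regular DNS resolution
```

A table like this only works for hosts whose addresses are known and stable, and it has to be removed once DNS is healthy again, otherwise a later address change would go unnoticed.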
On August 31st, between 23:20 and 23:40 UTC (20 minutes), the Spreaker website and API were down due to a network outage in the AWS data center that affected our primary database.
We’re very sorry for the service interruption, and we worked hard to restore it quickly. The downtime lasted 20 minutes, and some minor slowdowns were noticed in the following hour while we completed the recovery operations.
Tomorrow, July 13th at 7:00 UTC (9 AM CEST, July 13th at 12 AM Pacific Time), Spreaker will undergo maintenance for up to 15 minutes to migrate one of our database servers. This is required to avoid downtime or service disruption in the upcoming days.
While our core services will keep working without issues, you might still experience short and temporary disruptions.
UPDATE at 7.30 AM UTC: Maintenance successfully completed
Today, between 05:19 and 05:21 UTC (3 minutes) and between 12:19 and 12:29 UTC (10 minutes), the Spreaker website was down due to an issue with one of our slave databases.
The host running our primary slave database had an issue that made the database server unreachable. Although the loss of a slave server should not affect our service, we found an edge case that causes database connections to get stuck, and Spreaker web server requests along with them.
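As an illustration of how this class of problem is commonly guarded against, the sketch below bounds every connection attempt with a timeout and falls back to the primary when a replica (slave) is unreachable. It is a generic pattern under stated assumptions, not a description of our actual fix: `connect_with_fallback`, the `connect` callable and the 2-second default are all hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_fallback(
    replicas: Sequence[str],
    primary: str,
    connect: Callable[[str, float], T],
    timeout: float = 2.0,
) -> T:
    """Try each replica with a bounded timeout, then fall back to the primary."""
    for host in replicas:
        try:
            return connect(host, timeout)
        except OSError:
            # Covers timeouts and refused/reset connections: skip this
            # replica instead of leaving the request stuck on it.
            continue
    # No replica answered in time; any error from the primary is propagated.
    return connect(primary, timeout)
```

The `connect` argument is injected so the same logic can wrap whichever database driver is in use, provided that driver exposes a connection timeout.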
### We’re currently working to:
Tomorrow, April 27th at 6:00 UTC (8 AM CEST, April 26th at 11 PM Pacific Time), Spreaker will be in maintenance mode for up to 15 minutes to upgrade our database servers.
We’ll increase both the server CPU capacity and the SSD drive throughput in order to better handle increasing traffic.
UPDATE at 6.00 AM UTC: Spreaker is going to be under maintenance for up to 15 minutes.
UPDATE at 6.15 AM UTC: maintenance has been completed and Spreaker is now fully working. Maintenance lasted 8 minutes. We’re now monitoring the system and running some secondary operations that will last a few more hours, without affecting the service.
UPDATE at 6.47 AM UTC: most of the secondary operations have been completed. We’ll keep monitoring system health and performance. No more updates will be published as long as the system works as expected. Thank you for your patience.
We’re currently experiencing a huge load on the platform that’s affecting many services. We’re investigating it and will keep you posted here.
UPDATE at 20.25 UTC: Spreaker has received an unexpectedly high volume of traffic from Egypt. Although our system is designed to scale automatically and absorb such traffic, it actually led to a performance issue in our primary database cluster. As a temporary solution, we had to block some of this traffic in order to restore the service in other countries. The service is now operating normally in the US and Europe, while we’re still investigating the root cause of the performance issue on our databases.
UPDATE at 21:06 UTC: we’ve found the root cause of the performance issue on the primary database cluster and we’ve already rolled out a hot patch to avoid the same issue in the near future. We’ll continue our investigation and will likely schedule a maintenance window to upgrade our database servers to faster ones, in order to better absorb high-load peaks.
Due to a mistake in the release process, this morning we released a broken version of our Android Radio app (4.0.4), prone to crashes during the application’s startup.
We worked around the clock to fix the issue as soon as we noticed it, and we’ve already published an updated version (4.0.5) on the Play Store.
This version is already available for automatic update. If you experienced this issue and your application has not been automatically updated yet, please open the “Play Store” app on your Android device and visit the “My Apps” section to update it manually.
We’re very sorry about what happened, and we’re already working on improving our continuous integration pipeline in order to prevent similar issues from happening again in the future.
Thanks for your patience.
We’re currently having some networking issues in our primary datacenter run by AWS.
UPDATE at 15:00 UTC: networking issues are still ongoing, but should affect only a small number of users. We’re monitoring network connectivity from multiple locations, and the impact is currently decreasing over time.
UPDATE at 15:15 UTC: AWS just reported that it’s “investigating elevated packet loss between some Internet destinations and the EU-WEST-1 Region”.
UPDATE at 15:36 UTC: An external facility providing some connectivity to the AWS EU-WEST-1 Region has experienced power loss. AWS is currently working with the service provider to mitigate impact and restore power.
UPDATE at 16:23 UTC: AWS recovered power in the impacted facility and is continuing to investigate and resolve intermittent packet loss and latency between some Internet destinations and the EU-WEST-1 Region.
RESOLVED at 16:30 UTC: AWS confirmed the issue has been solved.
Playback currently doesn’t work on the latest Firefox when you browse the Spreaker website via HTTPS, due to stronger security policies. We’re working on it and plan to have it fixed very soon.
In the meantime, we suggest using a different browser (e.g. Google Chrome) or temporarily browsing Spreaker via HTTP.
Thanks for your patience.
UPDATE at 11:00 UTC: the issue has now been fixed. We’re monitoring the infrastructure to ensure everything runs smoothly. We’re really sorry for the inconvenience.
Since yesterday, YouTube sharing has not been working. We’re currently fixing it and re-uploading all failed videos to YouTube. This could take some time, due to the huge workload.
Thanks for your patience.
UPDATE at 15:20 UTC: all failed videos have been reprocessed and successfully uploaded to YouTube.