We'd like to keep Spreaker up 100% of the time. When that doesn't happen, we write about it here.
| Service | Status |
|---|---|
| Website | So far so good |
| Api | So far so good |
| Streaming | So far so good |
| Mobile apps | So far so good |
From 18:40 to 19:05 UTC, one of our web servers failed: during this timeframe, all requests to that server returned a 503 error. The issue is now fixed.
From 01:33 to 09:04 UTC, some of our European servers hosted at Topix, Italy, failed. European users may have experienced a dropped connection, but they were redirected to other locations within a few minutes, thanks to our highly available, fault-tolerant infrastructure.
Our service provider AWS is experiencing serious networking issues with its CDN service (CloudFront), so you may run into problems while using Spreaker. This outage is affecting Spreaker and thousands of other web applications around the globe. We’re really sorry for the inconvenience and we hope AWS will fix it soon.
UPDATE at 01:00 UTC: AWS is currently investigating increased error rates for DNS queries for CloudFront distributions.
UPDATE at 01:38 UTC: the service is gradually being restored. You should now be able to access most Spreaker services (including the website), though some random failures may still occur.
We’re experiencing intermittent networking issues on Spreaker. We’ve alerted Amazon Web Services and hope they will fix the issue soon.
UPDATE at 15:26 UTC: AWS is investigating Internet provider connectivity issues in the EU-WEST-1 Region.
FIXED at 15:58 UTC: according to AWS, “we experienced impaired Internet connectivity affecting some instances in the EU-WEST-1 Region. The issue has been resolved and the service is operating normally.”
Due to a severe issue with our email system, we sent out blank emails over the last few hours. We’ve just pushed a fix to production and added more checks to prevent such issues in the future.
We’re really sorry for the inconvenience.
Spreaker has disabled #SSLv3 protocol support in response to the vulnerability published today. If you have any issues browsing Spreaker via HTTPS, please update your browser.
We just resolved an issue with our Tube service that affected many US customers. We’re really sorry for the inconvenience and we’re working hard to prevent the same issue in the future.
A few months ago we switched our traffic to a new highly available, fault-tolerant recording infrastructure. One of the core components of this infrastructure is the so-called “balancer”. The balancer is the entry point for each recording connection and routes the traffic to the nearest available recording server.
Unfortunately, last night one of these balancers stopped working (after 91 days of uptime) in a particularly bad way: its health check still reported OK, but the balancer was unable to route requests.
We’re currently investigating the root cause of the issue and working to detect this condition during health checks, so that if it happens again we’ll be able to recover automatically.
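The failure above is a health check that says "OK" while routing is broken. One common remedy is a health check that exercises the real routing path instead of only confirming the process is alive. The sketch below illustrates that idea under stated assumptions. `route_probe` is a hypothetical hook, not part of Spreaker's balancer. The latency limit is an illustrative value.

```python
import time
from typing import Callable, Optional


def deep_health_check(route_probe: Callable[[], Optional[str]], max_latency_s: float = 1.0) -> bool:
    """Report healthy only if the balancer can actually resolve a route.

    `route_probe` is a hypothetical hook that runs a real routing lookup (the same
    code path a recording connection would use) and returns the chosen server address.
    """
    start = time.monotonic()
    try:
        server = route_probe()
    except Exception:
        # The routing path itself is broken, even though the process is still up.
        return False
    # Latency is measured after the call returns, so a probe that hangs forever
    # would block here; a hard timeout belongs in the client library itself.
    elapsed = time.monotonic() - start
    return bool(server) and elapsed <= max_latency_s


def failing_probe() -> Optional[str]:
    # Simulates a stuck or broken client library behind the balancer.
    raise ConnectionError("store client not responding")


print(deep_health_check(lambda: "10.0.0.12:8000"))  # True
print(deep_health_check(failing_probe))             # False
```

A shallow check (for example, "does the process answer on its port?") would have passed in both cases. The deep check fails as soon as a routing lookup can't complete, which is the condition a DNS-level health checker needs to see.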
Sorry again for the inconvenience. If you have any further questions, don’t hesitate to contact us at http://help.spreaker.com
UPDATE (Oct 3rd)
We investigated the issue and it looks like it was a bug in a client library we use to connect to Redis. We just pushed a new balancer version to production that includes two changes:
Issue at 17:30 UTC: We’re currently experiencing some networking issues while publishing your on-demand tracks at the end of your live broadcast. Don’t worry! Your tracks are not lost: they will just take longer to get published.
We’re working to fix it as soon as possible. New recording connections will not be affected by the issue (we have disabled the affected servers).
Resolved at 21:00 UTC: The network issue is now fixed and all on-demand tracks have been published. We’re really sorry for the inconvenience.
We know how important reliability is to you, so in these past weeks we have worked to provide you with a highly available, fault-tolerant recording infrastructure.
We’re progressively rolling out this new infrastructure to all users. Currently, all PRO users who broadcast with 3rd-party applications are routed to it; in the coming weeks we’ll open it up to all users and apps.
In this post, we’d love to share some technical details with you, to show how it works and how we handle interruptions.
How it works
The image below shows the big picture.
When an application starts a live broadcast, it connects to icecast.spreaker.com. This DNS entry resolves to the load balancer closest to you (latency-based routing), and the connection is then routed to an available server inside that datacenter.
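The first step, resolving icecast.spreaker.com to the closest balancer, can be illustrated with a small simulation of latency-based DNS routing. This is a sketch only. Route 53 relies on its own latency measurements between client networks and AWS regions. The fixed numbers below are made up and merely stand in for those measurements. The datacenter names come from the list of deployments in this post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Balancer:
    datacenter: str
    latency_ms: float      # illustrative client-to-datacenter latency, not real data
    healthy: bool = True


def resolve(balancers: list[Balancer]) -> Balancer:
    """Mimic latency-based DNS routing: answer with the lowest-latency healthy balancer."""
    healthy = [b for b in balancers if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy balancer")
    return min(healthy, key=lambda b: b.latency_ms)


# Latencies as seen by a hypothetical client in Italy (made-up numbers).
balancers = [
    Balancer("Europe (Ireland)", 45),
    Balancer("US East (Virginia)", 110),
    Balancer("US West (Oregon)", 180),
]
print(resolve(balancers).datacenter)  # Europe (Ireland)
```

Filtering on `healthy` before taking the minimum is what lets a failed balancer drop out of DNS answers without any change on the client side.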
This design guarantees that:
Spreaker Recording’s infrastructure is currently deployed in 3 datacenters: Europe (Ireland), US East (Virginia), US West (Oregon).
What if the connection between the client and the balancer drops?
If the connection between the client and the load balancer drops, the client will automatically try to reconnect to icecast.spreaker.com. Once the connection is re-established, the balancer routes it to the exact same server the client was connected to before, so that the broadcast can continue.
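Routing a reconnecting client back to "the exact same server" requires the balancer to remember past assignments. The sketch below shows one way to do this under stated assumptions. The session key (`stream_id`), the round-robin choice for new connections, and the in-memory dict are all hypothetical. Elsewhere the post mentions that the balancer talks to Redis, and a shared store of that kind is one plausible home for such a mapping. That placement is still an assumption.

```python
from __future__ import annotations


class StickyBalancer:
    """Remember which server each broadcast was routed to, so reconnects land on it again.

    Hypothetical sketch: a multi-process balancer would keep `assignments` in a shared
    store rather than in a local dict.
    """

    def __init__(self, servers: list[str]) -> None:
        self._order = list(servers)
        self.healthy: set[str] = set(servers)
        self.assignments: dict[str, str] = {}
        self._next = 0

    def route(self, stream_id: str) -> str:
        previous = self.assignments.get(stream_id)
        if previous is not None and previous in self.healthy:
            # Reconnection: send the client back to the exact same server.
            return previous
        # New connection (or its previous server is gone): round-robin over healthy servers.
        candidates = [s for s in self._order if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy server")
        server = candidates[self._next % len(candidates)]
        self._next += 1
        self.assignments[stream_id] = server
        return server


lb = StickyBalancer(["rec-1", "rec-2"])
first = lb.route("show-42")   # "rec-1"
other = lb.route("show-7")    # "rec-2"
again = lb.route("show-42")   # "rec-1" again, e.g. after a dropped connection
```

If the remembered server becomes unhealthy, the same call falls through to a fresh assignment. This mirrors the "what if a server is down?" case described later in the post.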
What if a balancer is down?
The DNS record for icecast.spreaker.com is managed by AWS Route 53, which constantly checks the health status of each balancer and, if a balancer is down, temporarily removes it from the pool of available ones.
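DNS health checking usually avoids flapping by changing an endpoint's status only after several consecutive results that contradict it. Route 53 exposes this idea as a failure threshold. The class below is a simplified model of that behaviour, not Route 53's implementation. The balancer names and the threshold of 3 are illustrative assumptions.

```python
from __future__ import annotations


class HealthCheckedPool:
    """Flip an endpoint's status only after `threshold` consecutive contradicting checks."""

    def __init__(self, endpoints: list[str], threshold: int = 3) -> None:
        self.threshold = threshold
        self.healthy = {e: True for e in endpoints}
        # Count of consecutive results that disagree with the current status.
        self.streak = {e: 0 for e in endpoints}

    def record(self, endpoint: str, ok: bool) -> None:
        if ok == self.healthy[endpoint]:
            # Result agrees with the current status: reset the contradicting streak.
            self.streak[endpoint] = 0
            return
        self.streak[endpoint] += 1
        if self.streak[endpoint] >= self.threshold:
            # Enough consecutive contradicting results: change the status.
            self.healthy[endpoint] = ok
            self.streak[endpoint] = 0

    def available(self) -> list[str]:
        return [e for e, up in self.healthy.items() if up]


pool = HealthCheckedPool(["balancer-eu", "balancer-us-east", "balancer-us-west"])
for _ in range(3):
    pool.record("balancer-eu", ok=False)
print(pool.available())  # ['balancer-us-east', 'balancer-us-west']
```

The same rule applies on the way back: the removed balancer returns to the pool only after three consecutive successful checks.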
So, when a balancer goes down:
What if a server is down?
The infrastructure constantly monitors the health of each server. When a server is down, it’s temporarily removed from the pool of available servers. The balancer will route new requests (or reconnection requests) to other available servers in the same datacenter.
The worst case scenario is when all servers in a datacenter are down. In this case, the balancer will route new requests (or reconnection requests) to available servers in other datacenters.
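The two paragraphs above describe a preference order: a healthy server in the local datacenter first, then servers in other datacenters. The function below sketches that rule. The data layout and the `fallback_order` list (for example, nearest datacenter first) are hypothetical. Picking the first healthy server by name keeps the example deterministic and is not meant to reflect how Spreaker balances load.

```python
from __future__ import annotations


def choose_server(
    servers_by_dc: dict[str, dict[str, bool]],
    local_dc: str,
    fallback_order: list[str],
) -> tuple[str, str]:
    """Prefer a healthy server in the local datacenter; otherwise try the others in order."""
    for dc in [local_dc, *fallback_order]:
        healthy = sorted(s for s, up in servers_by_dc.get(dc, {}).items() if up)
        if healthy:
            return dc, healthy[0]
    raise RuntimeError("no healthy recording server in any datacenter")


# Worst case: every server in Europe (Ireland) is down.
servers = {
    "eu-ireland": {"eu-1": False, "eu-2": False},
    "us-east-virginia": {"use-1": True},
    "us-west-oregon": {"usw-1": True},
}
print(choose_server(servers, "eu-ireland", ["us-east-virginia", "us-west-oregon"]))
# ('us-east-virginia', 'use-1')
```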
We’re experiencing some networking issues between two datacenters. Some of you may be temporarily unable to broadcast, or, once your live broadcast ends, the recorded track may take longer than usual to be ready. We’re working to fix it as soon as possible.