We'd like to keep Spreaker up 100% of the time. When that doesn't happen, we write about it here.

Website So far so good
Api So far so good
Streaming So far so good
Mobile apps So far so good

08 July 2014

10:43 CEST

Highly Available and Fault-Tolerant Recording Infrastructure

We know how important reliability is to you, so over the past weeks we have worked to provide you with a highly available and fault-tolerant recording infrastructure.

We’re progressively rolling out this new infrastructure to all users. Currently, all PRO users who broadcast with 3rd party applications are routed to it; in the coming weeks we’ll open it up to all users and apps.

In this post, we’d love to share some technical details with you, to show how it works and how we handle interruptions.

How it works

The image below shows the big picture.

image

When an application starts live broadcasting, it connects to icecast.spreaker.com. This DNS entry is resolved to the load-balancer closest to you (latency-based routing), and then the connection is routed to an available server inside that datacenter.

This design guarantees that:

  1. You connect to the nearest datacenter.
  2. Each datacenter has multiple servers, and your connection is routed to an available server. A server is considered available when it passes all health checks and it still has some capacity.
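
As a rough illustration of the availability rule above (not our actual balancer code), the sketch below treats a server as available only when it is healthy and below its capacity. The `Server` fields and the "least loaded first" choice are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    healthy: bool         # assumed flag: True when all health checks pass
    connections: int      # current number of live connections
    max_connections: int  # assumed capacity limit

    def is_available(self) -> bool:
        # Available = passes health checks AND still has some capacity.
        return self.healthy and self.connections < self.max_connections


def pick_server(servers: List[Server]) -> Optional[Server]:
    """Return the least-loaded available server, or None if none qualifies."""
    candidates = [s for s in servers if s.is_available()]
    if not candidates:
        return None
    # Assumption: prefer the server with the lowest load ratio.
    return min(candidates, key=lambda s: s.connections / s.max_connections)
```

A full or unhealthy server is never returned, so new connections only land on servers that can actually take them.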

Spreaker Recording’s infrastructure is currently deployed in 3 datacenters: Europe (Ireland), US East (Virginia), US West (Oregon).

What if the connection between the client and the balancer drops?

If the connection between the client and the load balancer drops, the client will automatically try to reconnect to icecast.spreaker.com. Once the connection is re-established, the balancer will route it to the exact same server the client was connected to before, so that it can continue broadcasting.

image
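
One way to picture the "same server on reconnect" behaviour is a small session table kept by the balancer. This is only a sketch under assumptions: the `StickyRouter` class, the `broadcast_id` key and the callbacks are hypothetical names, and this post does not describe how the real balancer remembers clients.

```python
from typing import Callable, Dict, Optional


class StickyRouter:
    """Remembers which server each broadcast was routed to (illustrative only)."""

    def __init__(self, pick_new: Callable[[], Optional[str]]):
        self._pick_new = pick_new            # chooses a server for new broadcasts
        self._sessions: Dict[str, str] = {}  # broadcast_id -> server name

    def route(self, broadcast_id: str, is_up: Callable[[str], bool]) -> Optional[str]:
        server = self._sessions.get(broadcast_id)
        if server is not None and is_up(server):
            # Reconnection: send the client back to the server it used before.
            return server
        # New broadcast, or the previous server is down: choose another one.
        server = self._pick_new()
        if server is not None:
            self._sessions[broadcast_id] = server
        return server
```

A reconnecting client gets its previous server back as long as that server is still up; otherwise it is assigned a new one.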

What if a balancer is down?

The DNS entry icecast.spreaker.com is managed by AWS Route 53. Route 53 constantly checks the health status of each balancer and, if a balancer is down, temporarily removes it from the pool of available ones.

image

So, when a balancer goes down:

  1. The connection between the client and the balancer drops.
  2. Route 53 removes the balancer from the pool.
  3. The client attempts to reconnect to icecast.spreaker.com. Since the DNS TTL is low, the client will resolve the DNS entry again and will be routed to another balancer (because the affected one has been removed from the pool). The client automatically attempts reconnection multiple times, which smoothly handles the case where the DNS entry has not yet been updated at the time of the first reconnection.
  4. The new balancer will route the connection to the same exact server where the client was connected before.

image
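
The reconnection steps above can be sketched from the client side as a retry loop that resolves the hostname again on every attempt. This is a minimal sketch, not the code of any Spreaker app: the port, the number of attempts and the delay are assumptions, and operating-system DNS caches may also affect when a new address is seen.

```python
import socket
import time
from typing import Callable, Optional

HOST = "icecast.spreaker.com"
PORT = 80  # assumption: the real port is not stated in this post


def resolve(host: str, port: int) -> str:
    """Resolve the host again and return the first IP address found."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return infos[0][4][0]


def reconnect(
    host: str = HOST,
    port: int = PORT,
    attempts: int = 5,    # assumed retry count
    delay: float = 1.0,   # assumed pause between attempts, in seconds
    resolver: Callable[[str, int], str] = resolve,
    connector: Callable[..., socket.socket] = socket.create_connection,
) -> Optional[socket.socket]:
    """Try to reconnect several times, re-resolving DNS before each attempt."""
    for attempt in range(attempts):
        try:
            # Resolving on every attempt lets a later try pick up the updated
            # DNS answer even if the first try still got the failed balancer.
            ip = resolver(host, port)
            return connector((ip, port), timeout=5)
        except OSError:
            if attempt < attempts - 1:
                time.sleep(delay)
    return None
```

If the first attempt still resolves to the failed balancer, a later attempt can succeed once the DNS answer has changed.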

What if a server is down?

The infrastructure constantly monitors the health of each server. When a server is down, it’s temporarily removed from the pool of available servers. The balancer will route new requests (or reconnection requests) to other available servers in the same datacenter.

image

The worst case scenario is when all servers in a datacenter are down. In this case, the balancer will route new requests (or reconnection requests) to available servers in other datacenters.

image
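
The fallback described above can be sketched as "try the local datacenter first, then the others". The datacenter labels, the server names and the fallback order below are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def route(
    local_dc: str,
    available: Dict[str, List[str]],  # datacenter -> currently available servers
    fallback_order: List[str],        # assumed preference order for other datacenters
) -> Optional[str]:
    """Pick a server in the local datacenter, else in the first one that has any."""
    order = [local_dc] + [dc for dc in fallback_order if dc != local_dc]
    for dc in order:
        servers = available.get(dc, [])
        if servers:
            return servers[0]
    return None  # every datacenter is down


# Hypothetical pool: all servers in Ireland are down.
pool = {"ireland": [], "virginia": ["va-1"], "oregon": ["or-1"]}
chosen = route("ireland", pool, ["ireland", "virginia", "oregon"])
```

With this pool, `chosen` is a server in Virginia, because the local datacenter has no available servers.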

30 June 2014

13:34 CEST

Networking issues - Broadcasting in Europe affected

We’re experiencing some networking issues between two datacenters. Some of you may be temporarily unable to broadcast, or, once your live broadcast ends, the recorded track may take longer than usual to become ready. We’re working to fix it as soon as possible.

07 June 2014

04:00 CEST

Networking issues

Issue opened at 1:40 UTC

We’re experiencing networking issues. The issue looks to be internal to our provider (Amazon Web Services), as also confirmed by many other customers, and we’ve already alerted them. We’re waiting for a fix.

We apologize for any inconvenience.

Update at 2.10 UTC

The issue has been confirmed by AWS and their engineers are working on it.

image

As a side note, the chart below shows how the networking issues affected Spreaker users. It looks like **about 30%** of users are affected.

image

Update at 2.15 UTC

Many users are reporting that the networking issues have now been fixed, but we still haven’t received any official bulletin from AWS.

Resolved at 2.28 UTC

AWS confirms the issue has been fixed.

19 May 2014

22:42 CEST

Networking issues from/to US

We’re currently experiencing some networking issues to and from the US. Spreaker web servers are currently hosted in AWS (Amazon Web Services) data centers in Europe, and there are some networking failures between AWS Europe and some US nodes.

We’re really sorry for the inconvenience.


UPDATE at 21.12 UTC

The networking issues are caused by a cut transatlantic link. Telia advised that they’re working to resolve a major network issue.

Unlike other cloud providers, AWS hasn’t yet disclosed any action to route network traffic through other network providers, so some of you may still experience networking issues.


UPDATE at 21.53 UTC

The following map from Akamai shows the affected area.

image


UPDATE at 22.06 UTC

According to DigitalOcean, Telia has repaired their issue.

08 May 2014

13:50 CEST

Issue: high failure rate

We’re experiencing a high failure rate on our API. We’re investigating it. More updates will be published here.

UPDATE: the issue was caused by a high load on our RabbitMQ servers (we currently have 2 masters). This caused a chain reaction that led to a temporary service failure in our API. Since most of our applications (both web and mobile) are based upon our API, most of you were unable to use Spreaker. We’re really sorry for that and we’ve already planned some improvements for the next week, in order to avoid such issues in the future.

23 April 2014

09:30 CEST

Spreaker API: removed XML/JSONP support

Dear developers,

As you may know, 2 years ago we deprecated XML and JSONP support, keeping JSON as the only officially supported format.

In the last 30 days we received no requests for the XML or JSONP response formats, so today we completely removed support for them. You should notice no difference, since all of you are now using JSON.

Thanks for helping to make it happen!

02:34 CEST

Notes about Spreaker down

The Spreaker website and API were unreachable for about 40 minutes, from 22:19 UTC to 22:58 UTC, due to networking issues in the Amazon EU-west-1 datacenter, where we currently run most of our servers.

As soon as the issue occurred, we started migrating all affected servers to new ones in order to quickly recover the service.

At the time of writing, Amazon is still investigating the issue (see screenshot below), but all remaining Spreaker servers currently look unaffected.

image

We’re really sorry for the trouble.

01:05 CEST

UPDATE: Spreaker is back

The Spreaker website and API are now online. We’re still checking all services; then we’ll start investigating the issue and will post more details.

00:59 CEST

UPDATE: Spreaker still down

The failure affected more servers than initially detected. We’re working to bring everything back asap.

00:23 CEST

Spreaker is currently unavailable

We’ve detected an issue on one of our primary databases and Spreaker is currently unavailable. We’re working to bring it back asap.

Looking for help?

If you need any assistance, please contact us via our customer support service or drop us an email.