What a joy! It was a breath of fresh air when I came across the requirement of implementing live streaming.

It was quite a learning curve to understand how this whole thing works.

First things first, WebRTC = Web Real-Time Communication. Its browser APIs are standardized by the W3C (the underlying protocols by the IETF), and implementations are available across web browsers and mobile clients.

What’s so special about this? It’s direct client-to-client communication (well, in most cases).

So how does it work?

The first thing you need is a signalling server. You say, oh hey, we’re talking about servers already when I just mentioned client-to-client communication? Well well, that’s how WebRTC rolls! The signalling server is there to exchange media metadata before a call can take place. That metadata follows a format called SDP (Session Description Protocol). The signalling process itself is not defined by the WebRTC standard; its implementation is left up to the application.
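
Since the standard leaves signalling to you, here is a minimal sketch of what that relay might look like. Everything in it is an assumption for illustration: `SignalingHub`, the message kinds and the peer names are hypothetical, and a real server would move these messages over a network transport rather than an in-memory map.

```typescript
// Message kinds a signalling channel typically carries (names are illustrative).
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

type SignalHandler = (from: string, message: SignalMessage) => void;

// Hypothetical in-memory stand-in for a signalling server: it only forwards
// messages between registered peers and never touches the media itself.
class SignalingHub {
  private peers = new Map<string, SignalHandler>();

  register(peerId: string, onMessage: SignalHandler): void {
    this.peers.set(peerId, onMessage);
  }

  // Returns false when the target peer is not registered.
  send(from: string, to: string, message: SignalMessage): boolean {
    const handler = this.peers.get(to);
    if (handler === undefined) return false;
    handler(from, message);
    return true;
  }
}

// Usage: alice sends an offer, bob replies with an answer.
const hub = new SignalingHub();
const received: string[] = [];

hub.register("alice", (from, msg) => {
  received.push(`alice got ${msg.kind} from ${from}`);
});
hub.register("bob", (from, msg) => {
  received.push(`bob got ${msg.kind} from ${from}`);
  if (msg.kind === "offer") {
    hub.send("bob", from, { kind: "answer", sdp: "v=0 (answer placeholder)" });
  }
});

hub.send("alice", "bob", { kind: "offer", sdp: "v=0 (offer placeholder)" });
```

The hub never looks inside the SDP; it just passes envelopes around, which is all the signalling step needs.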

What’s next, you ask? If the negotiation of the media format is successful, we can hop on to the main business: the whole direct client-to-client thing.
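
To get a feel for what "negotiating the media format" means, here is a rough sketch that pulls codec names out of the `a=rtpmap` lines of two SDP blobs and keeps the ones both sides list. The SDP snippets are trimmed, made-up examples, and `codecsIn` and `commonCodecs` are hypothetical helpers; the real offer/answer exchange in the browser does considerably more than this.

```typescript
// Collect codec names from "a=rtpmap:<payload type> <codec>/<clock rate>" lines.
function codecsIn(sdp: string): string[] {
  const codecs: string[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:\d+ ([^/]+)\//.exec(rawLine.trim());
    if (match !== null) {
      const codec = (match[1] ?? "").toLowerCase();
      if (codecs.indexOf(codec) === -1) codecs.push(codec);
    }
  }
  return codecs;
}

// Codecs that appear in both descriptions, in the offerer's order.
function commonCodecs(offerSdp: string, answerSdp: string): string[] {
  const answerCodecs = codecsIn(answerSdp);
  return codecsIn(offerSdp).filter((codec) => answerCodecs.indexOf(codec) !== -1);
}

// Trimmed, illustrative SDP bodies (not complete session descriptions).
const offerSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");

const answerSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const agreedCodecs = commonCodecs(offerSdp, answerSdp);
```

If `agreedCodecs` came back empty, the two clients would have no format in common and the call could not proceed.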

For this to happen, both clients have to know each other’s public IP address and port.

When you google “what is my IP”, a server looks at where your request came from and tells you that address. A STUN server does much the same job for WebRTC. STUN’s full form is Session Traversal Utilities for NAT. Its job is to read the sender’s public IP and port from an incoming packet and send them right back.
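
Here is a toy model of that round trip, just to make the idea concrete. It is not the STUN wire protocol (a real server answers a Binding request with an XOR-MAPPED-ADDRESS attribute); `throughNat` and `stunBindingResponse` are hypothetical names, and the addresses come from the documentation-only IP ranges.

```typescript
interface Endpoint {
  ip: string;
  port: number;
}

interface Packet {
  source: Endpoint;
  destination: Endpoint;
}

// Toy NAT: rewrites the private source address to the public mapping.
function throughNat(packet: Packet, publicMapping: Endpoint): Packet {
  return { source: publicMapping, destination: packet.destination };
}

// Toy STUN server: replies with the source address it observed on the packet.
function stunBindingResponse(request: Packet): Endpoint {
  return { ip: request.source.ip, port: request.source.port };
}

// Usage: the client only knows its private address until STUN answers.
const privateSource: Endpoint = { ip: "192.168.1.20", port: 50000 };
const stunServerAddress: Endpoint = { ip: "198.51.100.1", port: 3478 };
const natMapping: Endpoint = { ip: "203.0.113.7", port: 62000 };

const sentPacket: Packet = { source: privateSource, destination: stunServerAddress };
const reflexiveAddress = stunBindingResponse(throughNat(sentPacket, natMapping));
```

`reflexiveAddress` is what the client would then share with the other peer over the signalling channel.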

Once both clients know each other’s public address, WebRTC’s magical APIs can do their thing and voilà, you’ve got live streaming!

But no wait. This will work only about 85% of the time. (Ref: https://www.twilio.com/stun-turn)

The other 15% of the time are mostly the cases where one of the clients is behind something called symmetric NAT. A symmetric NAT creates a different external IP and port mapping for each destination the client talks to, so the address the STUN server reported is not the one the other client would need to reach. This causes WebRTC direct connection attempts to fail.
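
A small simulation shows why that breaks things. `ToyNat` is a hypothetical, heavily simplified model (real NATs also filter inbound traffic and expire mappings), and the addresses are from the documentation-only ranges.

```typescript
interface Addr {
  ip: string;
  port: number;
}

function addrKey(a: Addr): string {
  return `${a.ip}:${a.port}`;
}

// Hypothetical NAT model that only tracks outbound address mappings.
class ToyNat {
  private mappings = new Map<string, number>();
  private nextPort = 40000;
  private publicIp: string;
  private symmetric: boolean;

  constructor(publicIp: string, symmetric: boolean) {
    this.publicIp = publicIp;
    this.symmetric = symmetric;
  }

  // Public address used for traffic from `internal` to `destination`.
  translate(internal: Addr, destination: Addr): Addr {
    // A symmetric NAT keys the mapping on the destination too,
    // so every new destination gets a fresh external port.
    const mappingKey = this.symmetric
      ? `${addrKey(internal)}->${addrKey(destination)}`
      : addrKey(internal);
    let port = this.mappings.get(mappingKey);
    if (port === undefined) {
      port = this.nextPort++;
      this.mappings.set(mappingKey, port);
    }
    return { ip: this.publicIp, port };
  }
}

const insideClient: Addr = { ip: "192.168.1.20", port: 50000 };
const stunAddr: Addr = { ip: "198.51.100.1", port: 3478 };
const remotePeer: Addr = { ip: "198.51.100.99", port: 51000 };

// Non-symmetric NAT: the STUN server and the peer see the same mapping.
const plainNat = new ToyNat("203.0.113.7", false);
const plainSeenByStun = plainNat.translate(insideClient, stunAddr);
const plainSeenByPeer = plainNat.translate(insideClient, remotePeer);

// Symmetric NAT: the peer sees a port the STUN server never reported.
const symmetricNat = new ToyNat("203.0.113.7", true);
const symSeenByStun = symmetricNat.translate(insideClient, stunAddr);
const symSeenByPeer = symmetricNat.translate(insideClient, remotePeer);
```

With the symmetric NAT, the address learned through STUN is useless to the remote peer, which is exactly the situation a relay is meant to cover.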

The workaround is to use a TURN server. TURN is short for Traversal Using Relays around NAT. When TURN is used, all of the data/media is relayed through a server to the other client and vice versa.

Options for TURN:

  1. Use a hosted service such as Twilio’s, which offers TURN servers.
  2. Deploy your own. I’ve tried many of them and had the most seamless experience deploying CoTURN. Ref: https://www.genymotion.com/help/cloud-paas/tutorial/configure-turn-server/

Note: TURN is an extension of STUN, so TURN servers generally serve as STUN servers as well.

How are mechanisms like STUN and TURN handled? Through something called the ICE (Interactive Connectivity Establishment) framework. The ICE framework abstracts away the complexities of networking. You must pass ICE server URLs to the WebRTC APIs for the ICE framework to work. These server URLs are nothing but the addresses of your STUN and TURN servers.
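
As a sketch, the server list might be built like this. The `IceServer` interface below simply mirrors the shape of the browser’s `RTCIceServer` dictionary so the snippet also works outside a browser; `buildIceServers`, the host name and the credentials are placeholders I made up.

```typescript
// Mirrors the shape of the browser's RTCIceServer dictionary.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Hypothetical helper: one host serving STUN plus TURN over UDP and TCP.
function buildIceServers(host: string, username: string, credential: string): IceServer[] {
  return [
    { urls: `stun:${host}:3478` },
    {
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`,
      ],
      username,
      credential,
    },
  ];
}

// Placeholder host and credentials; substitute your own TURN deployment.
const iceServers = buildIceServers("turn.example.org", "demo-user", "demo-secret");

// In a browser this list would be handed to the peer connection:
//   const pc = new RTCPeerConnection({ iceServers });
```

STUN entries need no credentials, while TURN entries do, since relaying traffic costs the server bandwidth.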

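A production environment should offer STUN first, then UDP TURN, then TCP TURN as the fallback chain. Strictly speaking, it is not the order of the server list that decides this: ICE gathers candidates from every server and ranks them by type, so direct and STUN-derived candidates are tried ahead of relayed ones. If the higher-priority candidates fail, the next ones are tried, and so on.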
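
For reference, the ICE specification (RFC 8445, section 5.1.2) gives each candidate a numeric priority based mainly on its type. Here is a sketch of that formula using the type preferences the RFC recommends; `candidatePriority` is my own helper name, and real browsers may pick different local preference values.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445 (higher means preferred).
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126, // address of a local interface
  prflx: 110, // peer reflexive, discovered during connectivity checks
  srflx: 100, // server reflexive, learned from a STUN server
  relay: 0, // relayed through a TURN server
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(
  type: CandidateType,
  localPreference: number = 65535,
  componentId: number = 1,
): number {
  return 16777216 * TYPE_PREFERENCE[type] + 256 * localPreference + (256 - componentId);
}

// Usage: rank the candidate types from most to least preferred.
const rankedTypes: CandidateType[] = (["relay", "srflx", "host"] as CandidateType[])
  .sort((a, b) => candidatePriority(b) - candidatePriority(a));
```

Relayed candidates land at the bottom, which is why TURN only kicks in when the direct routes fail.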