Lightweight Call Signaling and Peer-to-Peer Control of WebRTC Video Conferencing
We present the software architecture and implementation of our web-based multiparty video conference application. It does not use a media server. For call signaling, it either piggybacks on existing push notifications via a lightweight notification server, or uses email messages to remove that server dependency as well. For conference control and data storage, it creates a peer-to-peer network of the clients participating in the call. Our prototype client web app can be installed as a browser extension or as a progressive web app on desktop and mobile. It uses WebRTC data channels for the control path and media streams for the media path to implement full-featured video conferencing with audio, video, text, and screen sharing. The challenges faced and the techniques used in creating our lightweight or serverless system are useful to other low-end WebRTC applications that aim to save on server maintenance or paid subscriptions for multiparty video calls.
💡 Research Summary
The paper presents a complete software architecture and a working prototype for a multiparty WebRTC video-conferencing system that deliberately avoids any media server or heavyweight signaling infrastructure. The authors propose two alternative mechanisms for the initial call-signaling phase: a lightweight push-notification service built on Google Firebase Cloud Messaging (FCM) and a minimal PHP server (contacts.php), or a completely server-less mode that uses email (or instant messaging) to exchange the initial SDP offer/answer.
In the push-based mode, each client registers an authentication token (a UUID) and an FCM token with the PHP server, which stores a simple mapping of user identifier → token. When a caller wants to invite a callee, the caller's client generates a random invite ID, a conference ID, and its own node ID, creates a WebRTC SDP offer (non-trickle, containing all ICE candidates), and sends this payload as a data field in an FCM push message to the callee. The callee’s service worker receives the push, opens a new browser tab that loads a “wrapper” application, parses the offer, generates an SDP answer, and replies via another push. The wrapper then establishes a bidirectional data channel that is used for all subsequent signaling and control traffic.
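The invite payload described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `buildInvite`, `toPushData`, and `fromPushData` helpers and all field names (`inviteId`, `conferenceId`, `nodeId`, `from`, `to`, `sdp`) are hypothetical, and the SDP string stands in for the offer a browser would produce once ICE gathering has completed. The sketch runs under Node.js; in a browser, `crypto.randomUUID()` would play the same role.

```javascript
// Hypothetical sketch of the invite carried in the push "data" field.
const { randomUUID } = require("node:crypto");

// Build an invite for a new or existing conference (all names are assumptions).
function buildInvite(from, to, offerSdp, conferenceId = randomUUID()) {
  return {
    type: "invite",
    inviteId: randomUUID(),   // random per-invite identifier
    conferenceId,             // shared by everyone in the call
    nodeId: randomUUID(),     // the caller's node in the P2P network
    from,
    to,
    sdp: offerSdp,            // non-trickle offer, ICE candidates included
  };
}

// FCM data messages carry string key/value pairs, so serialize the invite.
function toPushData(invite) {
  return { invite: JSON.stringify(invite) };
}

// Reverse step, as the callee's service worker might do on receipt.
function fromPushData(data) {
  return JSON.parse(data.invite);
}
```

The callee side would parse the data field with `fromPushData`, hand the `sdp` value to its own peer connection, and return an answer in a payload of the same shape.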
If the user prefers a server‑less approach, the same SDP offer/answer exchange is performed manually by copying the generated JSON into an email and sending it to the peer; the peer replies with a similar message. This mode eliminates the need for the notification server entirely, at the cost of a more manual user experience.
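One way to make the copy-and-paste step more robust is to wrap the JSON in a delimited text block that survives greetings, signatures, and line wrapping added by mail clients. The sketch below is an assumption for illustration only; the summary says the JSON is copied into the email, and the marker strings, base64 wrapping, and function names here are hypothetical. It uses Node.js `Buffer`; a browser version would need an equivalent base64 routine.

```javascript
// Hypothetical markers delimiting the signaling block inside an email body.
const BEGIN = "-----BEGIN CALL SIGNAL-----";
const END = "-----END CALL SIGNAL-----";

// Turn an SDP description ({ type, sdp }) into a pasteable text block.
function encodeForEmail(description) {
  const json = JSON.stringify(description);
  const b64 = Buffer.from(json, "utf8").toString("base64");
  const wrapped = b64.match(/.{1,64}/g).join("\n"); // 64-character lines
  return `${BEGIN}\n${wrapped}\n${END}`;
}

// Recover the description from a full email body, or null if no block exists.
function decodeFromEmail(body) {
  const start = body.indexOf(BEGIN);
  const end = body.indexOf(END);
  if (start === -1 || end === -1 || end < start) return null;
  const b64 = body.slice(start + BEGIN.length, end).replace(/\s+/g, "");
  return JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
}
```

The same pair of functions would serve for both the offer and the answer, since each is a plain `{ type, sdp }` object.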
Once the data channel is up, the system creates a peer‑to‑peer (P2P) network among all participants. The authors introduce two JavaScript modules: PeerNetworkImpl, which implements an unstructured P2P overlay using WebRTC data channels for peer discovery, routing, and broadcast; and PeerStorageImpl, which builds a distributed key‑value store on top of that overlay. The storage layer holds conference state such as participant lists, media‑stream metadata, chat history, and screen‑share flags. Updates are merged using conflict‑free replicated data type (CRDT)‑like semantics, ensuring eventual consistency without a central authority. The design reuses the RTC Bricks component library, preserving its original API while swapping out the centralized storage for the new P2P implementation.
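To illustrate what CRDT-like merging of conference state can look like, here is a minimal last-writer-wins (LWW) map. LWW is a technique swapped in for illustration; the summary does not say which merge rule PeerStorageImpl uses, and the `put`, `merge`, and `wins` functions, the entry shape, and the key names are all hypothetical.

```javascript
// Each key maps to an entry { value, ts, node }:
//   ts   - logical or wall-clock timestamp of the write (assumption)
//   node - id of the writing peer, used only to break timestamp ties

// Returns true if entry a should replace entry b.
function wins(a, b) {
  if (!b) return true;
  if (a.ts !== b.ts) return a.ts > b.ts;
  return a.node > b.node; // deterministic tie-break across peers
}

// Local write: returns a new state object, leaving the input untouched.
function put(state, key, value, ts, node) {
  const entry = { value, ts, node };
  return wins(entry, state[key]) ? { ...state, [key]: entry } : state;
}

// Merge a remote replica into the local one, key by key.
function merge(local, remote) {
  const out = { ...local };
  for (const [key, entry] of Object.entries(remote)) {
    if (wins(entry, out[key])) out[key] = entry;
  }
  return out;
}
```

Because the winner for each key depends only on the two entries being compared, peers that exchange state in any order end up with the same values, which is the eventual-consistency property the summary describes.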
Media streams are exchanged directly between browsers. The authors discuss three possible topologies: (1) full mesh, where each peer sends its audio and video to every other peer; (2) centralized, where one peer acts as a relay; and (3) hybrid, where a subset of peers forwards streams to reduce bandwidth on constrained nodes. For small groups (up to five participants), the full-mesh approach yields acceptable latency (≈120 ms) and video quality (720p) while keeping the architecture simple. NAT traversal is handled with ICE and symmetric RTP; a TURN server can be added as an optional fallback.
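The reason full mesh suits only small groups follows from simple counting: with n participants there are n(n-1)/2 peer connections overall, and each peer uploads n-1 copies of its stream. A small sketch makes the growth concrete; the `meshCost` function and the per-stream bitrate passed to it are illustrative assumptions, not figures from the paper.

```javascript
// Cost of a full-mesh conference with `participants` peers, where each
// outgoing stream is assumed to use `kbpsPerStream` of uplink bandwidth.
function meshCost(participants, kbpsPerStream) {
  const n = participants;
  return {
    connections: (n * (n - 1)) / 2,             // peer connections in the call
    streamsSentPerPeer: n - 1,                  // one copy per remote peer
    uplinkKbpsPerPeer: (n - 1) * kbpsPerStream, // total upload per peer
  };
}
```

For example, `meshCost(5, 1500)` gives 10 connections, 4 outgoing streams per peer, and 6000 kbps of uplink per peer, which shows how quickly upload bandwidth becomes the limiting factor as the group grows.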
Security is addressed at several layers. The push-notification API requires a Bearer token, and every push payload includes explicit “From” and “To” fields that embed the sender’s and receiver’s identifiers, allowing the client to verify the authenticity of the message before processing. Data-channel traffic is protected by DTLS (data channels run SCTP over DTLS), media streams are protected by DTLS-SRTP, and the distributed storage signs its entries with JSON Web Tokens to guarantee integrity. Invite IDs and conference IDs are generated with sufficient entropy to prevent collision or replay attacks.
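The client-side check on the “From” and “To” fields might look like the sketch below. It is a hypothetical illustration: the `shouldAcceptPush` function, the lowercase field names, and the idea of comparing the sender against a local contact set are assumptions rather than details from the paper. It also checks addressing only and is not a cryptographic proof of origin.

```javascript
// Decide whether an incoming push payload should be processed at all.
//   payload       - parsed push data, expected to carry `from` and `to`
//   myId          - identifier of the local user
//   knownContacts - Set of identifiers the user accepts calls from (assumption)
function shouldAcceptPush(payload, myId, knownContacts) {
  if (!payload || typeof payload.from !== "string" || typeof payload.to !== "string") {
    return false; // malformed payload
  }
  if (payload.to !== myId) {
    return false; // addressed to someone else
  }
  return knownContacts.has(payload.from); // sender must be a known contact
}
```

A service worker could run this check before opening a new tab, so that malformed or misaddressed pushes never reach the wrapper application.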
The prototype, named Ezcall, is written in pure JavaScript without any front‑end framework. It can be installed as a Chrome extension, a Progressive Web App (PWA) on desktop or mobile, or run directly as a web page. The only persistent service required is the optional lightweight PHP notification server; otherwise the system consists solely of static assets served from a CDN or simple web host. In experiments with up to four concurrent participants, the system achieved stable video, audio, screen sharing, and text chat, while keeping operational costs to the price of a single static‑site hosting plan plus the minimal FCM usage.
In conclusion, the paper demonstrates that a fully functional multiparty video‑conferencing solution can be built with negligible server infrastructure by leveraging existing push‑notification services for signaling and constructing a P2P overlay for control and state synchronization. The authors identify future work such as scaling the P2P overlay to larger conferences, integrating more sophisticated CRDT algorithms for richer shared state, and enhancing end‑to‑end encryption for media streams. This approach is especially attractive for startups, educational institutions, or any scenario where minimizing cloud‑service expenses is a priority.