| Lifecycle Stage | Maturity | Status | Latest Revision |
|---|---|---|---|
| 2A | Candidate Recommendation | Active | r0, 2023-04-12 |
Authors: [@mxinden]
libp2p transport protocol enabling two private nodes (e.g. two browsers) to establish a direct connection.
Browser node A wants to connect to browser node B with the help of server node R.
Neither A nor B can listen for incoming connections: each runs in a constrained environment (i.e. a browser) whose only transport capability is the W3C WebRTC `RTCPeerConnection` API, and each is behind a NAT and/or firewall.
Note that A and/or B may also be non-browser nodes behind NATs and/or firewalls.
However, for two non-browser nodes, hole punching over TCP or QUIC with DCUtR is the more efficient way to establish a direct connection.
On a historical note, this specification replaces the existing libp2p WebRTC star and libp2p WebRTC direct protocols.
1. B advertises support for the WebRTC browser-to-browser protocol by appending `/webrtc` to its relayed multiaddr, meaning it takes the form of `<relayed-multiaddr>/webrtc/p2p/<b-peer-id>`.

2. Upon discovery of B's multiaddress, A learns that B supports the WebRTC transport and knows how to establish a relayed connection to B to run the `/webrtc-signaling/0.0.1` protocol on top.

3. A establishes a relayed connection to B. Note that further steps depend on the relayed connection being authenticated, i.e. that data sent on the relayed connection can be trusted.

4. A (outbound side of relayed connection) creates an `RTCPeerConnection` provided by a W3C compliant WebRTC implementation (e.g. a browser). A creates a datachannel via `RTCPeerConnection.createDataChannel` with the label `init`. This channel is required to ensure that ICE information is shared in the SDP offer. See STUN section on what STUN servers to configure at creation time. A creates an SDP offer via `RTCPeerConnection.createOffer()`. A initiates the signaling protocol to B via the relayed connection from (3), see Signaling Protocol, and sends the offer to B. Note that A being the initiator of the stream is merely a convention that prevents both nodes from simultaneously initiating a new connection, which could result in two WebRTC connections. A MUST also be able to handle an incoming signaling protocol stream to support the case where B initiates the signaling process.

5. On reception of the incoming stream, B (inbound side of relayed connection) creates an `RTCPeerConnection`. Again see STUN section on what STUN servers to configure at creation time. B receives A's offer sent in (4) via the signaling protocol stream and provides the offer to its `RTCPeerConnection` via `RTCPeerConnection.setRemoteDescription`. B then creates an answer via `RTCPeerConnection.createAnswer` and sends it to A via the existing signaling protocol stream (see Signaling Protocol).

6. A receives B's answer via the signaling protocol stream and sets it locally via `RTCPeerConnection.setRemoteDescription`.

7. A and B send their local ICE candidates via the existing signaling protocol stream to enable trickle ICE. Both nodes continuously read from the stream, adding incoming remote candidates via `RTCPeerConnection.addIceCandidate()`.

8. On successful establishment of the direct connection, A closes the `init` data channel created in step 4, and B and A close the signaling protocol stream. On failure, B and A reset the signaling protocol stream.

   Behavior for transferring data on a relayed connection, in the case where the direct connection failed, is out of scope for this specification and dependent on the application.

9. Messages on `RTCDataChannel`s on the established `RTCPeerConnection` are framed using the message framing mechanism described in multiplexing.
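The address form from the first step can be sketched with plain string handling. This is a minimal illustration, not a multiaddr implementation: the function names are hypothetical, and the check for a trailing `/p2p-circuit` component is an assumption about what a relayed multiaddr looks like rather than something this document states.

```typescript
// Builds `<relayed-multiaddr>/webrtc/p2p/<b-peer-id>` from its two parts.
// Assumption: a relayed multiaddr ends in the `/p2p-circuit` component.
function toWebRtcMultiaddr(relayedAddr: string, peerId: string): string {
  const base = relayedAddr.replace(/\/+$/, ""); // drop trailing slashes
  if (!/\/p2p-circuit$/.test(base)) {
    throw new Error("expected a relayed multiaddr ending in /p2p-circuit");
  }
  return `${base}/webrtc/p2p/${peerId}`;
}

// Splits an advertised address back into the relayed address to dial and
// the peer id of B. Returns null if the address does not have the form above.
function parseWebRtcMultiaddr(
  addr: string,
): { relayedAddr: string; peerId: string } | null {
  const match = /^(.*\/p2p-circuit)\/webrtc\/p2p\/([^/]+)$/.exec(addr);
  return match === null ? null : { relayedAddr: match[1], peerId: match[2] };
}
```

A real implementation would parse multiaddr components with a multiaddr library instead of regular expressions; the string form is used here only to make the layout of the address visible.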
```mermaid
sequenceDiagram
    participant a as Browser A
    participant cr as CircuitRelayV2Peer
    participant b as Browser B
    participant stun as STUN Server
    b->>cr: Establish Relayed Connection (WebTransport, WebRTC)
    b-->>a: Shares its own relayed webrtc multiaddress (out of band)
    a->>b: Establishes a relayed connection to Browser B
    a-->>a: Creates RTCPeerConnection with STUN server config, init DataChannel and SDP offer
    a->>b: Initiates libp2p /webrtc-signaling/0.0.1 protocol stream over relayed connection and sends SDP offer
    b-->>b: Creates RTCPeerConnection with STUN server config, sets Browser A's SDP offer, and creates SDP answer
    b->>a: Sends SDP answer over signaling stream
    a-->>a: Sets SDP answer with RTCPeerConnection.setRemoteDescription
    a->>+stun: What's my public IP and port
    stun->>-a: Browser A observed IP and port: 8.8.8.1:63333
    b->>+stun: What's my public IP and port
    stun->>-b: Browser B observed IP and port: 6.6.6.1:52222
    b->a: Exchange ICE candidates over signaling stream and pass them to RTCPeerConnection.addIceCandidate()
    b->a: Establish direct connection
```
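The offer and answer exchange on A's side (steps 4 and 6) might look roughly like the sketch below. `SignalingStream` is a hypothetical abstraction over the relayed signaling stream, and `PeerConnectionLike` is a hand-written subset of the W3C `RTCPeerConnection` methods involved, declared locally so the sketch does not depend on browser typings. The `setLocalDescription` call is not spelled out in the steps; it is included on the assumption that the W3C API needs the local description applied before ICE gathering starts.

```typescript
// Message types, mirroring the `Type` enum of the signaling protobuf.
const MessageType = { SDP_OFFER: 0, SDP_ANSWER: 1, ICE_CANDIDATE: 2 } as const;

interface SignalingMessage { type: number; data: string }

// Hypothetical stream over the relayed connection; protobuf encoding and
// length-prefix framing are assumed to happen behind this interface.
interface SignalingStream {
  send(msg: SignalingMessage): Promise<void>;
  receive(): Promise<SignalingMessage>;
}

interface Description { type: "offer" | "answer"; sdp: string }

// Subset of RTCPeerConnection used by the initiator (A).
interface PeerConnectionLike {
  createDataChannel(label: string): { close(): void };
  createOffer(): Promise<{ sdp?: string }>;
  setLocalDescription(desc: Description): Promise<void>;
  setRemoteDescription(desc: Description): Promise<void>;
}

// Runs A's side of the SDP exchange and returns the `init` channel so the
// caller can close it once the direct connection is established (step 8).
async function initiateSignaling(
  pc: PeerConnectionLike,
  stream: SignalingStream,
): Promise<{ close(): void }> {
  // The `init` channel must exist before the offer is created so that ICE
  // information is included in the SDP offer.
  const initChannel = pc.createDataChannel("init");

  const offer = await pc.createOffer();
  if (offer.sdp === undefined) throw new Error("offer contains no SDP");
  await pc.setLocalDescription({ type: "offer", sdp: offer.sdp });
  await stream.send({ type: MessageType.SDP_OFFER, data: offer.sdp });

  // B replies with its answer on the same stream.
  const reply = await stream.receive();
  if (reply.type !== MessageType.SDP_ANSWER) {
    throw new Error(`expected SDP_ANSWER, got message type ${reply.type}`);
  }
  await pc.setRemoteDescription({ type: "answer", sdp: reply.data });
  return initChannel;
}
```

Trickling ICE candidates (step 7) and handling an incoming signaling stream, which A must also support, are left out to keep the sketch short.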
A node needs to discover its public IP and port, which is forwarded to the remote node so that the remote can connect to the local node. On non-browser libp2p nodes doing a hole punch with TCP or QUIC, the libp2p node discovers its public address via the identify protocol. Browser nodes cannot use the identify protocol to discover their public IP and port, because the browser uses a new port for each connection. For example, suppose the local browser node establishes a WebRTC connection C1 via browser-to-server to a server node and runs the identify protocol. The returned observed public port P1 will most likely (depending on the NAT) differ from the port observed on another connection C2. The only browser-supported mechanism to discover one's public IP and port for a given WebRTC connection is the non-libp2p protocol STUN. This is why this specification depends on STUN, and thus on the availability of one or more STUN servers for A and B to discover their public addresses.
Implementations MAY use one of the publicly available STUN servers, or deploy a dedicated server for a given libp2p network. Further specification of the usage of STUN is out of scope for this specification.
It is not necessary for A and B to use the same STUN server when establishing a WebRTC connection.
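As an illustration of configuring STUN at creation time, the sketch below builds the configuration object that a browser's `RTCPeerConnection` constructor accepts. The helper name `stunConfiguration`, the locally declared types, and the server host names are hypothetical; the only parts taken from the W3C API are the `iceServers` and `urls` field names.

```typescript
// Local stand-ins for the relevant parts of the W3C RTCConfiguration type.
interface IceServer { urls: string[] }
interface PeerConnectionConfig { iceServers: IceServer[] }

// Builds a configuration listing the given STUN servers.
// Only `stun:` and `stuns:` URLs are accepted by this sketch.
function stunConfiguration(stunUrls: string[]): PeerConnectionConfig {
  if (stunUrls.length === 0) {
    throw new Error("at least one STUN server URL is required");
  }
  for (const url of stunUrls) {
    if (!/^stuns?:/.test(url)) throw new Error(`not a STUN URL: ${url}`);
  }
  return { iceServers: [{ urls: [...stunUrls] }] };
}

// A and B may use different servers (placeholder host names).
const configA = stunConfiguration(["stun:stun-a.example.org:3478"]);
const configB = stunConfiguration(["stun:stun-b.example.net:3478"]);

// In a browser, each node would then create its connection with, e.g.:
//   const pc = new RTCPeerConnection(configA);
```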
The protocol id is `/webrtc-signaling/0.0.1`.
Messages are sent prefixed with the message length in bytes, encoded as an unsigned variable length integer as defined by the multiformats unsigned-varint spec.
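The length-prefix framing can be sketched as follows. The function names are illustrative, and the payload is treated as opaque bytes (in practice, the serialized protobuf message). The 9-byte cap on the prefix follows the limit in the multiformats unsigned-varint spec; lengths beyond JavaScript's safe integer range are not handled precisely by this sketch.

```typescript
// Encodes a non-negative integer as an unsigned varint: 7 bits per byte,
// least significant group first, high bit set on all but the last byte.
function encodeUvarint(n: number): Uint8Array {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  const bytes: number[] = [];
  while (n >= 0x80) {
    bytes.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  bytes.push(n);
  return Uint8Array.from(bytes);
}

// Prefixes a payload with its length in bytes.
function frameMessage(payload: Uint8Array): Uint8Array {
  const prefix = encodeUvarint(payload.length);
  const out = new Uint8Array(prefix.length + payload.length);
  out.set(prefix, 0);
  out.set(payload, prefix.length);
  return out;
}

// Reads one frame from the start of `buf`. Returns null if the buffer does
// not yet hold a complete frame, so the caller can wait for more data.
function readFrame(
  buf: Uint8Array,
): { payload: Uint8Array; rest: Uint8Array } | null {
  let length = 0;
  let multiplier = 1;
  let offset = 0;
  for (;;) {
    if (offset >= buf.length) return null; // length prefix incomplete
    if (offset >= 9) throw new Error("varint longer than 9 bytes");
    const byte = buf[offset++];
    length += (byte & 0x7f) * multiplier;
    if ((byte & 0x80) === 0) break;
    multiplier *= 0x80;
  }
  if (buf.length - offset < length) return null; // payload incomplete
  return {
    payload: buf.subarray(offset, offset + length),
    rest: buf.subarray(offset + length),
  };
}
```

For example, a 300-byte payload gets the two-byte prefix `0xac 0x02`, while payloads shorter than 128 bytes get a single-byte prefix.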
```protobuf
syntax = "proto3";

message Message {
    // Specifies type in `data` field.
    enum Type {
        // String of `RTCSessionDescription.sdp`
        SDP_OFFER = 0;
        // String of `RTCSessionDescription.sdp`
        SDP_ANSWER = 1;
        // String of `RTCIceCandidate.toJSON()`
        ICE_CANDIDATE = 2;
    }

    optional Type type = 1;
    optional string data = 2;
}
```
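One way to fill the `data` field of an `ICE_CANDIDATE` message is sketched below. It assumes that "String of `RTCIceCandidate.toJSON()`" means the JSON text of the object returned by `toJSON()`; that reading, the helper names, and the locally declared `IceCandidateInit` type (which mirrors the fields of the W3C `RTCIceCandidateInit` dictionary) are assumptions of this example.

```typescript
// Shape of the object returned by RTCIceCandidate.toJSON().
interface IceCandidateInit {
  candidate?: string;
  sdpMid?: string | null;
  sdpMLineIndex?: number | null;
  usernameFragment?: string | null;
}

const ICE_CANDIDATE = 2; // value of Type.ICE_CANDIDATE in the protobuf

// Wraps a local candidate for sending over the signaling stream.
function iceCandidateMessage(
  candidate: IceCandidateInit,
): { type: number; data: string } {
  return { type: ICE_CANDIDATE, data: JSON.stringify(candidate) };
}

// Extracts a remote candidate from a received message. Both fields are
// optional in the protobuf, so each is checked before use.
function parseIceCandidate(
  msg: { type?: number; data?: string },
): IceCandidateInit {
  if (msg.type !== ICE_CANDIDATE || msg.data === undefined) {
    throw new Error("not an ICE_CANDIDATE message");
  }
  const parsed: unknown = JSON.parse(msg.data);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("ICE candidate data must be a JSON object");
  }
  return parsed as IceCandidateInit;
}

// In a browser this could be wired up roughly as:
//   pc.onicecandidate = (ev) => {
//     if (ev.candidate) send(iceCandidateMessage(ev.candidate.toJSON()));
//   };
//   await pc.addIceCandidate(parseIceCandidate(received));
```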
- Why is there no additional Noise handshake needed?

  This specification (browser-to-browser) requires A and B to exchange their SDP offer and answer over an authenticated channel. Offer and answer contain the TLS certificate fingerprint. The browser validates the TLS certificate fingerprint through the DTLS handshake during the WebRTC connection establishment.

  In contrast, the browser-to-server specification allows exchange of the server's multiaddr, containing the server's TLS certificate fingerprint, over unauthenticated channels. In other words, the browser-to-server specification does not consider the TLS certificate fingerprint in the server's multiaddr to be trusted.

- Why use a custom signaling protocol? Why not use DCUtR?

  DCUtR offers time synchronization through a two-step protocol (first `Connect`, then `Sync`). This is not needed for WebRTC.

  DCUtR does not provide a mechanism to trickle local address candidates to the remote as they are discovered. Trickling candidates just-in-time allows for faster WebRTC connection establishment.

- Why does A and not B initiate the signaling protocol?

  In DCUtR, B (inbound side of the relayed connection) initiates the DCUtR protocol by opening the DCUtR protocol stream. The reason is that in case A is publicly reachable, B might be able to use connection reversal to connect to A directly. This reason does not apply to the WebRTC browser-to-browser protocol. Given that A and B at this point already have a relayed connection established, they might as well use it to exchange SDP, instead of using connection reversal and WebRTC browser-to-server. Thus, for the WebRTC browser-to-browser protocol, A initiates the signaling protocol by opening the signaling protocol stream.