Peer-to-peer Connection with WebRTC in Rust Using webrtc-rs

This is a guide to using webrtc-rs to create p2p connections in Rust that can traverse nat. It should be useful for anyone who wants to write a p2p/distributed program. I assume the reader knows about stun, ice¹, and websocket; I’ll briefly explain how WebRTC works. For the security side of things, the reader also needs to know pem, der, X.509, and pki in general.

I’m not an expert on WebRTC, just some dude that needs p2p and spent some time figuring out how to use webrtc-rs for it; so if you spot a mistake, please do correct me!

You can refer to this post: NAT traversal: STUN, TURN, ICE, what do they actually do?

Overall structure

There are several parts in the system. First there is a bunch of p2p programs that want to connect to each other; let’s call them peers. Then there needs to be a public-facing server that every peer can connect to. You’ll need to write this server and host it yourself. When peers A and B want to connect to each other, they both send a request to the public server (let’s call it S); S relays information between A and B until they successfully establish a udp “connection”². Finally, you need some stun servers and maybe even turn servers. There are plenty of free public stun servers (Google, Cloudflare, and others host a bunch of them). Free public turn servers, on the other hand, are basically unheard of, since they’re so easy to abuse.

Most of the time people use udp for nat traversal. It’s rare to see tcp connections: they’re more difficult to establish through nat, and are only used when the firewall blocks udp.

WebRTC

The main purpose of WebRTC is video conferencing and VoIP in browsers; transferring arbitrary data requires much less hassle. So we really don’t need most of the WebRTC machinery that deals with video codecs and media channels. On top of that, WebRTC isn’t really a single protocol, but rather a bunch of revived protocols plus a spec defining how to use these protocols together. The Rust crate, webrtc-rs, implements each underlying protocol (sctp, dtls, stun, ice, ...) in a separate crate, plus a webrtc glue layer. So it’s possible to use only the underlying crates and ignore the WebRTC layer altogether.

Technically, WebRTC already has what we want—data channel. It’s convenient if you’re using WebRTC in browsers with the Javascript api. But for us, it’s simpler to use the underlying protocol directly instead of going through WebRTC; it gives us more control over the process too.

The stack of WebRTC looks roughly like this:

Protocol	Description
WebRTC	Application
sctp	Congestion and flow control
dtls	Security
ice	nat traversal
udp	Transport

Technically, dtls (think of tls for udp) should run on top of sctp (think of tcp Pro Max), right? But WebRTC uses them the other way around. Probably because sctp provides a much nicer abstraction than dtls? Anyway, the designers explained it in detail here: rfc 8831.

Now, here’s how WebRTC establishes a connection between two peers A and B:

A creates a local sdp³ (called the offer) and sends it to B through a third-party channel⁴.
B receives A’s sdp, sets it as the remote sdp, and sends B’s local sdp (called the answer) to A. Meanwhile, B starts gathering ice candidates according to the information in A’s sdp.
A receives B’s sdp, sets it as its remote sdp, and starts gathering ice candidates according to B’s sdp.
While A and B gather ice candidates, they send the candidates they gathered to each other through the signaling server, and try to establish a (udp) connection.⁵
Once the connection is established, A and B set up a dtls connection over it, then an sctp connection over the dtls connection.

sdp (Session Description Protocol) is basically a text packet with a bunch of metadata used for establishing the connection, including media codec, ice information, fingerprints, etc. See SDP Anatomy for more. There’s no need to know the details, because we’re going to use our own kind of sdp.

WebRTC doesn’t specify this third-party channel; it can be copy-pasting in a messaging app between two users, email, pigeon, whatever. A common setup is to use a public “signaling server”; that’s our server S.

Exchanging candidates while still gathering them is called “trickle ice”. The alternative is to first gather all the candidates, then try to establish the ice connection. Trickle ice is much faster and is pretty much the standard practice now.

Authentication

For authentication, A and B each generate a self-signed key, hash it to get a fingerprint, and put the fingerprint in their sdp. Then they do the ice exchange, and get each other’s fingerprint from the sdp. When setting up the dtls connection, they accept any key that the other end provides. But after the handshake completes, they verify that the other end’s key matches the fingerprint.

The implication here is that A and B must trust the signaling server to deliver their sdp securely.

The format of the fingerprint is specified in rfc 8122 section 5: “A certificate fingerprint is a secure one-way hash of the Distinguished Encoding Rules (der) form of the certificate.”

Technically many hash functions can be used, but webrtc-rs only supports sha-256; maybe all the browsers and libraries decide to only use sha-256?

For reference, here is how webrtc-rs hashes and validates the fingerprint: validate_fingerprint.

Here’s an example sdp that contains two fingerprints. (a means attribute. See rfc 8866.) The fingerprint is produced by first hashing the der, then printing each byte in hex and joining them together with colons.

m=image 54111 TCP/TLS t38
c=IN IP4 192.0.2.2
a=setup:passive
a=connection:new
a=fingerprint:SHA-256 \
12:DF:3E:5D:49:6B:19:E5:7C:AB:4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF: \
3E:5D:49:6B:19:E5:7C:AB:4A:AD
a=fingerprint:SHA-1 \
4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
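
To make the fingerprint format concrete, here’s a minimal sketch that formats hash bytes the way the attribute above writes them, and parses an a=fingerprint line back into bytes. It only uses the standard library and assumes the attribute sits on a single line (no backslash continuation). The function names format_fingerprint and parse_fingerprint_attribute are made up for illustration; they are not part of webrtc-rs.

```rust
/// Format raw hash bytes the way sdp fingerprints are written:
/// each byte as two uppercase hex digits, joined with colons.
fn format_fingerprint(hash: &[u8]) -> String {
    hash.iter()
        .map(|byte| format!("{byte:02X}"))
        .collect::<Vec<_>>()
        .join(":")
}

/// Parse a single-line `a=fingerprint:` attribute into
/// (hash function name, hash bytes). Returns None if the line is not
/// a well-formed fingerprint attribute.
/// (Hypothetical helper, not a webrtc-rs function.)
fn parse_fingerprint_attribute(line: &str) -> Option<(String, Vec<u8>)> {
    let rest = line.trim().strip_prefix("a=fingerprint:")?;
    let (algorithm, hex) = rest.split_once(' ')?;
    let mut bytes = Vec::new();
    for part in hex.trim().split(':') {
        // Every byte must be exactly two hex digits.
        if part.len() != 2 || !part.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        bytes.push(u8::from_str_radix(part, 16).ok()?);
    }
    Some((algorithm.to_string(), bytes))
}
```

Parsing to bytes and re-formatting gives you a canonical form, which is handy if the other side sends lowercase hex.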

Code

Knowing how WebRTC works is one thing; knowing how to conjure the right module and function in the library is another. It doesn’t help that webrtc-rs is relatively thin on documentation. So this section contains code snippets taken directly from working code, plus references to my program and webrtc-rs.

Signaling server

This is not a part of WebRTC, but for completeness I’ll briefly explain how I wrote my signaling server. There are many articles online about signaling servers too.

For my signaling server, I used websocket since it allows the client to receive streams from the server, plus it provides a nice text/binary message abstraction, making it nicer to use than tcp. I used tokio-tungstenite for websocket.

When a client (say, A) wants to accept connections from other peers, it sends a Bind message to the signaling server S, along with an id. Then, another client (say, B) can send a Connect message to S that asks to connect to A by its id. B’s Connect message would contain its sdp. S relays B’s Connect message to A, then A sends its sdp via a Connect message to B (relayed by S). Then A and B would start sending each other ice candidates via Candidate message through S. Finally, A and B establish e2e connection and don’t need S anymore.
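
The message flow above can be sketched with a tiny in-memory relay. This is not my actual signaling code (that lives in the files mentioned below and speaks websocket); it’s a standard-library-only illustration of the routing. The SignalingMessage and Relay types, their fields, and the outbox idea are all assumptions made up for this sketch.

```rust
use std::collections::HashMap;

/// Messages exchanged with the signaling server (hypothetical shapes).
#[derive(Debug, Clone, PartialEq)]
enum SignalingMessage {
    /// "I accept connections under this id."
    Bind { id: String },
    /// Carries the sender's sdp to the peer named `to`.
    Connect { from: String, to: String, sdp: String },
    /// Carries one ice candidate to the peer named `to`.
    Candidate { from: String, to: String, candidate: String },
}

/// In-memory stand-in for S: one outbox of pending messages per known id.
#[derive(Default)]
struct Relay {
    outboxes: HashMap<String, Vec<SignalingMessage>>,
}

impl Relay {
    /// Handle one incoming message. Returns false if the target is unknown.
    fn handle(&mut self, msg: SignalingMessage) -> bool {
        let to = match &msg {
            SignalingMessage::Bind { id } => {
                self.outboxes.entry(id.clone()).or_default();
                return true;
            }
            SignalingMessage::Connect { from, to, .. } => {
                // Give the initiator an outbox too, so the reply can reach it.
                self.outboxes.entry(from.clone()).or_default();
                to.clone()
            }
            SignalingMessage::Candidate { to, .. } => to.clone(),
        };
        match self.outboxes.get_mut(&to) {
            Some(outbox) => {
                outbox.push(msg);
                true
            }
            None => false,
        }
    }
}
```

A real server would push each message down the target’s websocket instead of storing it, but the routing decision is the same.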

My signaling code is in /src/signaling.rs and /src/signaling.

Cargo.toml

Here are the relevant crates I used and their versions:

sha2 = "0.10.8"
pem = "3.0.4"
# Make sure the version of webrtc-util matches the one that's used by
# webrtc-ice, webrtc-sctp, and webrtc-dtls.
webrtc-ice = "0.10.0"
webrtc-util = "0.8.0"
webrtc-sctp = "0.9.0"
webrtc-dtls = "0.8.0"
# Used by webrtc.
bytes = "1.4.0"
# This is the version used by webrtc-dtls.
rcgen = { version = "0.11.1", features = ["pem", "x509-parser"]}
# This is the version used by webrtc-dtls.
rustls = "0.21.10"

ICE

webrtc_ice documentation.

Suppose we have two peers A and B; A wants to accept a connection from B. Then A is the server in this situation, and B is the client. At the same time, both A and B are clients of the signaling server S. To avoid confusion, let’s call A the p2p server, B the p2p client, and call both A and B signaling clients.

To start establishing an ice connection, we need to create an agent (source):

use std::sync::Arc;
use webrtc_ice::agent::agent_config::AgentConfig;
use webrtc_ice::agent::Agent;
use webrtc_ice::network_type::NetworkType;
use webrtc_ice::udp_network::{EphemeralUDP, UDPNetwork};
use webrtc_ice::url::Url;

let mut config = AgentConfig::default();
// "Controlling" should be true for the initiator (p2p client), false
// for the acceptor (p2p server).
config.is_controlling = false;
config.network_types = vec![NetworkType::Udp4];
config.udp_network = UDPNetwork::Ephemeral(EphemeralUDP::default());
// A list of public STUN servers.
config.urls = vec![
    Url::parse_url("stun:stun1.l.google.com:19302").unwrap(),
    Url::parse_url("stun:stun2.l.google.com:19302").unwrap(),
    Url::parse_url("stun:stun3.l.google.com:19302").unwrap(),
    Url::parse_url("stun:stun4.l.google.com:19302").unwrap(),
    Url::parse_url("stun:stun.nextcloud.com:443").unwrap(),
    Url::parse_url("stun:stun.relay.metered.ca:80").unwrap(),
];
// Agent::new is async and returns a Result.
let agent = Arc::new(Agent::new(config).await?);

If we were to use WebRTC’s glue layer, we would create a sdp and set two ice attributes in it: ufrag and pwd. But since we aren’t using WebRTC’s glue layer, we just need to get ufrag and pwd from the ice agent, serialize it, and send it through the signaling server. This will be our version of the sdp.

Our “sdp-at-home” also needs to include the fingerprint. Technically this fingerprint can be in any format you wish, but I decided to just follow WebRTC’s spec: hash the der form of the certificate. Here’s my hash function (source):

use sha2::{Digest, Sha256};
/// Hash the binary DER file and return the hash in fingerprint
/// format: each byte in uppercase hex, separated by colons.
pub fn hash_der(der: &[u8]) -> String {
  let hash = Sha256::digest(der);
  // Separate each byte with colon like webrtc does.
  let bytes: Vec<String> = hash.iter().map(|x| format!("{x:02x}")).collect();
  bytes.join(":").to_uppercase()
}

// key_der: Vec<u8> holds our certificate in der format.
let fingerprint = hash_der(&key_der);
let (ufrag, pwd) = agent.get_local_user_credentials().await;
// Then serialize them and send them over the signaling server.

My hash function is mostly the same as the hash function in webrtc-rs.
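
Since the format of our own “sdp” is entirely up to us, here’s one minimal way to serialize and parse it, using only the standard library. The HomeSdp struct and the three-line key=value format are assumptions I made up for this sketch; my actual program may serialize it differently, and any format both peers agree on works.

```rust
/// Our own minimal "sdp": just what the other peer needs from us.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
struct HomeSdp {
    ufrag: String,
    pwd: String,
    fingerprint: String,
}

impl HomeSdp {
    /// Serialize as three `key=value` lines (a made-up format).
    fn to_text(&self) -> String {
        format!(
            "ufrag={}\npwd={}\nfingerprint={}\n",
            self.ufrag, self.pwd, self.fingerprint
        )
    }

    /// Parse the text produced by `to_text`. Returns None if a line
    /// has no `=` or if any of the three fields is missing.
    fn from_text(text: &str) -> Option<HomeSdp> {
        let (mut ufrag, mut pwd, mut fingerprint) = (None, None, None);
        for line in text.lines() {
            match line.split_once('=')? {
                ("ufrag", v) => ufrag = Some(v.to_string()),
                ("pwd", v) => pwd = Some(v.to_string()),
                ("fingerprint", v) => fingerprint = Some(v.to_string()),
                _ => {} // Ignore unknown keys.
            }
        }
        Some(HomeSdp {
            ufrag: ufrag?,
            pwd: pwd?,
            fingerprint: fingerprint?,
        })
    }
}
```

The text from to_text is what you’d put in the Connect message; the receiving peer runs from_text on it and keeps the result around for the ice and dtls steps.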

Now assume both A and B have their own local sdp (ufrag, pwd, and fingerprint), and received each other’s sdp. The next step is to exchange ice candidates.

To send out candidates, we register a callback function on agent, like so (source):

agent.on_candidate(Box::new(move |candidate| {
    if let Some(candidate) = candidate {
        let candidate = candidate.marshal();

        tokio::spawn(async move {
            // Send out candidate through the signaling server.
        });
    }
    Box::pin(async {})
}));

// And start gathering candidates, once the agent got a candidate,
// it’ll invoke the on_candidate callback and our code will send
// it out.
agent.gather_candidates()?;

On the other side, we want to receive ice candidates from the signaling server and feed them into agent (source):

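use webrtc_ice::candidate::candidate_base::unmarshal_candidate;
use webrtc_ice::candidate::Candidate;

// candidate_rx is a stand-in for however your signaling client hands
// you incoming candidates; here, a tokio mpsc Receiver<String>.
while let Some(candidate) = candidate_rx.recv().await {
    let candidate = unmarshal_candidate(&candidate)?;
    let candidate: Arc<dyn Candidate + Send + Sync> = Arc::new(candidate);
    agent.add_remote_candidate(&candidate)?;
}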

While gathering and exchanging candidates runs in the background, we block on agent.accept() (source) or agent.dial() (source) to get our connection:

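// ufrag and pwd here are the remote peer's credentials, received
// through the signaling server. Keep _cancel_tx alive while waiting:
// it's the handle for cancelling the attempt.
let (_cancel_tx, cancel_rx) = tokio::sync::mpsc::channel(1);
// For p2p server A:
let ice_conn = agent.accept(cancel_rx, ufrag, pwd).await?;
// For p2p client B:
let ice_conn = agent.dial(cancel_rx, ufrag, pwd).await?;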

DTLS

webrtc_dtls documentation.

Now we need to set up a dtls connection over the ice connection, and verify the fingerprint.
To create a dtls connection, we need to pass it the key we used to generate the fingerprint earlier. Suppose variable key_der: Vec<u8> contains the certificate in der format (the bytes we hashed earlier) and private_key_der: Vec<u8> contains the matching private key in der format; we create the certificate that webrtc_dtls accepts (source):

// key_der is the der-encoded certificate; private_key_der is its
// private key. They are two different blobs.
let dtls_cert = webrtc_dtls::crypto::Certificate {
    certificate: vec![rustls::Certificate(key_der.clone())],
    private_key: webrtc_dtls::crypto::CryptoPrivateKey::from_key_pair(
        &rcgen::KeyPair::from_der(&private_key_der).unwrap(),
    )
    .unwrap(),
};

Then create the dtls connection. For p2p server, do this (source):

let config = webrtc_dtls::config::Config {
    certificates: vec![dtls_cert],
    client_auth: webrtc_dtls::config::ClientAuthType::RequireAnyClientCert,
    // We accept any certificate, and then verify the provided
    // certificate against the fingerprint we got from the signaling server.
    insecure_skip_verify: true,
    ..Default::default()
};

// Pass false for p2p server.
let dtls_conn = DTLSConn::new(ice_conn, config, false, None).await?;

For p2p client, do this (source):

let config = webrtc_dtls::config::Config {
    certificates: vec![dtls_cert],
    // We accept any certificate, and then verify the provided
    // certificate against the fingerprint we got from the signaling server.
    insecure_skip_verify: true,
    ..Default::default()
};

// Pass true for p2p client.
let dtls_conn = DTLSConn::new(ice_conn, config, true, None).await?;
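
The ice and dtls snippets differ only in a few role-dependent flags, so it can help to collect them in one place. Here’s a minimal sketch; the Role enum and its method names are made up for illustration, and the values simply mirror the comments in the snippets above (controlling and is_client are true for the p2p client, and only the p2p server sets RequireAnyClientCert).

```rust
/// Which side of the p2p connection we are on.
/// (Hypothetical helper type, not part of webrtc-rs.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    /// The peer that accepts the connection (A in the text).
    P2pServer,
    /// The peer that initiates the connection (B in the text).
    P2pClient,
}

impl Role {
    /// Value for `AgentConfig::is_controlling`: true for the initiator.
    fn ice_is_controlling(self) -> bool {
        self == Role::P2pClient
    }

    /// Value for the `is_client` argument of `DTLSConn::new`.
    fn dtls_is_client(self) -> bool {
        self == Role::P2pClient
    }

    /// Whether to set `client_auth` to `RequireAnyClientCert`:
    /// only the p2p server does, in the snippets above.
    fn dtls_requires_client_cert(self) -> bool {
        self == Role::P2pServer
    }
}
```

With something like this, the agent and dtls setup can be written once and parameterized by role, instead of keeping two near-identical copies in sync.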

Next, on both p2p server and p2p client, verify the peer certificate of the dtls connection matches the fingerprint we received from the signaling server (we got it along with ufrag and pwd in the sdp) (source).

let certs = dtls_conn.connection_state().await.peer_certificates;
if certs.is_empty() {
    // Throw error.
}
// hash_der is shown in the previous section.
let peer_cert_hash_from_dtls = hash_der(&certs[0]);
if peer_cert_hash_from_dtls != cert_hash_from_signaling_server {
    // Throw error.
}
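
If you’d rather have concrete errors than the “Throw error” placeholders, the check above could be wrapped up roughly like this. This is a sketch, not the code from my program: VerifyError and verify_peer_fingerprint are made-up names, and the hash function is passed in as a parameter so the example stays standard-library-only (in practice you’d pass the hash_der function from the ICE section). It also compares case-insensitively, which is my own choice here, in case the other side sends lowercase hex.

```rust
/// Why verification failed. (Hypothetical error type.)
#[derive(Debug, PartialEq)]
enum VerifyError {
    NoPeerCertificate,
    FingerprintMismatch { expected: String, actual: String },
}

/// Check that the first peer certificate hashes to the fingerprint we
/// received through the signaling server.
/// `hash_der` is whatever function produced the fingerprint.
fn verify_peer_fingerprint(
    peer_certs: &[Vec<u8>],
    expected: &str,
    hash_der: impl Fn(&[u8]) -> String,
) -> Result<(), VerifyError> {
    let first = peer_certs.first().ok_or(VerifyError::NoPeerCertificate)?;
    let actual = hash_der(first.as_slice());
    // Hex digits may differ in case between implementations.
    if actual.eq_ignore_ascii_case(expected.trim()) {
        Ok(())
    } else {
        Err(VerifyError::FingerprintMismatch {
            expected: expected.to_string(),
            actual,
        })
    }
}
```

Whatever shape the check takes, the important part is to close the connection on error before any application data flows over it.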

SCTP

webrtc_sctp documentation.

We’re getting there! The final step is to set up the sctp connection (source):

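use std::sync::Arc;
use webrtc_sctp::association;
use webrtc_sctp::chunk::chunk_payload_data::PayloadProtocolIdentifier;

let assoc_config = association::Config {
    // net_conn expects an Arc'd connection.
    net_conn: Arc::new(dtls_conn),
    // 0 should select the library's default sizes; check the
    // webrtc_sctp docs for your version.
    max_receive_buffer_size: 0,
    max_message_size: 0,
    name: "whatever".to_string(),
};

// For p2p server:
let assoc_server = association::Association::server(assoc_config).await?;
// accept_stream() returns None if the association closes first.
let stream = assoc_server.accept_stream().await;

// For p2p client:
let assoc_client = association::Association::client(assoc_config).await?;
// The stream identifier can be anything (here I used 1).
let stream = assoc_client.open_stream(1, PayloadProtocolIdentifier::Binary).await?;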

Conclusion

That’s it! Now we have a binary stream between two peers. While setting everything up, it helps to go one layer at a time, verify it works, and add the next layer. It also helps to first set it up without authentication, then add the key verification step.

Appendix A, an rcgen pitfall

Because I fell into this trap using rcgen and spent two whole nights scratching my head, I want to call it out so readers can avoid it.

Say that you want to generate a certificate and pass it around your program. The intuitive way is to create a rcgen::Certificate, pass it around, and call rcgen::Certificate::serialize_der every time you need a der, right? But actually, every time you call serialize_der, rather than just serializing the certificate, it generates a new certificate. Put another way, every time you call serialize_der, it returns a different value.

So the correct way to generate a certificate and pass it around is to create a rcgen::Certificate, call serialize_der to get the der, and pass the der around. If you need to use the certificate in another format, just parse the der.
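
One way to make the “serialize once, pass the der around” rule hard to break is to let the type system enforce it. The sketch below uses only the standard library and does not call rcgen: Identity is a made-up type, and the serialize argument stands in for whatever closure calls serialize_der in your program. Because it’s an FnOnce, it cannot be called a second time.

```rust
/// Holds der bytes that were produced exactly once.
/// (Hypothetical type for illustration; it does not call rcgen itself.)
struct Identity {
    cert_der: Vec<u8>,
}

impl Identity {
    /// `serialize` stands in for a closure that calls `serialize_der`.
    /// It is FnOnce, so it runs exactly once and the result is stored.
    fn generate(serialize: impl FnOnce() -> Vec<u8>) -> Identity {
        Identity {
            cert_der: serialize(),
        }
    }

    /// Every caller sees the same bytes, no matter how often they ask.
    fn cert_der(&self) -> &[u8] {
        &self.cert_der
    }
}
```

Everything downstream (the fingerprint, the dtls certificate) then reads from cert_der(), so the bytes you hash and the bytes you present in the handshake are guaranteed to be the same.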

Here’s a GitHub issue discussing it: Issue#62.