Peer-to-peer Connection with WebRTC in Rust Using webrtc-rs
This is a guide for using webrtc-rs to create p2p connections in Rust that can traverse nat. It should be useful for anyone who wants to write a p2p/distributed program. I assume the reader knows about stun, ice, and websocket; I’ll briefly explain how WebRTC works. The reader also needs to know pem, der, X.509, and pki in general for the security side of things.
I’m not an expert on WebRTC, just some dude who needs p2p and spent some time figuring out how to use webrtc-rs for it; so if you spot a mistake, please do correct me!
Overall structure
There are several parts in the system. First, there are a bunch of p2p programs that want to connect to each other; let’s call them peers. Then there needs to be a public-facing server that every peer can connect to. You’ll need to write this server and host it yourself. When peers A and B want to connect to each other, they both send a request to the public server (let’s call it S); S relays information between A and B until they successfully establish a udp “connection”. Finally, you need some stun servers and maybe even turn servers. There are plenty of free public stun servers (Google, Cloudflare, and others host a bunch of them). On the other hand, free public turn servers are basically unheard of, since they’re so easy to abuse.
WebRTC
The main purpose of WebRTC is video conferencing and VoIP in browsers; transferring arbitrary data requires much less hassle. So we really don’t need most of the parts of WebRTC that deal with video codecs and media channels. On top of that, WebRTC isn’t really a single protocol, but rather a bunch of revived protocols plus a spec defining how to use these protocols together. The Rust crate, webrtc-rs, implements each underlying protocol (sctp, dtls, stun, ice, ...) in separate crates, plus a webrtc glue layer. So it’s possible to use only the underlying crates and ignore the WebRTC layer altogether.
Technically, WebRTC already has what we want—data channel. It’s convenient if you’re using WebRTC in browsers with the Javascript api. But for us, it’s simpler to use the underlying protocol directly instead of going through WebRTC; it gives us more control over the process too.
The stack of WebRTC looks roughly like this:
Protocol | Description |
---|---|
WebRTC | Application |
sctp | Congestion and flow control |
dtls | Security |
ice | nat traversal |
udp | Transport |
Technically, dtls (think of tls for udp) should run on top of sctp (think of tcp Pro Max), right? But WebRTC uses them the other way around. Probably because sctp provides a much nicer abstraction than dtls? Anyway, the designers explained it in detail here: rfc 8831.
Now, here’s how WebRTC establishes a connection between two peers A and B:
- A creates a local sdp (called the offer) and sends it to B through a third-party channel (the signaling server).
- B receives A’s sdp, sets it as the remote sdp, and sends B’s local sdp (called the answer) to A. Meanwhile, B starts gathering ice candidates according to the information in A’s sdp.
- A receives B’s sdp, sets it as its remote sdp, and starts gathering ice candidates according to B’s sdp.
- While A and B gather ice candidates, they send the candidates they gathered to each other through the signaling server, and try to establish a (udp) connection.
- Once the connection is established, A and B set up a dtls connection over it, then an sctp connection over the dtls connection.
Authentication
For authentication, A and B each generate a self-signed certificate, hash it to get a fingerprint, and put the fingerprint in their sdp. Then they do the ice exchange and get each other’s fingerprint from the sdp. When setting up the dtls connection, they accept any certificate that the other end provides. But after the handshake completes, they verify that the other end’s certificate matches the fingerprint.
The implication here is that A and B must trust the signaling server to deliver their sdp securely.
The format of the fingerprint is specified in rfc 8122 section 5: “A certificate fingerprint is a secure one-way hash of the Distinguished Encoding Rules (der) form of the certificate.”
Technically many hash functions can be used, but webrtc-rs only supports sha-256; maybe all the browsers and libraries decide to only use sha-256?
For reference, here is how webrtc-rs hashes and validates the fingerprint: validate_fingerprint.
Here’s an example sdp that contains two fingerprints. (a means attribute. See rfc 8866.) The fingerprint is produced by first hashing the der, then printing each byte in hex and joining them with colons.
    m=image 54111 TCP/TLS t38
    c=IN IP4 192.0.2.2
    a=setup:passive
    a=connection:new
    a=fingerprint:SHA-256 \
       12:DF:3E:5D:49:6B:19:E5:7C:AB:4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF: \
       3E:5D:49:6B:19:E5:7C:AB:4A:AD
    a=fingerprint:SHA-1 \
       4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
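To make the fingerprint format concrete, here is a minimal std-only sketch. The function names (format_fingerprint, parse_fingerprint, sdp_fingerprints) are made up for illustration and are not part of webrtc-rs. It also assumes each a=fingerprint attribute sits on a single line; the backslashes in the example above are only display wrapping.

```rust
/// Format raw digest bytes the way SDP fingerprints are written:
/// two uppercase hex digits per byte, joined with colons.
pub fn format_fingerprint(digest: &[u8]) -> String {
    digest
        .iter()
        .map(|b| format!("{b:02X}"))
        .collect::<Vec<_>>()
        .join(":")
}

/// Parse a colon-separated hex fingerprint back into bytes.
/// Returns None if any group is not exactly two hex digits.
pub fn parse_fingerprint(text: &str) -> Option<Vec<u8>> {
    text.split(':')
        .map(|group| {
            if group.len() == 2 && group.bytes().all(|c| c.is_ascii_hexdigit()) {
                u8::from_str_radix(group, 16).ok()
            } else {
                None
            }
        })
        .collect()
}

/// Collect every `a=fingerprint:<hash-func> <value>` attribute from SDP
/// text as (hash function name, fingerprint) pairs.
pub fn sdp_fingerprints(sdp: &str) -> Vec<(String, String)> {
    sdp.lines()
        .filter_map(|line| line.trim().strip_prefix("a=fingerprint:"))
        .filter_map(|rest| {
            let (func, value) = rest.split_once(' ')?;
            Some((func.to_string(), value.trim().to_string()))
        })
        .collect()
}
```

Comparing parsed bytes rather than strings sidesteps upper/lowercase differences between implementations.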
Code
Knowing how WebRTC works is one thing; knowing how to conjure the right module and function in the library is another. It doesn’t help that webrtc-rs is relatively thin on documentation. So this section contains code snippets taken directly from working code, plus references to my program and webrtc-rs.
Signaling server
This is not a part of WebRTC, but for completeness I’ll briefly explain how I wrote my signaling server. There are many articles online about signaling servers too.
For my signaling server, I used websocket since it allows the client to receive streams from the server, plus it provides a nice text/binary message abstraction, making it nicer to use than tcp. I used tokio-tungstenite for websocket.
When a client (say, A) wants to accept connections from other peers, it sends a Bind message to the signaling server S, along with an id. Then another client (say, B) can send a Connect message to S that asks to connect to A by its id. B’s Connect message contains its sdp. S relays B’s Connect message to A, then A sends its sdp to B via a Connect message (relayed by S). Then A and B start sending each other ice candidates via Candidate messages through S. Finally, A and B establish an e2e connection and don’t need S anymore.
My signaling code is in /src/signaling.rs and /src/signaling.
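The Bind/Connect/Candidate flow can be sketched with a tiny in-memory relay. This is only an illustration under assumptions: the message fields, the Relay type, and the use of a plain number to identify a client connection are all hypothetical, and the real server does this over websocket.

```rust
use std::collections::HashMap;

/// Messages a signaling client sends to the signaling server S.
/// The variant names follow the text; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    /// "I am `id`, and I accept connections."
    Bind { id: String },
    /// Carries the sender's sdp to the peer named `to`.
    Connect { from: String, to: String, sdp: String },
    /// Carries one ice candidate to the peer named `to`.
    Candidate { from: String, to: String, candidate: String },
}

/// Hypothetical in-memory relay; `usize` stands in for a client connection.
#[derive(Default)]
pub struct Relay {
    /// Peer id -> connection that peer is reachable on.
    known: HashMap<String, usize>,
}

impl Relay {
    /// Handle one message arriving on connection `conn`. Returns the
    /// connection the message should be forwarded to, if any.
    pub fn handle(&mut self, conn: usize, msg: &Message) -> Option<usize> {
        match msg {
            Message::Bind { id } => {
                self.known.insert(id.clone(), conn);
                None
            }
            Message::Connect { from, to, .. } | Message::Candidate { from, to, .. } => {
                // Remember the sender so that replies can be routed back.
                self.known.entry(from.clone()).or_insert(conn);
                self.known.get(to).copied()
            }
        }
    }
}
```

The relay never inspects the sdp or the candidates; it only routes by id.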
Cargo.toml
Here are the relevant crates I used and their versions:
    sha2 = "0.10.8"
    pem = "3.0.4"
    # Make sure the version of webrtc-util matches the one that's used by
    # webrtc-ice, webrtc-sctp, and webrtc-dtls.
    webrtc-ice = "0.10.0"
    webrtc-util = "0.8.0"
    webrtc-sctp = "0.9.0"
    webrtc-dtls = "0.8.0"
    # Used by webrtc.
    bytes = "1.4.0"
    # This is the version used by webrtc-dtls.
    rcgen = { version = "0.11.1", features = ["pem", "x509-parser"]}
    # This is the version used by webrtc-dtls.
    rustls = "0.21.10"
ICE
Suppose we have two peers, A and B, and A wants to accept a connection from B. Then A is the server in this situation, and B is the client. At the same time, both A and B are clients of the signaling server S. To avoid confusion, let’s call A the p2p server, B the p2p client, and call both A and B signaling clients.
To start establishing an ice connection, we need to create an agent (source):
    use std::sync::Arc;
    use webrtc_ice::agent::agent_config::AgentConfig;
    use webrtc_ice::agent::Agent;
    use webrtc_ice::network_type::NetworkType;
    use webrtc_ice::udp_network::{EphemeralUDP, UDPNetwork};
    use webrtc_ice::url::Url;

    let mut config = AgentConfig::default();
    // "Controlling" should be true for the initiator (p2p client), false
    // for the acceptor (p2p server).
    config.is_controlling = false;
    config.network_types = vec![NetworkType::Udp4];
    config.udp_network = UDPNetwork::Ephemeral(EphemeralUDP::default());
    // A list of public STUN servers.
    config.urls = vec![
        Url::parse_url("stun:stun1.l.google.com:19302").unwrap(),
        Url::parse_url("stun:stun2.l.google.com:19302").unwrap(),
        Url::parse_url("stun:stun3.l.google.com:19302").unwrap(),
        Url::parse_url("stun:stun4.l.google.com:19302").unwrap(),
        Url::parse_url("stun:stun.nextcloud.com:443").unwrap(),
        Url::parse_url("stun:stun.relay.metered.ca:80").unwrap(),
    ];
    // Agent::new is async and returns a Result.
    let agent = Arc::new(Agent::new(config).await?);
If we were to use WebRTC’s glue layer, we would create an sdp and set two ice attributes in it: ufrag and pwd. But since we aren’t using WebRTC’s glue layer, we just need to get ufrag and pwd from the ice agent, serialize them, and send them through the signaling server. This will be our version of the sdp.
Our “sdp-at-home” also needs to include the fingerprint. Technically this fingerprint can be in any format you wish, but I decided to just follow WebRTC’s spec—hash the der version of the public key. Here’s my hash function (source):
    use sha2::{Digest, Sha256};

    /// Hash the binary DER file and return the hash in fingerprint
    /// format: each byte in uppercase hex, separated by colons.
    pub fn hash_der(der: &[u8]) -> String {
        let hash = Sha256::digest(der);
        // Separate each byte with colon like webrtc does.
        let bytes: Vec<String> = hash.iter().map(|x| format!("{x:02x}")).collect();
        bytes.join(":").to_uppercase()
    }

    let fingerprint = hash_der(&key_der);
    let (ufrag, pwd) = agent.get_local_user_credentials().await;
    // Then serialize them and send them over the signaling server.
My hash function is mostly the same as the hash function in webrtc-rs.
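The “sdp-at-home” only needs to carry three strings, so almost any serialization works. Below is a minimal sketch of one option; the HomeSdp name and the newline-separated wire format are assumptions for illustration, not what my program necessarily uses. It relies on none of the three values containing a newline.

```rust
/// Our "sdp-at-home": the three values each peer sends through signaling.
#[derive(Debug, Clone, PartialEq)]
pub struct HomeSdp {
    pub ufrag: String,
    pub pwd: String,
    pub fingerprint: String,
}

impl HomeSdp {
    /// Serialize as three newline-separated lines (an assumed wire format).
    pub fn encode(&self) -> String {
        format!("{}\n{}\n{}", self.ufrag, self.pwd, self.fingerprint)
    }

    /// Parse the format produced by `encode`. Returns None unless the
    /// text consists of exactly three non-empty lines.
    pub fn decode(text: &str) -> Option<Self> {
        let mut lines = text.lines();
        let ufrag = lines.next()?.to_string();
        let pwd = lines.next()?.to_string();
        let fingerprint = lines.next()?.to_string();
        let has_extra = lines.next().is_some();
        if has_extra || ufrag.is_empty() || pwd.is_empty() || fingerprint.is_empty() {
            return None;
        }
        Some(HomeSdp { ufrag, pwd, fingerprint })
    }
}
```

The encoded string can go straight into the sdp field of whatever message the signaling server relays.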
Now assume both A and B have their own local sdp (ufrag, pwd, and fingerprint), and have received each other’s sdp. The next step is to exchange ice candidates.
To send out candidates, we register a callback function on agent, like so (source):
    agent.on_candidate(Box::new(move |candidate| {
        if let Some(candidate) = candidate {
            let candidate = candidate.marshal();
            tokio::spawn(async move {
                // Send out candidate through the signaling server.
            });
        }
        Box::pin(async {})
    }));
    // And start gathering candidates. Once the agent gets a candidate,
    // it'll invoke the on_candidate callback and our code will send
    // it out.
    agent.gather_candidates()?;
On the other side, we want to receive ice candidates from the signaling server and feed them into agent (source):
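    use std::sync::Arc;
    use webrtc_ice::candidate::candidate_base::unmarshal_candidate;
    use webrtc_ice::candidate::Candidate;

    while let Some(candidate) = (receive candidate from signaling server) {
        let candidate = unmarshal_candidate(&candidate)?;
        let candidate: Arc<dyn Candidate + Send + Sync> = Arc::new(candidate);
        agent.add_remote_candidate(&candidate)?;
    }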
While gathering and exchanging candidates runs in the background, we block on agent.accept() (source) or agent.dial() (source) to get our connection:
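    // ufrag and pwd here are the remote peer's credentials, received
    // through the signaling server.
    // For p2p server A:
    let ice_conn = agent.accept(cancel_rx, ufrag, pwd).await?;
    // For p2p client B:
    let ice_conn = agent.dial(cancel_rx, ufrag, pwd).await?;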
DTLS
Now we need to set up a dtls connection on top of the ice connection and verify the fingerprint.
To create a dtls connection, we need to pass it the key we used to generate the fingerprint earlier. Suppose the variable key_der: Vec<u8> contains the key in der format; we create the certificate that webrtc_dtls accepts (source):
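    // rustls::Certificate expects the certificate DER and
    // KeyPair::from_der expects the private-key DER; if yours are
    // separate byte strings, pass each one to the matching call.
    let dtls_cert = webrtc_dtls::crypto::Certificate {
        certificate: vec![rustls::Certificate(key_der.clone())],
        private_key: webrtc_dtls::crypto::CryptoPrivateKey::from_key_pair(
            &rcgen::KeyPair::from_der(&key_der).unwrap(),
        )
        .unwrap(),
    };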
Then create the dtls connection. For p2p server, do this (source):
    use webrtc_dtls::conn::DTLSConn;

    let config = webrtc_dtls::config::Config {
        certificates: vec![dtls_cert],
        client_auth: webrtc_dtls::config::ClientAuthType::RequireAnyClientCert,
        // We accept any certificate here, then verify the provided
        // certificate against the fingerprint we got from the signaling server.
        insecure_skip_verify: true,
        ..Default::default()
    };
    // Pass false for p2p server.
    let dtls_conn = DTLSConn::new(ice_conn, config, false, None).await?;
For p2p client, do this (source):
    use webrtc_dtls::conn::DTLSConn;

    let config = webrtc_dtls::config::Config {
        certificates: vec![dtls_cert],
        // We accept any certificate here, then verify the provided
        // certificate against the fingerprint we got from the signaling server.
        insecure_skip_verify: true,
        ..Default::default()
    };
    // Pass true for p2p client.
    let dtls_conn = DTLSConn::new(ice_conn, config, true, None).await?;
Next, on both the p2p server and the p2p client, verify that the peer certificate of the dtls connection matches the fingerprint we received from the signaling server (we got it along with ufrag and pwd in the sdp) (source).
    let certs = dtls_conn.connection_state().await.peer_certificates;
    if certs.is_empty() {
        // Throw error.
    }
    // hash_der is shown in the previous section.
    let peer_cert_hash_from_dtls = hash_der(&certs[0]);
    if peer_cert_hash_from_dtls != cert_hash_from_signaling_server {
        // Throw error.
    }
SCTP
We’re getting there! The final step is to set up the sctp connection (source):
    use std::sync::Arc;
    use webrtc_sctp::association;
    use webrtc_sctp::chunk::chunk_payload_data::PayloadProtocolIdentifier;

    let assoc_config = association::Config {
        net_conn: Arc::new(dtls_conn),
        // 0 for both sizes, as in the webrtc-sctp examples.
        max_receive_buffer_size: 0,
        max_message_size: 0,
        name: "whatever".to_string(),
    };
    // For p2p server:
    let assoc_server = association::Association::server(assoc_config).await?;
    let stream = assoc_server.accept_stream().await;
    // For p2p client:
    let assoc_client = association::Association::client(assoc_config).await?;
    // The stream identifier can be anything (here I used 1).
    let stream = assoc_client
        .open_stream(1, PayloadProtocolIdentifier::Binary)
        .await?;
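Each layer needs to know which side it is on, and it’s easy to flip one of the flags by accident. One way to keep them consistent is to derive every flag from a single value. The Role type and its method names below are hypothetical; the values they return just restate the choices made in the snippets above.

```rust
/// Which side of the p2p connection this peer is on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    /// The acceptor (p2p server, A in the text).
    Server,
    /// The initiator (p2p client, B in the text).
    Client,
}

impl Role {
    /// Value for `AgentConfig::is_controlling`; the p2p client also
    /// calls `agent.dial` while the p2p server calls `agent.accept`.
    pub fn ice_controlling(self) -> bool {
        self == Role::Client
    }

    /// Third argument of `DTLSConn::new`.
    pub fn dtls_is_client(self) -> bool {
        self == Role::Client
    }

    /// True if this side uses `Association::client` and opens the
    /// stream; false if it uses `Association::server` and accepts it.
    pub fn opens_sctp_stream(self) -> bool {
        self == Role::Client
    }
}
```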
Conclusion
That’s it! Now we have a binary stream between two peers. While setting everything up, it helps to go one layer at a time, verify it works, and add the next layer. It also helps to first set it up without authentication, then add the key verification step.
Appendix A, an rcgen pitfall
Because I fell into this trap using rcgen and spent two whole nights scratching my head, I want to call it out so readers can avoid it.
Say that you want to generate a certificate and pass it around your program. The intuitive way is to create an rcgen::Certificate, pass it around, and call rcgen::Certificate::serialize_der every time you need a der, right? But actually, every time you call serialize_der, rather than just serializing the certificate, it generates a new certificate. Put another way, every time you call serialize_der, it returns a different value.
So the correct way to generate a certificate and pass it around is to create an rcgen::Certificate, call serialize_der once to get the der, and pass the der around. If you need to use the certificate in another format, just parse the der.
Here’s a GitHub issue discussing it: Issue#62.
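The “serialize once, pass the bytes around” pattern can be sketched without rcgen. UnstableCert below is a made-up stand-in that only mimics the behavior described above (different bytes on every call); it is not the rcgen api. Identity is likewise a hypothetical wrapper.

```rust
use std::cell::Cell;
use std::sync::Arc;

/// Stand-in for a certificate object whose serialization is not stable:
/// each call yields different bytes. This is NOT rcgen.
pub struct UnstableCert {
    calls: Cell<u8>,
}

impl UnstableCert {
    pub fn new() -> Self {
        UnstableCert { calls: Cell::new(0) }
    }

    /// Returns different bytes on every call, like the pitfall above.
    pub fn serialize_der(&self) -> Vec<u8> {
        let n = self.calls.get() + 1;
        self.calls.set(n);
        vec![0x30, n]
    }
}

/// Serialize exactly once, then hand out cheap clones of the same bytes.
#[derive(Clone)]
pub struct Identity {
    cert_der: Arc<[u8]>,
}

impl Identity {
    pub fn new(cert: &UnstableCert) -> Self {
        Identity { cert_der: cert.serialize_der().into() }
    }

    /// The der captured at construction time; identical for every clone.
    pub fn cert_der(&self) -> &[u8] {
        &self.cert_der
    }
}
```

Every clone of Identity sees the same der, so the fingerprint sent through signaling and the certificate handed to dtls cannot drift apart.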