D++ (DPP)
C++ Discord API Bot Library
Voice Model

High Level Summary

Discord's audio system consists of several layers and inter-related systems as shown in the flow chart below.

At the top level, connecting to a voice server requires the library to request the details of a voice server from Discord via the websocket for the shard where the server is located. Performing this request will make Discord reply with a websocket URI and an ephemeral token (not the bot token) which are used to establish an initial connection to this secondary websocket. Every connection to a voice channel creates a separate secondary websocket.

Once connected to this websocket, the library negotiates which protocols it supports and what encryption schemes to use. If you enabled DAVE (Discord's end-to-end encryption scheme) this is negotiated first. An MLS (message layer security) group is joined or created. If you did not enable DAVE, this step is bypassed.

The secondary websocket then gives the library a shared encryption secret and the hostname of an RTP server, which is used to encrypt RTP packets using libssl. This is stored for later.

The next step is to send an initial packet to the RTP server so that the library can detect the public IP where the bot is running. Once the RTP server replies, the bot may tell the websocket what encryption protocols it is going to use to encrypt the RTP packet contents (leaving the RTP header somwhat intact).

The library is now in an initialised state and will accept method calls for send_audio_raw() and send_audio_opus(). If you send raw audio, it will first be encoded as OPUS using libopus, and potentially repacketized to fit into UDP packets, with larger streams being split into multiple smaller packets that are scheduled to be sent in the future.

If at this point DAVE is enabled, the contents of the OPUS encoded audio are encrypted using the AES 128 bit AEAD cipher and using the bot's MLS ratchet, derived during the MLS negotiation which was carried out earlier. All valid participants in the voice channel may use their private key, and the public key derived from their ratchets, to decrypt the OPUS audio.

Regardless of if DAVE is enabled or not, the OPUS stream (encrypted by DAVE, or "plaintext") is placed into an RTP packet, and then encrypted using the shared secret given by the websocket, known only to you and Discord, using the xchacha20 poly1305 cipher.

The completed packet, potentially with two separate layers of encryption (one with a key only you and Discord know, and one with a key only you and participants in the voice chat know!), plus opus encoded audio is sent on its way via UDP to the RTP server, where Discord promptly distribute it to all participants in the chat.

After reading all this, go get a coffee or something, you deserve it! ☕

Flow Diagram

D++ Library version 10.0.32D++ Library version 10.0.31D++ Library version 10.0.30D++ Library version 10.0.29D++ Library version 10.0.28D++ Library version 10.0.27D++ Library version 10.0.26D++ Library version 10.0.25D++ Library version 10.0.24D++ Library version 10.0.23D++ Library version 10.0.22D++ Library version 10.0.21D++ Library version 10.0.20D++ Library version 10.0.19D++ Library version 10.0.18D++ Library version 10.0.17D++ Library version 10.0.16D++ Library version 10.0.15D++ Library version 10.0.14D++ Library version 10.0.13D++ Library version 10.0.12D++ Library version 10.0.11D++ Library version 10.0.10D++ Library version 10.0.9D++ Library version 10.0.8D++ Library version 10.0.7D++ Library version 10.0.6D++ Library version 10.0.5D++ Library version 10.0.4D++ Library version 10.0.3D++ Library version 10.0.2D++ Library version 10.0.1D++ Library version 10.0.0D++ Library version 9.0.19D++ Library version 9.0.18D++ Library version 9.0.17D++ Library version 9.0.16D++ Library version 9.0.15D++ Library version 9.0.14D++ Library version 9.0.13D++ Library version 9.0.12D++ Library version 9.0.11D++ Library version 9.0.10D++ Library version 9.0.9D++ Library version 9.0.8D++ Library version 9.0.7D++ Library version 9.0.6D++ Library version 9.0.5D++ Library version 9.0.4D++ Library version 9.0.3D++ Library version 9.0.2D++ Library version 9.0.1D++ Library version 9.0.0D++ Library version 1.0.2D++ Library version 1.0.1D++ Library version 1.0.0