Each message sent with the Snapcast binary protocol is split up into two parts:
- A base message that provides general information like time sent/received, type of the message, message size, etc
- A typed message that carries the rest of the information
The protocol uses little-endian byte order.
When a client joins a server, the following exchanges happen:
- Client opens a TCP socket to the server (default port is 1704)
- Client sends a Hello message
- Server sends a Server Settings message
- Server sends a Stream Tags message
- Server sends a Codec Header message
  - Until the server sends this, the client shouldn't play any Wire Chunk messages
- The server will now send Wire Chunk messages, which can be fed to the audio decoder.
- When it comes time for the client to disconnect, the socket can just be closed.
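
The receive side of this exchange can be sketched as follows. This is a minimal illustration rather than Snapcast's own code: `read_exact`, `read_message` and `playable_chunks` are hypothetical helper names, and `stream` is assumed to be any binary file-like object (for example the result of `socket.makefile("rb")`). The sketch reads one base message, then the typed message whose length the base message announces, and drops Wire Chunk messages that arrive before a Codec Header.

```python
import struct

# Base message layout (little endian): type, id, refersTo, sent.sec,
# sent.usec, received.sec, received.usec, size.
BASE_FORMAT = "<HHHiiiiI"
BASE_SIZE = struct.calcsize(BASE_FORMAT)

CODEC_HEADER = 1  # typed message IDs from the table in this document
WIRE_CHUNK = 2


def read_exact(stream, n):
    """Read exactly n bytes, or raise EOFError if the stream ends early."""
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            raise EOFError("stream closed in the middle of a message")
        data += chunk
    return data


def read_message(stream):
    """Return (type, typed_message_bytes) for the next message."""
    fields = struct.unpack(BASE_FORMAT, read_exact(stream, BASE_SIZE))
    msg_type, size = fields[0], fields[-1]
    return msg_type, read_exact(stream, size)


def playable_chunks(messages):
    """Yield Wire Chunk bodies, skipping any seen before a Codec Header."""
    have_header = False
    for msg_type, body in messages:
        if msg_type == CODEC_HEADER:
            have_header = True
        elif msg_type == WIRE_CHUNK and have_header:
            yield body
```

Reading a fixed-size header first and then exactly `size` more bytes keeps the framing independent of how TCP happens to split the data.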
- Client periodically sends a Time message, carrying a sent timestamp `t_client-sent`
- Receives a Time response containing the client-to-server time delta `latency_c2s = t_server-recv - t_client-sent + t_network-latency` and the server sent timestamp `t_server-sent`
- Calculates `latency_s2c = t_client-recv - t_server-sent + t_network-latency`
- Calculates the time diff between server and client as `(latency_c2s - latency_s2c) / 2`, eliminating the network latency (assumed to be symmetric)
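
A small worked sketch of this calculation, using integer microseconds. The function name `estimate_time_diff` and the sample numbers are illustrative assumptions. The two measured differences already contain the network latency, so it does not appear as a separate variable; it cancels in the subtraction as long as it is the same in both directions.

```python
def estimate_time_diff(t_client_sent, t_server_recv, t_server_sent, t_client_recv):
    """Estimate (server clock - client clock) from one Time round trip.

    t_client_* are read from the client clock, t_server_* from the server
    clock; all four values must share one unit (microseconds here).
    """
    latency_c2s = t_server_recv - t_client_sent  # clock offset + network latency
    latency_s2c = t_client_recv - t_server_sent  # -clock offset + network latency
    return (latency_c2s - latency_s2c) / 2


# Assumed scenario: the server clock is 5 s ahead of the client clock and
# each direction takes 20 ms on the network.
offset_us = 5_000_000
network_us = 20_000

t_client_sent = 100_000_000
t_server_recv = t_client_sent + network_us + offset_us
t_server_sent = t_server_recv                      # server replies immediately
t_client_recv = t_server_sent - offset_us + network_us

diff = estimate_time_diff(t_client_sent, t_server_recv, t_server_sent, t_client_recv)
```

If the two directions have different latencies, half of that asymmetry ends up in the estimate, which is why the symmetry assumption matters.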
Typed Message ID | Name | Notes |
---|---|---|
0 | Base | The beginning of every message containing data about the typed message |
1 | Codec Header | The codec-specific data to put at the start of a stream to allow decoding |
2 | Wire Chunk | A part of an audio stream |
3 | Server Settings | Settings set from the server like volume, latency, etc |
4 | Time | Used for synchronizing time with the server |
5 | Hello | Sent by the client when connecting with the server |
6 | Stream Tags | Metadata about the stream for use by the client |
Field | Type | Description |
---|---|---|
type | uint16 | Should be one of the typed message IDs |
id | uint16 | Used in requests to identify the message (not always used) |
refersTo | uint16 | Used in responses to identify which request message ID this is responding to |
sent.sec | int32 | The second value of the timestamp when this message was sent. Filled in by the sender. |
sent.usec | int32 | The microsecond value of the timestamp when this message was sent. Filled in by the sender. |
received.sec | int32 | The second value of the timestamp when this message was received. Filled in by the receiver. |
received.usec | int32 | The microsecond value of the timestamp when this message was received. Filled in by the receiver. |
size | uint32 | Total number of bytes of the following typed message |
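
The field list above maps directly onto a fixed 26-byte little-endian structure. The following is a minimal sketch using Python's `struct` module; `BaseMessage`, `pack_base` and `unpack_base` are hypothetical names chosen for illustration, not part of Snapcast.

```python
import struct
from typing import NamedTuple

# "<" selects little endian without padding: three uint16 (H),
# four int32 (i) and one uint32 (I), 26 bytes in total.
BASE_FORMAT = "<HHHiiiiI"
BASE_SIZE = struct.calcsize(BASE_FORMAT)


class BaseMessage(NamedTuple):
    type: int
    id: int
    refers_to: int
    sent_sec: int
    sent_usec: int
    received_sec: int
    received_usec: int
    size: int  # number of bytes in the typed message that follows


def pack_base(msg: BaseMessage) -> bytes:
    """Serialize a base message to its wire representation."""
    return struct.pack(BASE_FORMAT, *msg)


def unpack_base(data: bytes) -> BaseMessage:
    """Parse the first BASE_SIZE bytes of data as a base message."""
    return BaseMessage(*struct.unpack_from(BASE_FORMAT, data))
```
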
Field | Type | Description |
---|---|---|
codec_size | uint32 | Length of the codec string (not including a null character) |
codec | char[] | String describing the codec (not null terminated) |
size | uint32 | Size of the following payload |
payload | char[] | Buffer of data containing the codec header |
The payload depends on the used codec:
- Flac: the FLAC audio file header, as described here. The decoder must be initialized with this header.
- Ogg: the vorbis stream header, as described here. The decoder must be initialized with this header.
- PCM: a RIFF WAVE header, as described here. PCM is not encoded, but the decoder must know the sample rate, bit depth and number of channels, which are encoded in the header
- Opus: a dummy header is sent, containing a 4 byte ID (0x4F505553, ascii for "OPUS"), 4 byte samplerate, 2 byte bit depth, 2 byte channel count (all little endian)
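
As an illustration, the Codec Header fields and the Opus dummy header could be decoded as sketched below. The function names are hypothetical. Two assumptions are made here: the codec string is ASCII, and the 4-byte Opus ID is read as a little-endian `uint32` like the other fields (so its bytes on the wire are in reverse order of the ASCII text).

```python
import struct

OPUS_ID = 0x4F505553  # ASCII "OPUS" when written as a hex number


def parse_codec_header(body: bytes):
    """Split a Codec Header typed message into (codec, payload)."""
    (codec_size,) = struct.unpack_from("<I", body, 0)
    codec = body[4:4 + codec_size].decode("ascii")  # not null terminated
    (payload_size,) = struct.unpack_from("<I", body, 4 + codec_size)
    start = 8 + codec_size
    payload = body[start:start + payload_size]
    if len(payload) != payload_size:
        raise ValueError("codec header payload is truncated")
    return codec, payload


def parse_opus_header(payload: bytes):
    """Return (sample_rate, bit_depth, channels) from the Opus dummy header."""
    ident, rate, bits, channels = struct.unpack_from("<IIHH", payload, 0)
    if ident != OPUS_ID:
        raise ValueError("not an Opus dummy header")
    return rate, bits, channels
```

For FLAC, Ogg and PCM the payload would instead be handed to the respective decoder or WAVE header parser unchanged.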
Field | Type | Description |
---|---|---|
timestamp.sec | int32 | The second value of the timestamp when this part of the stream was recorded |
timestamp.usec | int32 | The microsecond value of the timestamp when this part of the stream was recorded |
size | uint32 | Size of the following payload |
payload | char[] | Buffer of data containing the encoded PCM data (a decodable chunk per message) |
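
A minimal sketch of decoding this layout, assuming `body` holds exactly the typed message that follows the base message. `parse_wire_chunk` is a hypothetical helper name.

```python
import struct


def parse_wire_chunk(body: bytes):
    """Return (timestamp_sec, timestamp_usec, payload) from a Wire Chunk."""
    sec, usec, size = struct.unpack_from("<iiI", body, 0)
    payload = body[12:12 + size]  # 12 = two int32 fields plus one uint32
    if len(payload) != size:
        raise ValueError("wire chunk payload is truncated")
    return sec, usec, payload
```

The returned payload is what gets passed to the audio decoder, together with the timestamp that says when that audio was recorded.
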
Field | Type | Description |
---|---|---|
size | uint32 | Size of the following JSON string |
payload | char[] | JSON string containing the message (not null terminated) |
Sample JSON payload (whitespace added for readability):
{
"bufferMs": 1000,
"latency": 0,
"muted": false,
"volume": 100
}
`volume` can have a value between 0 and 100 inclusive.
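
A sketch of reading such a message: the JSON string is length-prefixed, so it is sliced out by `size` before decoding. `parse_server_settings` is a hypothetical name, UTF-8 is assumed as the text encoding, and the range check simply enforces the 0 to 100 rule stated above.

```python
import json
import struct


def parse_server_settings(body: bytes) -> dict:
    """Decode a Server Settings typed message into a dict."""
    (size,) = struct.unpack_from("<I", body, 0)
    settings = json.loads(body[4:4 + size].decode("utf-8"))
    volume = settings["volume"]
    if not 0 <= volume <= 100:
        raise ValueError(f"volume out of range: {volume}")
    return settings
```
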
Field | Type | Description |
---|---|---|
latency.sec | int32 | The second value of the latency between the server and the client |
latency.usec | int32 | The microsecond value of the latency between the server and the client |
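
The Time typed message is just two `int32` values. A minimal sketch of encoding and decoding it follows; the helper names are hypothetical, and combining the two fields as `sec * 1_000_000 + usec` is an assumption about how the pair is meant to be interpreted.

```python
import struct

TIME_FORMAT = "<ii"  # latency.sec, latency.usec (little endian)


def pack_time(sec: int, usec: int) -> bytes:
    """Serialize the Time typed message (8 bytes)."""
    return struct.pack(TIME_FORMAT, sec, usec)


def unpack_time_usec(body: bytes) -> int:
    """Return the latency as a single value in microseconds."""
    sec, usec = struct.unpack_from(TIME_FORMAT, body, 0)
    return sec * 1_000_000 + usec
```
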
Field | Type | Description |
---|---|---|
size | uint32 | Size of the following JSON string |
payload | char[] | JSON string containing the message (not null terminated) |
Sample JSON payload (whitespace added for readability):
{
"Arch": "x86_64",
"ClientName": "Snapclient",
"HostName": "my_hostname",
"ID": "00:11:22:33:44:55",
"Instance": 1,
"MAC": "00:11:22:33:44:55",
"OS": "Arch Linux",
"SnapStreamProtocolVersion": 2,
"Version": "0.17.1"
}
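
Putting the base message and the typed message together, a complete Hello message could be assembled as sketched below. `build_hello` is a hypothetical helper; the caller supplies the sent timestamp, and the `received` fields are left at zero on the assumption that the receiver fills them in, as described in the base message table.

```python
import json
import struct

HELLO = 5  # typed message ID for Hello
BASE_FORMAT = "<HHHiiiiI"


def build_hello(hello: dict, sent_sec: int, sent_usec: int, msg_id: int = 0) -> bytes:
    """Return base message + Hello typed message as one byte string."""
    payload = json.dumps(hello).encode("utf-8")  # not null terminated
    typed = struct.pack("<I", len(payload)) + payload
    base = struct.pack(
        BASE_FORMAT,
        HELLO,      # type
        msg_id,     # id
        0,          # refersTo (this is not a response)
        sent_sec,
        sent_usec,
        0,          # received.sec, filled in by the receiver
        0,          # received.usec, filled in by the receiver
        len(typed), # size of the typed message that follows
    )
    return base + typed
```

Note that the base message `size` covers the whole typed message, including the 4-byte length prefix of the JSON string.
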
Field | Type | Description |
---|---|---|
size | uint32 | Size of the following JSON string |
payload | char[] | JSON string containing the message (not null terminated) |
Sample JSON payload (whitespace added for readability):
{
"STREAM": "default"
}
According to the source, these tags can vary based on the stream.