-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change UDP protocol #4284
Comments
Yes, you could. It wouldn't be particularly nice but should certainly be possible. If we repeat that procedure a few times, the respective code might get a bit messy though. Using protobuf for UDP would prevent that.
As explained above, this would not have to be a breaking change (well, it would be breaking for the UDP protocol itself but not for Mumble as a whole). IMO the cost here is "merely" implementation of the new protocol and maintenance of both versions (at least until the old one has practically died out, then we could make a breaking change by dropping it).
That sounds pretty much like DIY protobuf, so I'd instead just go with the official one.
Both
The issue I see with protobuf is that it adds a byte of overhead for each field even if we know it to always be present. That'd be 5 bytes of useless overhead (target, codec, session id, sequence number, payload). message UDPPacket {
bytes positional_data = 1;
bytes opus = 2;
uint64 your_new_thing = 3;
// intentionally put these at later field indices to keep fields with single-byte varint encoding available for frequently used fields
uint64 ping = 50;
// repeated might make more sense for some of these (haven't checked their current encoding)
bytes speex = 51;
bytes celt_alpha = 52;
bytes celt_beta = 53;
} (from memory, didn't try to compile) |
Updated my comment about overhead (I forgot that there are actually quite a few header fields in the packets). |
Thanks for cc-ing me :). I think @Johni0702 had made a good point. I'm kind of suspicious about using protobuf for the audio data. The current design of this UDP protocol has lasted for years. There must be a reason why people use this DIY structure instead of protobuf itself. It is true that using protobuf will grant us more flexibility over the structure of the packet. But at the end of the day, the question is do we really need this flexibility? I think the content of a packet shouldn't change in the middle of a session, so is there a way we can negotiate the structure of the packet first? E.g. a client tell the server it doesn't use the positional audio field, but it will use the other fields. Then the server won't expect positional audio data at that position inside the packet. Another question is, do all audio packets need to contain these metadata? What exactly does the metadata contain? I remember @Krzmbrzl would like to append a field of "voice target". I think that is something that can be cached on the server. Assume the client has told the server the last field is the voice target. In that case, only the first packet needs to contain this field. Packets come after it can just truncate that field and the server immediately knows this field is the same as the last packet with that field. In this way, we can have almost no overhead at all. |
Absolutely, but that'd create a lot of maintenance overhead and would also be very painful for other server implementations to implement. Effectively they'd have to explicitly support each and every packet variation that exists in order to not refuse the connection for some clients.
Well this is indeed something that can change mid-session. It could of course be communicated to the server first, but this also involves overhead in the implementation.
IN my specific application the problem is not to get the voice target to the server (this is already done) but the other way around.
Well the problem is that you can't be sure that the "first" packet is even received by the client (remember that UDP is a lossy protocol).
I guess by using a custom format you save a few bytes of overhead. However you gain a stable protocol for that that has been tested and used for years and that can be very easily adapted by 3rdParty software. Furthermore I think bandwidth is not that much of a problem anymore than it used to be. A few bytes of overhead compared to the few hundred bytes of audio data that is transmitted, seems actually somewhat negligible 🤔 |
I don't really think it is a neglectable overhead... I did a little test just now. Currently, my bot sends music packet with size varies from 200 to 500 bytes, and mumble's voice packet has a fixed length of 160 bytes, while Plumble on Android's packet is 100 bytes. Adding a few more bytes can increase network usage by 1% or more in the worst case. And, the maximum timespan of an opus packet is 60ms. Do we have to repeatedly send the metadata every 60ms? It's about repeating something 30 times per second. For a typical packet of 20ms, that means we repeat it 100 times per second. But I also acknowledge that the missing packet could be a problem. Anyway, I think we need to discuss types of metadata to send with the audio packet first. The rule of thumb would be, we attach as least data as possible to the voice packet. Attaching a protobuf section that is optional (can be truncated) after the current packet is a way out (and merged the positional information into it as well). We can even design another message type to handle these requirements. Edit: And another way is we repeat the metadata section once per 100ms, or even at s larger interval. Anyway, if some metadata is really urgent and needs to be sent as frequently as possible, we can always go back to one per packet. |
Of course
But leaves us with a potential breaking change in the future once we exceed the capabilities of the current package format.
Not really. For messages that do not contain positional data, you'd still have to append 12 bytes (3 floats) to the data before the ProtoBuf section as there is no other way of knowing whether the 12 bytes following the audio data is the ProtoBuf message or whether that should be the positional audio. That's the original problem I was trying to frame in my initial post. Therefore if the usage of ProtoBuf introduces, let's say 5 bytes of overhead, we'd still save 7 bytes compared to the "append it to the current packet format" approach.
Well of course we should be careful with adding stuff to the UDP packets and what is added should always be as little as possible.
Probably not but as I said before: It's the only way to make sure that the data arrives with the audio as you can't rely on single UDP packets to get to their destination |
Having looked into ProtocolBuffers a bit more, I'm also not really convinced that a custom implementation would actually gain a lot in terms of size-savings. The protocol is pretty mature and compresses data into a single byte were possible. It is of course more overhead than not having any flexibility at all and instead using a fixed packet format that the client implicitly has to know. But I don't think that you can have a flexible approach without a little overhead in the message size |
In our last Team-meeting we have discussed this issue and concluded that moving the UDP channel to ProtoBuf messages as well has several advantages while the drawbacks (additional overhead of a few bytes) are comparatively small. Thus I'll close this discussion issue and create a proper issue for it instead. |
Background
Our current UDP protocol is defined here: https://mumble-protocol.readthedocs.io/en/latest/voice_data.html#packet-format
The gist is that each UDP packet contains a header-byte that defines its type and then it contains the actual payload of the message.
Due to the implementation we have only 3bits for the message type (allowing for values 0-7) of which 0-4 are already in use.
The problem arises when one thinks about attempting to extend the protocol without breaking backwards compatibility.
In the particular example I wanted to add 1 byte of meta-information to a voice packet. However due to the way the protocol works, I can't just append it to the payload if the payload doesn't contain positional data (3 float values (= 12 bytes) at the end of the payload). Instead I'd have to add dummy positional data and then append my value.
This introduces needless overhead and furthermore it also doesn't solve the problem of future extensibility.
We could solve this problem by introducing a new
type
but that trick will only work 3 times before we exceed the 3bits limit we have.Suggestion
Thus my suggestion is to make a breaking change to the UDP packet format in such a way that future extensions (should they happen) won't have to break backwards compatibility and won't have to introduce needless overhead.
I think we basically have 2 options for this:
In order to extend the protocol, we'd then simply have to define a new content type. Older clients that don't know that type, will simply ignore it.
I'm not sure how feasible it is to wrap our audio data into a protocol buffer and how much overhead that'd introduce compared to a self-baked version... 🤔
Backwards compatibility
In order to not break older clients, we'd have to implement some logic on murmur's side that'll send old-style UDP packets to clients that don't know the new version yet.
Likewise it has to be able to understand old UDP packets as well.
Goals
With this proposal I'd like to make the UDP packet definition more flexible and thus future-proof. The cost here is of course that we'd have a breaking change now, were we might get away without having one.
What do you think about this?
/cc @davidebeatrici @Kissaki
/cc @TerryGeng (as a 3rdparty software developer)
The text was updated successfully, but these errors were encountered: