Add proposal for mid based signalling #943

dbkr · 2023-03-03T10:56:38Z

No description provided.

SimonBrandner

This really does seem to look great apart from some nits. The only thing I think we should think deeper about is the depth :D

SimonBrandner · 2023-03-03T11:14:18Z

doc/mid-based-signalling.md

+    "m.negotiate": {
+        "version": 1,
+        "call_id": "35657a5b793ce",
+        "conf_id": "bbe53499f82e3",
+        "invitee": "@bob:example.org",
+        "lifetime": 60000,
+        "party_id": "123456",
+        "description": {
+            "type": "offer",
+            "sdp": "[...]",
+        }
+    },


Perhaps this should also be split up into several pieces?

As in split the common call stuff out from the actual negotiation? Interesting idea.

Well m.call.candidates would still include some of this, so it would make sense to have that in a common mixin
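
To make the mixin idea concrete, here is a minimal TypeScript sketch. The names `CallCommon`, `NegotiateContent`, `CandidatesContent` and `withCommon` are hypothetical, and which fields belong in the common part (and what `m.call.candidates` carries) is an assumption for illustration rather than something the proposal settles.

```typescript
// Hypothetical types: not taken from the proposal or from the js-sdk.
interface CallCommon {
    version: number;
    call_id: string;
    conf_id: string;
    party_id: string;
}

// Negotiation-specific fields layered on top of the common ones.
interface NegotiateContent extends CallCommon {
    lifetime: number;
    description: { type: "offer" | "answer"; sdp: string };
}

// Assumed shape for candidates; only the shared fields matter here.
interface CandidatesContent extends CallCommon {
    candidates: { candidate: string; sdpMid: string | null; sdpMLineIndex: number | null }[];
}

// Build the shared fields once and spread them into each event's content.
function withCommon<T extends object>(common: CallCommon, rest: T): CallCommon & T {
    return { ...common, ...rest };
}

const common: CallCommon = {
    version: 1,
    call_id: "35657a5b793ce",
    conf_id: "bbe53499f82e3",
    party_id: "123456",
};

const negotiate: NegotiateContent = withCommon(common, {
    lifetime: 60000,
    description: { type: "offer" as const, sdp: "[...]" },
});

const candidates: CandidatesContent = withCommon(common, { candidates: [] });
```

Both events then share one source for the call identifiers, so a later fix to the conf/call ID naming would only touch `CallCommon`.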

SimonBrandner · 2023-03-03T11:15:06Z

doc/mid-based-signalling.md

+    "m.tracks.describe": {
+        "1": { // transceiver mid 1
+            "media_uuid": "aaaa-aaaa-aaaaaa-aaaa-aaaa",
+            "media_group_uuid": "1234-1234-123456-1234-1234", // rather than 'track group ID' to match media UUID?


Makes sense this way ✔️

SimonBrandner · 2023-03-03T11:17:06Z

doc/mid-based-signalling.md

+        "version": 1,
+        "call_id": "35657a5b793ce",
+        "conf_id": "bbe53499f82e3",
+        "invitee": "@bob:example.org",


What's the point of this field? Isn't it clear who the recipient is if this is to-device?

Yeah, it's really just consistency with 1:1 calls, although if we're essentially deprecating them then maybe we shouldn't worry about it.

SimonBrandner · 2023-03-03T11:20:14Z

doc/mid-based-signalling.md

+XXX: How flat vs deep do we want the structure to be here? I've done it quite deep here,
+organised by user ID / device ID / media group UUID, but they could also just be a flat
+list of tracks. It would be more duplication but maybe less effort to read.


It would also be useful to think about how we would actually implement this and how it fits in with the current implementation.

I've expanded on this a bit in 457578a, was this the sort of thing you were thinking about?

SimonBrandner · 2023-03-03T11:23:47Z

doc/mid-based-signalling.md

+This has also been rearranged a little to make the media UUIDs the keys and remove the
+unsubscribe section which is unnecessary if we always send the complete set of tracks we
+want to receive (we unsubscribe by just removing the media UUID from the dict).


I am pretty sure we actually wanted to avoid this to keep the messages short. Perhaps we could use null to say we want to unsubscribe. We could use a similar technique for unpublishing.

Yes, maybe I am barking up the wrong tree with this and it should just be the deltas each time, especially if we have sequence numbers to ensure we don't lose events.
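
A minimal sketch of the delta approach discussed above, assuming a delta is a dict keyed by media UUID in which a `null` value means "unsubscribe". The type names and the `applyDelta` helper are hypothetical; the `width`/`height` fields mirror the ones in the subscribe example in the doc.

```typescript
// Hypothetical shapes: null in a delta means "unsubscribe from this media UUID".
type Subscription = { width?: number; height?: number };
type Subscriptions = Record<string, Subscription>;
type SubscriptionDelta = Record<string, Subscription | null>;

// Apply a delta to the current subscription set without mutating the input.
function applyDelta(current: Readonly<Subscriptions>, delta: SubscriptionDelta): Subscriptions {
    const next: Subscriptions = { ...current };
    for (const [mediaUuid, sub] of Object.entries(delta)) {
        if (sub === null) {
            delete next[mediaUuid]; // unsubscribe
        } else {
            next[mediaUuid] = sub; // subscribe or update
        }
    }
    return next;
}

const initial: Subscriptions = {
    "aaaa-aaaa-aaaaaa-aaaa-aaaa": { width: 1024, height: 576 },
    "bbbb-bbbb-bbbbbb-bbbb-bbbb": {},
};

// Drop one track and add another; the untouched track is not resent.
const updated = applyDelta(initial, {
    "bbbb-bbbb-bbbbbb-bbbb-bbbb": null,
    "cccc-cccc-cccccc-cccc-cccc": {},
});
```

The trade-off is the one raised in the thread: deltas keep messages short, but a lost delta leaves the two sides out of sync unless something like sequence numbers detects the gap.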

SimonBrandner · 2023-03-03T11:24:27Z

doc/mid-based-signalling.md

+unsubscribe section which is unnecessary if we always send the complete set of tracks we
+want to receive (we unsubscribe by just removing the media UUID from the dict).
+
+This also now contains a sequence number, so the focus can reply with an ack:


Do all events now have a sequence number? What does the number mean?

Add feedback to track subscriptions with seqnums so every response can be matched to a request

Yep. I've added a bit more detail in the doc.
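
As a rough illustration of matching each response to its request, the sketch below keeps a per-connection counter starting at 0 and a table of outstanding requests. The `seq`, `ok` and `reason` field names and the `SeqTracker` class are assumptions, not the wire format from the doc.

```typescript
// Hypothetical ack shape: the real field names may differ.
type Ack = { seq: number; ok: boolean; reason?: string };

class SeqTracker {
    private nextSeq = 0;
    private pending = new Map<number, (ack: Ack) => void>();

    // Allocate the next sequence number and remember who is waiting on it.
    send(onAck: (ack: Ack) => void): number {
        const seq = this.nextSeq++;
        this.pending.set(seq, onAck);
        return seq;
    }

    // Returns false for acks that match no outstanding request.
    handleAck(ack: Ack): boolean {
        const callback = this.pending.get(ack.seq);
        if (callback === undefined) return false;
        this.pending.delete(ack.seq);
        callback(ack);
        return true;
    }
}

const tracker = new SeqTracker();
const results: string[] = [];
const first = tracker.send((ack) => { results.push(`0:${ack.ok}`); });
const second = tracker.send((ack) => { results.push(`1:${ack.ok}`); });

// Acks may arrive in any order; each one resolves only its own request.
tracker.handleAck({ seq: second, ok: false, reason: "unknown media" });
tracker.handleAck({ seq: first, ok: true });
```

A fresh `SeqTracker` per peer connection would match the "scoped to the lifetime of the peer connection" wording.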

SimonBrandner · 2023-03-03T11:26:17Z

doc/mid-based-signalling.md

+        "m.call.transferee": false,
+        "m.call.dtmf": false,
+    },
+    "m.tracks.advertise": {


SimonBrandner · 2023-03-04T07:54:45Z

doc/mid-based-signalling.md

+        "aaaa-aaaa-aaaaaa-aaaa-aaaa": {
+            "width": 1024,
+            "height": 576,
+        },
+        "bbbb-bbbb-bbbbbb-bbbb-bbbb": {},


For pull-based media in the future, this is going to need to include the full path (e.g. userId/deviceId/mediaGroupUUID/mediaUUID)

Wait, why would this be different?

So that the focus knows from where to request the track in case it can't read state

Ah right, yes, although in this case the address would also need to include the SFU that the user was publishing to? I wonder if this should look more like:

    media: [
        {
            "media_uuid": "bbbb-bbbb-bbbbbb-bbbb-bbbb",
            "user_id": "@alice:example.org",
            "device_id": "aaaaaa",
            "sfu_user_id": "@sfu:example.org",
            "sfu_device_id": "sfusfusfu",
        }
    ]

SimonBrandner · 2023-03-06T16:27:25Z

doc/mid-based-signalling.md

@@ -0,0 +1,276 @@
+# MID Based Signalling


This might be a nice opportunity to fix conf vs call id somehow

Yes, definitely. afaik it's just the MSC that is confused somewhere, but note to check.

SimonBrandner · 2023-03-06T16:29:01Z

doc/mid-based-signalling.md

+
+m.call.subscribe
+```
+"m.call.subscribe": {


Suggested change

-"m.call.subscribe": {
+"m.call.subscribe_to_media": {

Perhaps this for clarity

I made it m.call.subscribe_media as I think it's fine as an abbreviation for consistency.

SimonBrandner · 2023-03-06T16:33:28Z

doc/mid-based-signalling.md

+    "m.call.advertise": {
+        "@alice:example.org": { // user ID
+            "88888888": { // device ID
+                "2345-2345-234567-2345-2345": [{ // media group uuid
+                    "media_uuid": "aaaa-aaaa-aaaaaa-aaaa-aaaa",
+                    "purpose": "m.usermedia",
+                    "kind": "video",
+                }, {
+                    "media_uuid": "bbbb-bbbb-bbbbbb-bbbb-bbbb",
+                    "purpose": "m.usermedia",
+                    "kind": "audio",
+                }],
+            },
+        }
+    },


From the js-sdk class structure POV, it would make much more sense for this to be structured the same way metadata is atm (user_id and device_id being props of media group uuid) since the individual CallFeeds (media groups) have the userIds and deviceIds as props, so the mapping makes much more sense that way

So basically remove the user ID & device ID levels and move them to attributes of the media group ID? It would be towards the flatter end of possible structures and maybe a nice compromise between depth & duplication. I'm not sure we should necessarily be basing the choices on what the js-sdk does directly, but if the js-sdk has a god reason for doing it that way then that's certainly valid.

Yeah, that's what I mean. I'd say the js-sdk doesn't have a god reason for this since I was the one who wrote it and I haven't gone mad enough to consider myself a deity (yet) :D But it does have a good reason - I think it makes sense in terms of media groups being what you display in the UI (but I wrote it so I am biased)

(sorry, I sometimes can't help myself 😅 )

So something like c62296c?
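
For comparison, this sketch converts the deep structure (user ID, then device ID, then media group UUID) into the flatter one suggested above, where the user and device IDs become attributes of the media group. The type names, the `tracks` property and the `flatten` helper are hypothetical, and the result is not necessarily the exact shape adopted in c62296c.

```typescript
type AdvertisedTrack = { media_uuid: string; purpose: string; kind: "audio" | "video" };

// Deep form: user ID -> device ID -> media group UUID -> tracks.
type DeepAdvertise = Record<string, Record<string, Record<string, AdvertisedTrack[]>>>;

// Flatter form: media group UUID -> owner attributes plus tracks (assumed property names).
type FlatAdvertise = Record<string, { user_id: string; device_id: string; tracks: AdvertisedTrack[] }>;

function flatten(deep: DeepAdvertise): FlatAdvertise {
    const flat: FlatAdvertise = {};
    for (const [userId, devices] of Object.entries(deep)) {
        for (const [deviceId, groups] of Object.entries(devices)) {
            for (const [groupUuid, tracks] of Object.entries(groups)) {
                flat[groupUuid] = { user_id: userId, device_id: deviceId, tracks };
            }
        }
    }
    return flat;
}

const deepExample: DeepAdvertise = {
    "@alice:example.org": {
        "88888888": {
            "2345-2345-234567-2345-2345": [
                { media_uuid: "aaaa-aaaa-aaaaaa-aaaa-aaaa", purpose: "m.usermedia", kind: "video" },
                { media_uuid: "bbbb-bbbb-bbbbbb-bbbb-bbbb", purpose: "m.usermedia", kind: "audio" },
            ],
        },
    },
};

const flatExample = flatten(deepExample);
```

The flat form repeats `user_id` and `device_id` once per media group, which is the duplication cost mentioned in the thread, but each entry maps directly onto something like a `CallFeed`.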

SimonBrandner · 2023-03-06T16:36:07Z

doc/mid-based-signalling.md

+        "m.call.transferee": false,
+        "m.call.dtmf": false,
+    },
+    "m.call.describe": {


The question we should ask ourselves is if we'd ever want to have metadata for media groups too which would then mean it would make sense to have this structured in a way similar to the current metadata structure

Wouldn't we do this in the advertise message though? That's where the bulk of the metadata should be.

Well we might (or not) have some fast-changing metadata on the media group level, so it would be useful for that case

Ah yes, I see. My gut feeling is that the media groups would probably be purely a way to associate two tracks together as a pair (or group) and deliberately not add any further data, but perhaps.

True, that does make sense, but perhaps, not sure

EnricoSchw · 2023-03-06T16:54:56Z

doc/mid-based-signalling.md

+media UUID, perhaps?)
+
+If the focus needs to renegotiate to send the tracks, it does so, describing the media UUIDs it intends to send on the
+transceivers once the negotiation is complete:


In this case, those are the transceiver IDs (mids) from the Focus side, right?

They should be the same, no? Since the media lines must be the same on each side.

Right, the media lines are bidirectional and correlate to the remote side.

What I mean is, whoever initiates the offer determines what is sent over each media line. When the Focus only receives tracks over a connection, it never has a reason to start a media renegotiation, because on its side the media never changes.

If the Focus starts a renegotiation and creates an offer, then the Focus is sending media as well. In this case the Focus defines what is sent over the media lines, which means the client will receive a track from another peer.

If the client is not sending media to the Focus, the Focus starts with media line 0. If the client is already sending to the Focus, the Focus could also use media line 0, but we decided that in this case the Focus will use a new transceiver.

Long story short: the media lines alone do not define what will be sent over them. I know we want to use predefined transceivers and reserve media lines 0 and 1 for the media the client sends. However, I still have my doubts that this is a good idea.

I'm not sure I understand. From my PoV, the three different scenarios work like this:

If Alice (the client) isn't sending any media to the focus, the focus starts sending media to her and starts a renegotiation to add a transceiver, which becomes mid 0. It includes an m.call.describe_media saying that the media on mid 0 is Bob's media.

If Alice is already sending media, the focus renegotiates but decides to add a new transceiver. This becomes mid 2, so its m.call.describe_media informs the client that mid 2 contains Bob's media. There is no entry for mid 0 or 1 because the focus isn't sending anything on those transceivers, it's only receiving.

If Alice is already sending media and the focus decides to re-use one of those transceivers, its m.call.describe_media informs the client that Bob's media will arrive on mid 0. Alice will re-send her own m.call.describe_media in her answer re-iterating that her own media is being sent on mid 0 since she's still sending it in the other direction. The focus's m.call.describe_media describes what's flowing on the transceivers in the focus -> client direction and the client's m.call.describe_media describes what's going client -> focus.
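
The third scenario is easier to see with one mid-to-media-UUID table per direction. The sketch below assumes each `m.call.describe_media` replaces the sender's whole view for its direction; the `TransceiverTable` class, the `FlowDirection` names and the media UUID strings are hypothetical.

```typescript
// Hypothetical sketch: one mid -> media UUID table per direction of flow.
type FlowDirection = "client_to_focus" | "focus_to_client";
type DescribeMedia = Record<string, { media_uuid: string }>;

class TransceiverTable {
    private byDirection: Record<FlowDirection, Map<string, string>> = {
        client_to_focus: new Map<string, string>(),
        focus_to_client: new Map<string, string>(),
    };

    // Replace one direction's view with the latest describe_media from its sender.
    apply(direction: FlowDirection, describe: DescribeMedia): void {
        const table = new Map<string, string>();
        for (const [mid, entry] of Object.entries(describe)) {
            table.set(mid, entry.media_uuid);
        }
        this.byDirection[direction] = table;
    }

    lookup(direction: FlowDirection, mid: string): string | undefined {
        return this.byDirection[direction].get(mid);
    }
}

const table = new TransceiverTable();

// Alice describes what she sends on mids 0 and 1.
table.apply("client_to_focus", {
    "0": { media_uuid: "alice-video" },
    "1": { media_uuid: "alice-audio" },
});

// The focus re-uses mid 0 to send Bob's video back to Alice.
table.apply("focus_to_client", { "0": { media_uuid: "bob-video" } });
```

Because the two tables are independent, mid 0 can carry Alice's video towards the focus and Bob's video towards Alice at the same time without ambiguity.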

EnricoSchw · 2023-03-06T16:59:34Z

doc/mid-based-signalling.md

+then look very similar to the structure of the `m.call.advertise` event. It could either keep
+a reference to the transceiver it was receiving media on in the structure itself alongside
+the media UUID, or maintain a separate map of media UUID to transceiver / peer connection such
+that the first structure could be marshalled to JSON and sent to clients as-is.


I think the SFU does not need to save and maintain the map of media UUID to transceiver / peer connection. The mids are only needed to identify the tracks inside the SDP. Once the connection is negotiated, they are no longer required.

Rather, the SFU needs to know which tracks belong to which UUID (send and receive).

Well, it needs to know what media UUID corresponds to what it's sending in each case: whether it does that by mapping them to mid or track ID or whatever is up to the implementation - is that what you meant?

Yes, exactly, that is what I meant. I just wanted to point out that mids are quite unstable and only needed to initiate this process, so we should not rely on them in any implementation.

Understood – hopefully c06d3d4 clarifies this a bit.

daniel-abramov · 2023-03-07T19:35:17Z

doc/mid-based-signalling.md

+The `select_answer` is also tweaked to be more extensible-event like although is essentially
+the same:


Maybe this is a dumb question, but do we need a select_answer still? It seems like after exchanging the SDP offer and SDP answer, we could establish a connection and start streaming.

I did wonder whether to remove it for group calls, but I think I'm inclined to keep it for consistency. The connection doesn't need to be blocked on the select_answer though, we can still start streaming after just the offer & answer.

daniel-abramov · 2023-03-07T19:42:16Z

doc/mid-based-signalling.md

+This also now contains a sequence number. This is a monotonically increasing integer, starting
+at 0 and scoped to the lifetime of the peer connection. The focus will send a reply containing
+this sequence number to acknowledge that it has processed the message. This can be a positive ack:


Nice 🚀

Btw, do we want to use a sequence number or rather some sort of a randomly generated transaction ID?

I know that the advantage of a sequence number is that it still can be used as a transaction ID and we probably need it since To-Device messages are not ordered. The only problem that I can think of though is that what if one party receives a message with a sequence number 10000 after a message with a sequence number 5? What's the behavior? Should we drop the 10000 message or should we buffer it and process it once the previous message arrives (this opens up a possible attack vector of sending 1, 2, 3, 10000000 and causing the receiving client to deplete their own buffer)?

Also, what to do when the SFU gets restarted and does not know the last sent seq? - I suppose this is solved by "scoped to the lifetime of the peer connection", but I wonder if we need to have seq for all events (negotiate events could also fail).

True, seqnums allow for a little more flexibility but we wouldn't need the re-ordering capabilities if we're using a channel that guarantees ordering which the WebRTC DC will give us. A randomly generated ID could avoid clients thinking they have to do reordering and they wouldn't have to worry about making them monotonically increasing.
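
If reordering were ever needed (for example over to-device messages), one way to blunt the "1, 2, 3, 10000000" buffer-exhaustion concern is to accept only sequence numbers within a small window ahead of the next expected one. This is a sketch under that assumption; the `ReorderBuffer` class and its window policy are hypothetical and not part of the proposal.

```typescript
// Hypothetical sketch: deliver messages in seq order while holding at most
// `maxBuffered` out-of-order messages; anything further ahead is rejected.
class ReorderBuffer<T> {
    private nextExpected = 0;
    private held = new Map<number, T>();
    private maxBuffered: number;

    constructor(maxBuffered: number) {
        this.maxBuffered = maxBuffered;
    }

    // Returns the messages that became deliverable, or null if the message was rejected.
    push(seq: number, msg: T): T[] | null {
        if (seq < this.nextExpected || this.held.has(seq)) return null; // stale or duplicate
        if (seq > this.nextExpected + this.maxBuffered) return null; // too far ahead
        this.held.set(seq, msg);
        const ready: T[] = [];
        while (this.held.has(this.nextExpected)) {
            ready.push(this.held.get(this.nextExpected) as T);
            this.held.delete(this.nextExpected);
            this.nextExpected++;
        }
        return ready;
    }
}

const buffer = new ReorderBuffer<string>(2);
const heldBack = buffer.push(1, "b"); // arrives early, so it is held
const rejected = buffer.push(10000000, "x"); // far outside the window
const delivered = buffer.push(0, "a"); // fills the gap and releases both
```

With an ordered transport such as the WebRTC data channel, none of this is required, which is part of the argument for a random transaction ID instead.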

Co-authored-by: Daniel Abramov <inetcrack2@gmail.com>

CLAassistant · 2024-09-06T08:43:45Z

All committers have signed the CLA.

Add proposal for mid based signalling

7e63483

m.call.negotiate

be92de3

Use m.call everywhere, not m.tracks

fff722e

Spell out seqnums a bit more

84e5bb0

Add some more detail on how focus/client impl could work

457578a

EnricoSchw mentioned this pull request Mar 6, 2023

Cancelling screen-sharing doesn't propagate to the receiver matrix-org/waterfall#98

Open

Typo

89536de

Typo

6a76873

Co-authored-by: Daniel Abramov <inetcrack2@gmail.com>

dbkr and others added 3 commits March 8, 2023 11:21

Clarify implementation suggestions.

c06d3d4

Typos

69511f4

Co-authored-by: Daniel Abramov <inetcrack2@gmail.com>

Typo

4f33655

Co-authored-by: Daniel Abramov <inetcrack2@gmail.com>

Suffix event types with _media

daf7405

Flatten advertise_media structure a bit

c62296c
