MSC3291: Muting in VoIP calls #3291

Merged
merged 22 commits into from
Jul 16, 2023
130 changes: 130 additions & 0 deletions proposals/3291-muting.md
@@ -0,0 +1,130 @@
# MSC3291: Muting in VoIP calls

During VoIP calls, it is common for a user to mute their microphone/camera.
Ideally, the other side should be able to see that the opponent's camera is
muted, so that it can reflect this in the UI (e.g. show the user's avatar
instead of their camera feed). We would also want changes in the mute state
to be quick.

With pure WebRTC, there are two ways to mute, and both have their issues:

+ Disabling the corresponding track
+ Setting the corresponding track as `recvonly`/`inactive`

The Alternatives section describes the issues with using these alone.

## Proposal

This MSC proposes extending the `sdp_stream_metadata` object (see
[MSC3077](https://github.com/matrix-org/matrix-doc/pull/3077)) to allow
indicating the mute state to the other side using the following fields:

+ `audio_muted` - a boolean indicating the current audio mute state
+ `video_muted` - a boolean indicating the current video mute state

This MSC also adds a new call event `m.call.sdp_stream_metadata_changed`, which
has the common VoIP fields as specified in
[MSC2746](https://github.com/matrix-org/matrix-doc/pull/2746) (`version`,
`call_id`, `party_id`) and an `sdp_stream_metadata` object with the same
format as `sdp_stream_metadata` in `m.call.negotiate`, `m.call.invite` and
`m.call.answer`. The client sends this event when the `sdp_stream_metadata` has
changed but no negotiation is required (e.g. the user mutes their
camera/microphone).

All tracks should be assumed unmuted unless specified otherwise.
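
As a rough illustration, the shape described above could be modelled in
TypeScript as follows. The type names (`StreamMetadata`, `SdpStreamMetadata`,
`MetadataChangedContent`) and the helper functions are hypothetical and not part
of this MSC or of any SDK; only the field names come from the proposal. The
`version` value of `"1"` is an assumption for illustration. Missing mute fields
are treated as unmuted.

```typescript
// Hypothetical type names; only the field names come from this proposal.
interface StreamMetadata {
    purpose: string;
    audio_muted?: boolean;
    video_muted?: boolean;
}

// Keyed by stream ID, as in `m.call.invite`, `m.call.answer` and `m.call.negotiate`.
type SdpStreamMetadata = Record<string, StreamMetadata>;

interface MetadataChangedContent {
    version: string;
    call_id: string;
    party_id: string;
    sdp_stream_metadata: SdpStreamMetadata;
}

// Tracks are assumed unmuted unless the metadata explicitly says otherwise.
function isAudioMuted(meta: StreamMetadata | undefined): boolean {
    return meta?.audio_muted === true;
}

function isVideoMuted(meta: StreamMetadata | undefined): boolean {
    return meta?.video_muted === true;
}

// Build the content of an `m.call.sdp_stream_metadata_changed` event.
function buildMetadataChangedContent(
    callId: string,
    partyId: string,
    metadata: SdpStreamMetadata,
): MetadataChangedContent {
    return {
        version: "1", // assumed value, for illustration only
        call_id: callId,
        party_id: partyId,
        sdp_stream_metadata: metadata,
    };
}
```

A client that mutes only its microphone would call
`buildMetadataChangedContent` with `audio_muted: true` for the affected stream
and send the result without renegotiating.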

Clients are recommended not to mute the audio of WebRTC tracks locally when an
incoming stream has the `audio_muted` field set to `true`. This is because when the
other user unmutes themselves, there may be a slight delay between their client
sending audio and the `m.call.sdp_stream_metadata_changed` event arriving. If
the receiving client has set the track's `enabled` property to `false`, then any
audio sent in between those two events will not be heard. The other user will
still stop transmitting audio once they mute on their side, so no audio is sent
without the user's knowledge.

The same suggestion does not apply to `video_muted` - there clients _should_
mute video locally, so that the receiving side doesn't see black video.
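
The receiving-side recommendation above can be sketched as follows. This is
only an illustration under stated assumptions: `TrackLike` is a hypothetical
stand-in for a WebRTC `MediaStreamTrack` (so that the sketch runs outside a
browser), and `UiState` and `applyRemoteMuteState` are made-up names rather than
anything defined by this MSC.

```typescript
// Hypothetical stand-in for the parts of MediaStreamTrack used here.
interface TrackLike {
    kind: "audio" | "video";
    enabled: boolean;
}

// The mute fields of one stream's entry in `sdp_stream_metadata`.
interface RemoteMuteState {
    audio_muted?: boolean;
    video_muted?: boolean;
}

// Hypothetical UI flags derived from the remote mute state.
interface UiState {
    showMicMutedIndicator: boolean;
    showAvatarInsteadOfVideo: boolean;
}

function applyRemoteMuteState(tracks: TrackLike[], state: RemoteMuteState): UiState {
    // Missing fields mean "unmuted".
    const audioMuted = state.audio_muted === true;
    const videoMuted = state.video_muted === true;

    for (const track of tracks) {
        if (track.kind === "video") {
            // Video is muted locally so the receiver does not see black video.
            track.enabled = !videoMuted;
        }
        // Audio tracks are deliberately left untouched: audio that arrives
        // before the "unmuted" metadata update should still be heard.
    }

    return {
        showMicMutedIndicator: audioMuted,
        showAvatarInsteadOfVideo: videoMuted,
    };
}
```

The asymmetry is the point of the sketch: the audio mute state only drives a UI
indicator, while the video mute state also disables the local rendering of the
track.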

### Example

```JSON
{
    "type": "m.call.sdp_stream_metadata_changed",
    "room_id": "!roomId",
    "content": {
        "version": "1",
        "call_id": "1414213562373095",
        "party_id": "1732050807568877",
        "sdp_stream_metadata": {
            "2311546231": {
                "purpose": "m.usermedia",
                "audio_muted": true,
                "video_muted": true
            }
        }
    }
}
```

This event indicates that both audio and video are muted. It is suggested that
the video track of stream `2311546231` be hidden in the UI (probably replaced
by an avatar) and that the UI show an indication that the audio track is
muted, without the client muting the audio on the receiving side.

## Potential issues

When the user mutes their camera, some browsers may keep sending meaningless data
which will waste bandwidth.

## Alternatives

### Only disabling the corresponding track

This is the solution that some clients (e.g. Element Android) use at the moment.
While this is almost instantaneous, it doesn't allow the other side to know the
opponent's mute state. This leads to the opponent showing a black screen for a
muted video track and not doing anything for a muted audio track which is bad
for UX.

### Setting the corresponding track as `recvonly`/`inactive`

While this would be beneficial for low bandwidth connections, it takes time. The
delay might be acceptable for video but isn't for audio (for which users expect
an instantaneous mute state change). This is also problematic since it could be
confused with holding (as defined in
[MSC2746](https://github.com/matrix-org/matrix-doc/pull/2746)).

### Using a separate event for muting

While this might feel clearer initially, it doesn't have much real benefit. The
mute state is in fact metadata about the stream, and using
`sdp_stream_metadata` is also more flexible for cases where the user joins a
call already muted. It is also more flexible in general and would be useful if
we ever decided to do what is described in the next section.

### A combination of disabling tracks, `sdp_stream_metadata` and SDP

An option would be using the current method in combination with setting the
corresponding track as `recvonly`/`inactive`. Along with this clients would need
to set the mute state in `sdp_stream_metadata` to avoid conflicts with holding
(as defined in [MSC2746](https://github.com/matrix-org/matrix-doc/pull/2746)).
While this solution might be the most flexible solution as it would allow
clients to choose between bandwidth and a mute state change delay for each
track, it would be harder to implement and feels generally disjointed.

Contributor

Are we sure we don't want to go for optimal bandwidth usage? I personally often turn off video to have more bandwidth available for audio when the quality is terrible. Feels unfortunate to exclude this. Also, this doesn't feel much harder to implement.

Contributor @bwindels, Apr 14, 2022

Thinking about this further, why not remove the track entirely and send an `m.call.negotiate` event with the metadata update rather than just set the direction of the transceiver? It's kind of annoying that the webcam light doesn't turn off when you mute your video.

Contributor @kevincox, Apr 14, 2022

I think this should be the choice of the client. In general, not sending the video when video is off makes sense. The extra latency of starting up the stream is usually quite small and the bandwidth savings are significant, which can make a huge difference for someone on a connection that struggles to send audio and video but can support audio reliably. Turning off the camera is less often used because actually getting a new stream from the camera can have very high and unpredictable latency depending on the hardware or the OS. Probably a good client will have a variety of heuristics to decide how far to shut down the camera, but the key thing from the spec point of view is that completely turning off the camera is possible if desired.

Contributor Author

I don't think I stand against going with both options at the same time - let the client choose. @dbkr, would you agree?

Contributor Author

FWIW, we do this now: matrix-org/matrix-js-sdk#3028

Member

Also, I believe Enrico recently discovered with his experiments that browsers (at least some of them) now send nothing when a video track is disabled, so the bandwidth argument is even less relevant. Turning cameras off is orthogonal and can be done whether or not you remove the video track. The negotiate mechanism still exists either way.

Contributor Author

> I believe Enrico recently discovered with his experiments that browsers (at least some of them) now send nothing when a video track is disabled, so the bandwidth argument is even less relevant

Is this documented anywhere?

Member

Does the MSC need to be updated given the conclusions drawn here?

Member

I'm going to assume not and resolve this thread. Please re-open if you believe otherwise.


## Security considerations

None that I can think of.

## Dependencies

+ [MSC3077](https://github.com/matrix-org/matrix-doc/pull/3077)

## Unstable prefix

|Release |Development |
|------------------------------------|---------------------------------------------|
|`m.call.sdp_stream_metadata_changed`|`org.matrix.call.sdp_stream_metadata_changed`|
|`sdp_stream_metadata` |`org.matrix.msc3077.sdp_stream_metadata` |

We use an unstable prefix for `sdp_stream_metadata` to match
[MSC3077](https://github.com/matrix-org/matrix-doc/pull/3077).
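
A client that wants to interoperate during the unstable period might accept
both spellings. The sketch below is an assumption about how that could look, not
a requirement of this MSC: the constant and function names are hypothetical, and
preferring the stable field over the unstable one when both are present is a
design choice made here for illustration.

```typescript
// Identifiers taken from the table above.
const STABLE_EVENT_TYPE = "m.call.sdp_stream_metadata_changed";
const UNSTABLE_EVENT_TYPE = "org.matrix.call.sdp_stream_metadata_changed";
const STABLE_FIELD = "sdp_stream_metadata";
const UNSTABLE_FIELD = "org.matrix.msc3077.sdp_stream_metadata";

// True for either spelling of the event type.
function isMetadataChangedEvent(eventType: string): boolean {
    return eventType === STABLE_EVENT_TYPE || eventType === UNSTABLE_EVENT_TYPE;
}

// Read the metadata object from event content, checking the stable field
// first (an assumption) and falling back to the unstable one.
function readStreamMetadata(content: Record<string, unknown>): unknown {
    return content[STABLE_FIELD] ?? content[UNSTABLE_FIELD];
}
```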