Support multiple devices to support Video Conferencing use cases #317

Open
guidou opened this issue Jan 30, 2024 · 20 comments

Comments

@guidou

guidou commented Jan 30, 2024

There is no consensus yet on the WebRTC WG that support for the VC use cases should be solved using Media Session, but since there is a proposal being discussed, it should support the case when a system has multiple cameras or microphones.
Otherwise the API would be reduced in these systems to notifications about "one of the media devices changed state", which is not useful to a VC application.
Systems with more than one camera or more than one microphone are common and not a special/corner case.

@youennf
Contributor

youennf commented Jan 31, 2024

notifications about "one of the media devices changed state",

More likely that "all media devices of a given type change state".

Systems with more than one camera or more than one microphone are common and not a special/corner case.

Most VC solutions I know use only one camera and one microphone.
UA/OS UI is also often based on single camera/microphone usage (PiP window) or on muting all cameras/microphones (Safari UI). I am therefore not sure whether this feature is needed on day 1.

This is a reasonable ask though, one way to handle it:

partial dictionary MediaSessionActionDetails {
    sequence<DOMString> deviceIds = []; // An empty deviceIds means all devices are in scope.
};
dictionary CaptureActiveOptions {
    sequence<DOMString> deviceIds = []; // An empty deviceIds means all devices are in scope.
};
partial interface MediaSession {
    Promise<undefined> setMicrophoneActive(boolean active, optional CaptureActiveOptions options = {});
};
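
To illustrate the "empty deviceIds means all devices are in scope" rule, here is a minimal TypeScript sketch. It is only a model of the proposal, not a browser API: the function name `devicesInScope` is hypothetical, and the choice to silently ignore unknown ids is an assumption (a UA could also reject them).

```typescript
// Hypothetical model of the proposed option bag; not a real browser API.
interface CaptureActiveOptions {
  deviceIds?: string[]; // empty or omitted: all devices are in scope
}

// Returns the ids of the devices that a call would affect.
// Assumption: ids that do not match a known device are ignored.
function devicesInScope(
  devices: string[],
  options: CaptureActiveOptions = {}
): string[] {
  const requested = options.deviceIds ?? [];
  if (requested.length === 0) {
    return devices.slice(); // no filter: every device of this kind
  }
  return devices.filter((id) => requested.indexOf(id) !== -1);
}

const allMics = ["mic1", "mic2", "mic3"];
console.log(devicesInScope(allMics)); // all three microphones
console.log(devicesInScope(allMics, { deviceIds: ["mic2"] })); // only mic2
```

With this shape, existing callers that pass no options keep the current "all devices" behavior, which is why an optional second parameter looks like a compatible extension.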

@eladalon1983
Member

Video conferencing solutions may allow users to switch between multiple cameras or microphones. It's quite common for users to have more than one. Examining my own setup, which I believe is quite common:

  • Two cameras - one USB camera on top of the external monitor, one camera built into the laptop (whose lid is open).
  • Three(!) microphones - one in the USB camera, one in the laptop, and one in my wireless headphones.

The cameras have different fields of view, so it'd be quite reasonable to instruct the UA to disable one camera and keep the other one active. Given that USB cameras can be flaky and video-conferencing applications would often jump to the next available camera, it'd be somewhat reasonable to expect the UA to offer me the ability to mute just one of them.

@youennf
Contributor

youennf commented Jan 31, 2024

somewhat reasonable to expect the UA to offer me the ability to mute just one of them.

I can see that kind of UI happening in the future. The question is more whether this is already a thing.

@eladalon1983
Member

somewhat reasonable to expect the UA to offer me the ability to mute just one of them.

I can see that kind of UI happening in the future. The question is more whether this is already a thing.

To the best of my knowledge, it is not currently a thing.
But I don't think that's relevant. It's a very realistic possibility that could materialize in the very near future.

@guidou
Author

guidou commented Jan 31, 2024

somewhat reasonable to expect the UA to offer me the ability to mute just one of them.

I can see that kind of UI happening in the future. The question is more whether this is already a thing.

Several VC products I've used allow you to select which microphone and camera you want to use and allow you to change it in the middle of a call. It is obviously necessary for these applications to have the ability to know the mute state of all the devices. I can't see how that can be achieved with an API that only reports that a mute state changed without also indicating what device that is and what the state is.

@youennf
Contributor

youennf commented Jan 31, 2024

without also indicating what device that is and what the state is.

The assumption is that all devices of the given type are in scope.
As for the state, this was discussed in #307; I filed #318 to keep track specifically of this.

@jan-ivar
Member

I don't think multiple devices are relevant to MediaSession. togglemicrophone, togglecamera and hangup are popular buttons video conferencing websites use to communicate to users (akin to play/pause etc. for media). That's why they're here.

This seems sufficient for the common video conferencing case, just as tracking playback of a single artist at a time is.

@youennf
Contributor

youennf commented Feb 13, 2024

It was discussed at today's media wg meeting, here are some thoughts:

  • muting/unmuting all live tracks of a kind is sufficient in the short term.
  • Actual work can happen when there is a use case for it (like per device UI).
  • Current API is extensible enough to easily support this use case in the future.

Marking issue as enhancement.

@eladalon1983
Member

eladalon1983 commented May 24, 2024

Without this enhancement (per-device controls), I don't think MediaSession is a decent contender in solving the double-mute problem.

To repeat the use cases I have explained in narrower forums:

  1. Applications need to be able to show users a panel of all available devices and their state, and allow users to unmute a specific device if they so wish.
  2. When the application wants to setActive(mic1, true), the user agent might then interface with an uplink entity - the OS - and osSpecificSetMicActive(mic1). This may throw up an OS-level prompt. We shouldn't force the UA to iterate through all available mics and induce multiple prompts. If the app only wants to unmute one device, the UA should convey that to the OS, and the user should experience minimum friction.

I am not opposed to having a '*' variant for setMicrophoneActive(device, state) in addition to the per-device version. We could also consider specifying that the UA may throw an error if it doesn't support the set of devices requested by the user, if this helps Safari/Mozilla implement only the UX they want.

@jan-ivar
Member

What OS prompts for class-level access to all microphones and then fine-grained prompts on OS-unmute of each?

Applications need to be able to show users a panel of all available devices and their state,

What state?

We shouldn't force the UA to iterate through all available mics and induce multiple prompts.

I agree this spec shouldn't force UAs to do that. Being involved with the prose, our intent was that to implement the update capture state algorithm, a UA needs only reverse any policy of pausing all input sources of that kind in response to UI. I've filed #332 to clarify. Thanks for your help in spotting this, and sorry for any confusion this may have caused!

@youennf
Contributor

youennf commented May 26, 2024

2. We shouldn't force the UA to iterate through all available mics and induce multiple prompts.

I agree. To be clear though, in the usual case, there might be multiple available mics but only one of them is actually in use. In that case, there will be at most one OS prompt when unmuting.

The case you are talking about would be:

  • a web application captures at least two microphones or two cameras at the same time
  • a web application wants to show UI with independent mute/unmute toggle for these two microphones

This is a valid use case but I am not aware of a web site doing this kind of UI today. Ditto for native VC apps.
Similarly, native UX to mute/unmute individual microphones is not a thing yet, as we discussed in this thread a while ago.

2. setActive(mic1, true)

It is really the reverse, setActive(true, mic1), given that setActive already takes a boolean.
It seems easier to add options as a second optional parameter.

@eladalon1983
Member

eladalon1983 commented May 27, 2024

In light of more information gained out-of-band, I don't think this particular issue is a blocker for us anymore. I'm continuing the discussion under the assumption it is indeed an enhancement.

What OS prompts for class-level access to all microphones and then fine-grained prompts on OS-unmute of each?

An OS can offer users individualized mute-controls per peripheral. Imagine that the user manually mutes mic1 and mic3, but not mic2. Now the user interacts with a Web app in a way that indicates an intention to unmute mic1. With the current set of Web APIs, the Web app cannot indicate this to the UA, who would then indicate this to the OS. Rather, the use of setMicrophoneActive(true) (mis)informs the UA that the Web app wants to unmute all microphones, at which point the UA is forced to ask the OS to unmute all microphones. (Which might even throw up multiple prompts; depends on the OS.)

To be clear though, in the usual case, there might be multiple available mics but only one of them is actually in use. In that case, there will be at most one OS prompt when unmuting.

If we don't let the Web app tell the UA that it only wants to unmute a single mic, then how will the UA know? Will it employ a heuristic? Such heuristics can fail if the app then changes to the other mic a second later, and now the user has to deliver yet another gesture and interact with yet another prompt.
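
The prompt-count concern above can be made concrete with a small TypeScript model. This is a sketch under a loud assumption: the OS shows exactly one prompt per muted microphone it is asked to unmute, which real operating systems may or may not do. The names `OsMic` and `promptsForUnmute` are hypothetical.

```typescript
// Hypothetical model of OS-level, per-peripheral mute state.
interface OsMic {
  id: string;
  osMuted: boolean;
}

// Assumption: one OS prompt per muted microphone that the UA asks to unmute.
// When targetIds is omitted, the request covers every microphone.
function promptsForUnmute(mics: OsMic[], targetIds?: string[]): number {
  return mics.filter(
    (mic) =>
      mic.osMuted &&
      (targetIds === undefined || targetIds.indexOf(mic.id) !== -1)
  ).length;
}

// The scenario from the comment: the user muted mic1 and mic3, but not mic2.
const osMics: OsMic[] = [
  { id: "mic1", osMuted: true },
  { id: "mic2", osMuted: false },
  { id: "mic3", osMuted: true },
];

console.log(promptsForUnmute(osMics)); // 2: class-wide request hits mic1 and mic3
console.log(promptsForUnmute(osMics, ["mic1"])); // 1: targeted request hits mic1 only
```

Under this assumption, a class-wide unmute costs one extra prompt for every muted microphone the app never intended to use.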

@youennf
Contributor

youennf commented May 27, 2024

If we don't let the Web app tell the UA that it only wants to unmute a single mic, then how will the UA know?

A VC app is usually capturing with a single device.
When the app is switching from mic1 to mic2, my understanding is that it stops all tracks related to mic1 and it grabs a track from mic2.

When the web app wants to unmute capture, UA knows that the web application is only capturing with mic2. It will only seek to unmute mic2. Web app might later on seek to use mic1 via getUserMedia, which might trigger another prompt.

Such heuristics can fail if the app then changes to the other mic a second later, and now the user has to deliver yet another gesture and interact with yet another prompt.

I don't understand how providing setActive with more information helps that use case.
If setActive is called with mic1, and later on, getUserMedia is called with mic2, both may trigger prompts.

@eladalon1983
Member

eladalon1983 commented May 27, 2024

A VC app is usually capturing with a single device.
When the app is switching from mic1 to mic2, my understanding is that it stops all tracks related to mic1 and it grabs a track from mic2.

You are proposing a heuristic - the UA will interpret setMicrophoneActive() as only applying for those microphones that the Web app is currently actively accessing. Heuristics are suboptimal - they can't be guaranteed to always work correctly, they are bad for compatibility, and they make maintenance and extension gradually more difficult over time.

Specifically, let's examine the possibility that a Web app cycles through [mic1, mic2, mic3] and finds them all to be muted. The app then calls setMicrophoneActive() while holding none of these. What should the UA do? Unmute all? Unmute the last one held? Should this behave differently on Safari and on Chrome? Is it acceptable if it does?

@youennf
Contributor

youennf commented May 27, 2024

a Web app cycles through [mic1, mic2, mic3] and finds them all to be muted.

To do so, the web app needs to call getUserMedia on all these mics.
It is an interesting question which device the UA should choose if some mics are muted and others are not, and, if the device is required by constraints, what the UA should do if it cannot unmute the device.

I think we should dive into these discussions in WebRTC WG first.

The app then calls setMicrophoneActive() while holding none of these.

The spec is not precise here, but I think the spirit of the spec is that the UA will unmute no device, since there is no capture track.

@jan-ivar
Member

See my answer to #332 (comment) for my take on this.

@eladalon1983
Member

The spec is not precise here but the spirit of the spec I think is

If we clarify the spec, we will avoid future compatibility issues.

that UA will unmute no device, since there is no capture track.

Putting myself in a Web dev's shoes, I believe I would be very surprised that a call to setMicrophoneActive() is rendered a no-op just because I happen to be holding 0/3 microphones, but that same call affects 3/3 microphones if I happen to be holding exactly 1/3 of them.

@youennf
Contributor

youennf commented Jun 4, 2024

I would be very surprised that a call to setMicrophoneActive() is rendered a no-op just because I happen to be holding 0/3 microphones, but that same call affects 3/3 microphones if I happen to be holding exactly 1/3 of them.

There is probably a misunderstanding somewhere but I cannot pinpoint it yet.

The spirit is that setMicrophoneActive() will only change the microphones that the page is using.
In your example, if the page is using 0 out of 3, then it will be a no-op.
If the page is using 1 out of 3, only the one being used will be impacted (muted or unmuted).
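
The intended scoping can be sketched as a small TypeScript model. It is not the spec algorithm; `Mic`, `applyMicrophoneActive` and `activeIds` are hypothetical names, and "in use" stands in for "the page holds a live capture track from this device".

```typescript
// Hypothetical model of a microphone as seen by the page.
interface Mic {
  id: string;
  inUse: boolean; // the page currently captures from this device
  active: boolean; // false means muted
}

// Models the described intent: only microphones in use by the page change state.
function applyMicrophoneActive(mics: Mic[], active: boolean): Mic[] {
  return mics.map((mic) =>
    mic.inUse ? { id: mic.id, inUse: mic.inUse, active: active } : mic
  );
}

// Helper that lists the ids of active (unmuted) microphones.
function activeIds(mics: Mic[]): string[] {
  return mics.filter((mic) => mic.active).map((mic) => mic.id);
}

const noneInUse: Mic[] = [
  { id: "mic1", inUse: false, active: false },
  { id: "mic2", inUse: false, active: false },
  { id: "mic3", inUse: false, active: false },
];
const oneInUse: Mic[] = [
  { id: "mic1", inUse: false, active: false },
  { id: "mic2", inUse: true, active: false },
  { id: "mic3", inUse: false, active: false },
];

console.log(activeIds(applyMicrophoneActive(noneInUse, true))); // empty: 0 of 3 in use, so a no-op
console.log(activeIds(applyMicrophoneActive(oneInUse, true))); // only "mic2" is unmuted
```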

@eladalon1983
Member

The spirit is that setMicrophoneActive() will only change the microphones that the page is using.

Does this mean that if different embedded frames call the method, the results would be different? Does Safari UX support that?

Assume top-level, embedded1 and embedded2 each hold a different microphone.
Do we get a different result depending which document calls the method?
What if embedded3 calls it?

@youennf
Contributor

youennf commented Jun 4, 2024

These are all good questions and we should probably address them in the spec.

Media session spec could stay at context level (leaving the possibility for the UA to widen the scope) or could go to page level.
Even with a single device used by several iframes, the question is interesting; mediacapture-main does not help, as the scope of a source/device is not really clear.

Safari UI is merging all contexts in a single indicator/UI toggle. If page has both muted and unmuted tracks, Safari would show the live icon. A reasonable approach is for Safari to stick to page scope level.

In case of embedded3, it seems reasonable to be a no-op.
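
The page-level merging described above can be sketched as follows. This is a hypothetical TypeScript model, not Safari's implementation: the `Track` shape, the `pageIndicator` name and the "none" result for a page with no capture tracks are all assumptions made for illustration.

```typescript
// Hypothetical model: a capture track owned by some browsing context in the page.
interface Track {
  context: string; // e.g. "top-level", "embedded1"
  muted: boolean;
}

type Indicator = "live" | "muted" | "none";

// Merges every context of the page into a single indicator.
// A mix of muted and unmuted tracks is shown as "live".
// Assumption: a page with no capture tracks shows no indicator.
function pageIndicator(tracks: Track[]): Indicator {
  if (tracks.length === 0) {
    return "none";
  }
  return tracks.some((track) => !track.muted) ? "live" : "muted";
}

const pageTracks: Track[] = [
  { context: "top-level", muted: true },
  { context: "embedded1", muted: false },
  { context: "embedded2", muted: true },
];

console.log(pageIndicator(pageTracks)); // "live": one track is still unmuted
```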


4 participants