Support for content protection #41
So far I have been assuming that we will be able to associate a MediaKeys (EME) instance with a VideoDecoder (or AudioDecoder), but lacking a protected video output path it's not yet clear how that will work. It may be possible to plumb a protected picture using a canvas ImageBitmap context, but I don't know if that is workable on all platforms. It may be necessary to integrate more directly with […]. I don't know of any way to use an app-supplied decoder with MediaKeys. I don't think there is any path for encoding with MediaKeys.
@sandersdan The use cases I'm aware of typically involve decoding rather than encoding. Encoding would be involved in video upload, but typically content protection isn't introduced there. Decode use cases often involve low-latency streaming (over RTCDataChannel or WebTransport), where WebCodecs would provide a higher-performance and potentially more interoperable substitute for "low-latency MSE". The events being streamed could be sporting events, musical or theatrical performances, political gatherings, company meetings, games or AR/VR demonstrations. Some examples are collected here.
I am also interested in this scenario. Being able to use WebCodecs with DRM (FairPlay, Widevine, PlayReady) would be beneficial. We could keep the DRM'd elementary stream in the same format to be compatible with CMAF, but without the container.
triage note: marking 'extension', as the anticipated shape (associating MediaKeys) would be done via new members on the config dict, or new methods on the codec interface. As Dan points out, further extensions to canvas are probably also necessary.
There are at least three key parts to a secure video pipeline:
In the first part you, as the party issuing the protected content, have to trust that the client implementation won't allow the plaintext elementary streams, or the uncompressed essences, out of the system. You can't have an EME pipeline that passes plaintext elementary streams back in a way that can be inspected, modified, or diverted. You also cannot have a decode pipeline which allows injection of code, inspection of code, or access to the image space by other code.

In previous secure implementations I have worked with, any transformations of the video were usually done as a separate layer: the underlying software had no actual access to the video frame buffer, but it could command the frame buffer to resize, warp, etc. An upper-layer image compositor in secure memory space was the final arbiter for the render.

Secure key exchange doesn't have to be rocket science or proprietary; the principle is well understood. Again, it's about protected memory that other (untrusted) components of the system cannot access. It's possible to define secure domains of trust where a crypto pipeline could request secrets and not allow them to pass outside of that execution environment. If there were a browser implementation of secure execution pipelines which was able to make use of native trusted execution, and in which the browser was effectively able to execute signed code (even if not encrypted/scrambled) against secure memory, then there would not necessarily even need to be proprietary DRM implementations.

Ultimately it's about not allowing image/video data in memory to be inspected or read, even while being modified; that's the key factor. It's been done before in hardware and it could probably be allowed in software when backed up by a trusted system. But if the render pipeline isn't secured somehow, then it's basically not going to get traction with those who care about premium content rights.
Well, that's been shipping for some time already in multiple browser/OS combinations. This is about extending it to WebCodecs, which means essentially giving a key to a decoder and then preventing read-backs of the decoded output. Essentially, do what browsers do when implementing (e.g.) Widevine Level 1 support, but with a bit more flexibility.
I agree with Paul. However, I think we could only offer any flexibility with L3-protected content; for L1 the frames never come back from the hardware. For L3, we could do something like marking the frames as tainted (VideoFrame would need to grow this), like we do for CORS, so sites can manipulate them via canvas/WebGL but can't read them back. For L1, probably at most we could allow sites to decode opaque frames and then pass them to a MediaStreamTrackGenerator that goes into a <video> element.
One question is how WebCodecs works with SFrame. Is it possible for a WebCodecs decoder to take an SFrame as input directly, without exposing the key and cleartext encryptedChunk to Javascript? Similar questions have arisen with WebRTC Encoded Transform.
My first time really looking at SFrame. At a glance it looks pretty different from EME use cases. As you know, EME involves license servers, output protection, etc., often removing the UA entirely from the roles of decoding and rendering. Whereas, IIUC, SFrame is more about protecting frames on the wire, but is less concerned with protecting them from Javascript. My first thought would be that SFrame encryption/decryption seems like a post-encode/pre-decode step, so it's external post-processing outside of WebCodecs. Seem right?

Noob question: I see lots of discussion on how to prevent Javascript access to the SFrame keys. Obviously keys are sensitive, but I don't quite follow why they're concerned w/ restricting JS access to the keys if they're not also restricting JS access to the raw media?
@chcunningham SFrame is about protecting content from access by untrusted parties. There are use cases where the Javascript may be trusted or untrusted.

Where the JS is trusted, the application is allowed to access keys as well as raw media (e.g. VideoFrames), but can encrypt the content to prevent access by an untrusted middlebox (e.g. a conference server provided by a CPaaS service). In this use case, the SFrame would be decrypted to yield an encoded chunk prior to WebCodecs decode, or the encoded chunk from the WebCodecs encoder would be encrypted prior to transmission. An example of this use case would be a Javascript application written using a Cloud Communications Platform SDK.

If the JS is untrusted, then the web application should not have access to the keys, and operations on the cleartext content should be restricted. For example, the application should not be able to record the content or have access to the raw data in order to transform it (e.g. so as to protect against creation of deep fakes). This use case more resembles an EME use case. For example, a sporting event or concert could be streamed in low latency by using WebTransport or RTCDataChannel for transport and WebCodecs for decode. Content protection might be desired in this scenario, but without the overhead of containerization. In this use case, the content is being played on a Javascript application or device from another vendor (e.g. a concert offered by a streaming service, played on a device like a Roku, AppleTV, etc.).
What would motivate folks to use SFrame in the sporting event example? I definitely follow the use case, but I'd expect they'd essentially want EME for WebCodecs, reusing large parts of the existing EME infrastructure.
After learning more about the motivations behind "untrusted JS", I don't think we should pursue adding SFrame APIs to WebCodecs. JS trust is required for most of the web: authentication, email, banking, shopping... The platform increasingly ensures that trust at higher levels (HTTPS, cross-origin isolation, ...) while exposing more and more power to applications. This runs counter to all of that. The trust isn't perfect, but domain-specific (RTC, WebCodecs) solutions add complexity while leaving most of the problem unsolved. I'm all for E2E encryption, but with the app managing it.
Agree that the goal is probably "EME for WebCodecs". But what does that imply for the format to be transferred over the wire and subsequently fed to the WebCodecs decoder? Is protected content transferred the same way it is today? Is the proposal to allow the WebCodecs decoder to decode that? If not, what is the alternative? I mentioned SFrame because it is a non-containerized format for encrypted frames.
My first thought would be to provide the EME stuff that is usually in the container as part of the *DecoderConfig (e.g. these things). Then a chunk would be just as it is now, only encrypted. Also, to reuse most of EME, we might add MediaKeys as another member of *DecoderConfig. The flow is then:
Then add the restrictions Dale talked about above. Disclaimer: very off-the-cuff design.
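[Editor's note] To make the off-the-cuff design above concrete, here is a purely hypothetical WebIDL rendering. None of these names (`mediaKeys`, `DecryptConfig`, `SubsampleEntry`, `subsampleLayout`) exist in the WebCodecs or EME specs; they are illustrative assumptions only:

```webidl
// Hypothetical, illustrative only — not part of any spec.
partial dictionary VideoDecoderConfig {
  // MediaKeys obtained via the usual EME flow
  // (requestMediaKeySystemAccess() -> createMediaKeys()).
  MediaKeys mediaKeys;
};

// Per-chunk decryption metadata that a container would normally
// carry (key ID, IV, subsample map), supplied out of band instead.
dictionary SubsampleEntry {
  required unsigned long clearBytes;
  required unsigned long protectedBytes;
};

dictionary DecryptConfig {
  required BufferSource keyId;
  required BufferSource initializationVector;
  sequence<SubsampleEntry> subsampleLayout = [];
};
```

A decode call would then carry such per-chunk metadata alongside each EncodedVideoChunk, mirroring what containers carry in-band today.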
Hey group, I want to follow up on the discussion from the WebRTC / Media TPAC call (minutes). @dontcallmedom mentioned that the concern could be that the app itself is untrusted with the media (it may snoop). Questions:
At this point I see the value of EME for streaming scenarios (sporting, gaming, ...), but I'm less clear on solutions for trust problems in communications use cases. @mwatson2 (and Richard?) mentioned that we may be able to register SFrame's encryption mechanism within EME. Sounds interesting.

If we pursue EME:WC for streaming uses, my thought was to register a new stream type that reused existing EME encryption modes (I guess CBC?) for maximum compatibility with existing infra. Does that sound right? I'm new to SFrame, but it looks like it uses an HMAC-based AEAD mode. If we additionally pursue EME:WC for RTC uses, maybe we'd want to support that AEAD mode in addition to CBC?

@fluffy mentioned that EME had been considered for RTC in the past, but found to be a poor fit. Something about the number of keys? Can you say more?
@chcunningham Existing EME has a stream format registry which describes the supported stream formats (currently ISO BMFF and WebM), specifically how the encryption is applied to the media bytes inside those different containers. Whereas existing EME works with MSE, which accepts media as a byte stream and thus needs a "stream format" specification, for EME with WebCodecs you would presumably need a "frame format" specification. This would describe, for each frame format, how the encryption is applied to the bytes of the frame and, differently from the MSE case, any additional per-frame metadata that needs to be supplied to drive the decryption. I could imagine a frame format for SFrame and equally one for a Common Encryption frame. The latter would be useful for streaming applications that wanted to use the same source files as existing MSE.

I also find it hard to think of applications where EME would be useful for real-time communication (rather than streaming). Such an application would need the property that the sender wants some guarantees about what will be done with the media they are sending; the sender does not trust the application at the receiver, but does trust the CDM component at the receiver. Perhaps an "unrecordable" videoconference tool?
Mark said: "I could imagine a frame format for SFrame and equally one for a Common Encryption frame. The latter would be useful for streaming applications that wanted to use the same source files as existing MSE."

[BA] I'm trying to understand how a "Common Encryption frame" would differ from SFrame. Are there inherent differences in requirements that would lead to format differences? Or would CEF and SFrame be very similar, with the only major difference being the key management protocols used for different scenarios? In that situation, I'd suggest that only one of the formats is likely to be widely deployed.

"I also find it hard to think of applications where EME would be useful for real-time communication (rather than streaming)."

[BA] As noted in today's meeting, the use cases are blending together. In the "Together Mode" scenario, you have realtime streams ingested and combined with a low-latency sports stream. Since the goal is to produce a composited stream, it wouldn't make sense to E2E-protect the realtime stream. However, it might make sense to protect the composited stream.

"Such an application would need the property that the sender wants some guarantees about what will be done with the media they are sending, the sender does not trust the application at the receiver but the sender does trust the CDM component at the receiver."

[BA] Today very large realtime conferences are often implemented via a combination of "low latency ingestion" plus low-latency streaming. Think of a company meeting, a very large class, a concert for a mass audience, or an online political rally. For these kinds of large meetings, the content can be considered valuable and vulnerable to theft, and even if there is no content fee, it might be important that the content not be modifiable so as to create "deep fakes". In these scenarios, the media uploader trusts the ingestor. Since the ingestion system may need to modify the content (e.g. transcode it or combine it with other streams), there is no need for content protection or E2E encryption on the ingestion leg. However, there is a desire to prevent theft or manipulation of the finished product, so you might have content protection on the downstream link.

These scenarios can be implemented today using containerized media and transports such as WebTransport or RTCDataChannel, combined with MSE. So we're not really talking about new use cases or threat models. The question is how the wire format changes if WebCodecs is used instead of MSE. The goal is to transport an encrypted encoded chunk over the wire, then decrypt it and decode it via WebCodecs.
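[Editor's note] To make "an encrypted encoded chunk over the wire" concrete, here is a minimal sketch of a hypothetical uncontainerized framing. The field layout (timestamp, key ID, IV, payload) is entirely made up for illustration; a real design would follow whatever frame-format registry emerges:

```javascript
// Hypothetical wire framing for one encrypted chunk, illustrative only:
//   [8-byte timestamp (us)] [1-byte keyId length] [keyId]
//   [1-byte IV length] [IV] [payload = encrypted access unit]
function packChunk({ timestamp, keyId, iv, payload }) {
  const buf = new Uint8Array(8 + 1 + keyId.length + 1 + iv.length + payload.length);
  const view = new DataView(buf.buffer);
  view.setBigUint64(0, BigInt(timestamp)); // big-endian timestamp
  let off = 8;
  buf[off++] = keyId.length;
  buf.set(keyId, off); off += keyId.length;
  buf[off++] = iv.length;
  buf.set(iv, off); off += iv.length;
  buf.set(payload, off);
  return buf;
}

function unpackChunk(buf) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const timestamp = Number(view.getBigUint64(0));
  let off = 8;
  const keyIdLen = buf[off++];
  const keyId = buf.slice(off, off + keyIdLen); off += keyIdLen;
  const ivLen = buf[off++];
  const iv = buf.slice(off, off + ivLen); off += ivLen;
  const payload = buf.slice(off);
  return { timestamp, keyId, iv, payload };
}
```

The receiver would unpack, decrypt the payload with the key identified by `keyId`, and feed the result to the decoder as an EncodedVideoChunk.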
I'm not all that familiar with SFrame, but for Common Encryption there are several different encryption schemes for each of AES-CTR and AES-CBC, and each has some metadata which describes how the encryption is applied (for example, for the […]). I imagine for SFrame there is a similar description, for each cipher, of one or more ways the cipher can be applied to the bytes of the frame, and perhaps also metadata controlling that?
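[Editor's note] The per-frame metadata mentioned above is, in Common Encryption (ISO/IEC 23001-7), a subsample map alternating clear and protected byte runs (BytesOfClearData / BytesOfProtectedData pairs), so that codec headers stay readable while the payload is encrypted. A small sketch of how such metadata identifies the encrypted byte ranges of a frame (field names are illustrative, not the spec's):

```javascript
// Each subsample entry mirrors CENC's BytesOfClearData /
// BytesOfProtectedData pair: some clear bytes (e.g. NAL headers)
// followed by some encrypted bytes.
function protectedRanges(subsamples) {
  const ranges = [];
  let offset = 0;
  for (const { clearBytes, protectedBytes } of subsamples) {
    offset += clearBytes;
    if (protectedBytes > 0) {
      ranges.push([offset, offset + protectedBytes]); // [start, end)
    }
    offset += protectedBytes;
  }
  return { ranges, totalBytes: offset };
}
```

A "frame format" specification for WebCodecs would presumably carry exactly this kind of map per chunk, since there is no container to carry it.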
A link to a DASH-IF presentation describing Content Protection requirements for WebCodecs is here.
After thinking about this, I don't think that there is any use case for supporting DRM in WebCodecs for playback. Let me explain.

If we are going to use WebCodecs with DRM for playback, we will need to use a VideoDecoder to extract the raw audio/video frame, then create a media stream track and play it on a video element. As we want to use DRM, the decoder will have to decrypt the encrypted frame, but we can't output a raw frame to JS; we would need to pass an opaque handle instead. After that we would need to create the media stream track with the opaque frames for playback in the video element.

This is already available in MSE and I don't see any added value from WebCodecs for this use case. The only caveat is that MSE uses "containerized" frames while we want to provide "uncontainerized" ones. So wouldn't it just be simpler to extend MSE to accept Encoded(Audio/Video)Chunks instead? What functionality would be missing?
@murillo128 Finding developers who require DRM support in WebCodecs is a bit like searching for a veterinarian properly trained to care for a unicorn. After a long (and unfruitful) search, you begin to wonder if they exist. There are several reasons why developers who were formerly using MSE (and containerization) have been moving to WebCodecs (and raw media transport). Most find that WebCodecs provides decreased latency, partly due to the removal of containerization/decontainerization operations as well as support for workers (which are also now supported by MSEv2). However, one of the other key characteristics of applications that have moved from MSE to WebCodecs is that they do not need DRM. So while I could speculate whether it makes sense for WebCodecs to support DRM or whether it would be better for MSEv2 to accept Encoded(Audio/Video)Chunks, it is probably best to wait until we "come across a unicorn that needs a veterinarian". Do you know of an application that needs both WebCodecs and DRM?
🦄 present :) Jokes aside, if WebCodecs + DRM existed, I can assure you there would be a market for it. The main reason we use WebCodecs over MSE/EME is not so much the decreased latency but the control it gives us over the rendering process. For our use case, an advanced multiview player, we need frame-level control over rendering, with each frame potentially requiring different shader parameters. An MSE-like approach, where the output of a decoder is directly rendered to a view, simply does not work in our case because of the lack of (control over) synchronisation between the WebGL pass and the video decoder output. This is the primary reason that so far we've been limited to ClearKey-like DRM schemes in browsers, while we can offer proper DRM in native applications.
There might be some gaps in my knowledge with MSE, EME and WebCodecs, so please correct me if I'm wrong.

When using MSE in the past we ran into some issues with buffers. Different browsers would implement their buffer rules differently, which didn't really matter until you got down to very low latency video. whatwg/html#4638. Basically it came down to how much it had to buffer before it could play. But there are other scenarios, to deal with packet loss etc., where more direct access to the buffer would be needed. When WebCodecs was first announced, I thought it would be the answer to that problem.

Microsoft has a media framework called Media Foundation. It's one of the better video APIs I've worked with. With MF you create a buffer of IMFSamples. Each sample points to a buffer of data that represents a sample in an elementary stream. You set a presentation time and a duration for the sample. The second part is a sink where you send your data. The sink will dispatch an event asking for a frame, and then you respond to the event by calling ProcessSample with a reference to the IMFSample. This way the entire buffer is managed: when to start playing, how to handle stalls, and how to recover.
https://learn.microsoft.com/en-us/windows/win32/api/mfobjects/nn-mfobjects-imfsample

To initialize media to use the Protected Media Path you would set flags on the IMFSample to signal the DRM used in the elementary stream.
https://learn.microsoft.com/en-us/windows/win32/medfound/sample-attributes

The second part is removing the need for containerization. I don't know if this second one will actually be an issue. ISOBMFF isn't really well designed for live streaming. Depending on how things evolve with MoQ (Media over QUIC), it might become a point of friction.

So where should these changes live? I'm not sure. It's either making it work with the lower-level WebCodecs API or extending support for MSE and EME.

-nate
@rayvbr wouldn't your use case also require WebGL to support DRM'd textures?
@murillo128 yes, that's correct. So having EME/DRM support for WebCodecs would be a necessary, although by itself not sufficient, step for enabling our use case. I added a more extensive description of our use case here: #483
I totally agree with @murillo128: MSE/EME for non-containerized media makes more sense (it is better suited for DRM) than EME-for-WebCodecs. I'd use the latter for lack of better options, although I'm not putting myself on the unicorn vets list, as #483 clearly states WebRTC cases are out of scope. At the same time, the DRM-for-WebRTC topic is not getting much love (w3c/webrtc-nv-use-cases#86), even though there's definitely interest from DASH-IF. That is also out of scope for this discussion, so I'll just wrap up by stating that the second-best option (MSE/EME for non-containerized media) will be hugely helpful too.
FWIW, I'm prototyping support for EME+MSE+WebCodecs in Chromium at: […] It's not something we're yet planning to ship, but I did want to flesh it out for folks to test. At this point, I've got ClearKey working and tested (Widevine should work as well, but it's untested for now). It's available in Chrome 120.0.6074.0+ behind the […] flag. The new IDL ends up looking like this so far: […]
This is an exciting development, thank you for making this happen! We (castLabs) discussed EME/DRM issues with Google folks (Chris, Matt and Harald, to name a few) at length, but it didn't materialize in any tangible action, and then I learned both Chris and Matt were no longer with Google... This already looks very promising as a prototype; however, I don't fully grasp the IDL you posted. I'd expect (based on my experience working with ISO/IEC 23001-7 compliant media) something like this: […]
'Segment' here is a range of consecutive frames sharing the same en/decryption key; if there's no key rotation, then a segment is equivalent to the entire media stream.
There's no concept of segments w/ decoders, so the client must map the segment config into a per-chunk config.
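[Editor's note] A minimal sketch of that segment-to-chunk mapping, with an entirely hypothetical segment shape (`start`/`end` timestamp range plus key material) since no such structure exists in the spec:

```javascript
// Illustrative only: each 'segment' carries key material valid for a
// timestamp range (key rotation). The decoder needs it per chunk, so
// we look up the covering segment for a chunk's timestamp.
function perChunkConfig(segments, chunkTimestamp) {
  for (const seg of segments) {
    if (chunkTimestamp >= seg.start && chunkTimestamp < seg.end) {
      return { keyId: seg.keyId, iv: seg.iv };
    }
  }
  throw new Error(`no key segment covers timestamp ${chunkTimestamp}`);
}
```

With no key rotation, `segments` degenerates to a single entry covering the whole stream, matching the "segment = entire media stream" case above.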
This brings up the question I was going to ask before: what is a "chunk"? I don't recall ever seeing "chunks" in any audio/video specs. Does it stand for something usually referred to as a "frame" or "access unit"? The WebCodecs spec itself only defines "key chunks" ("An encoded chunk that does not depend on any other frames for decoding. Also commonly referred to as a 'key frame'"), and the whole spec feels like an arbitrary mix of "chunk" and "frame" used interchangeably.
The definition of "chunk" is codec-specific, see the codec registry https://www.w3.org/TR/webcodecs-codec-registry/. As an example, an AVC chunk is an Access Unit.
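[Editor's note] To ground the AVC case: in the registry's avcC-style bitstream format, a chunk (one access unit) is a sequence of length-prefixed NAL units. A hedged sketch of walking that structure, assuming the usual 4-byte length prefixes (i.e. `lengthSizeMinusOne == 3` in the avcC description):

```javascript
// Split one avcC-format access unit (a WebCodecs "chunk") into its
// NAL units. Assumes 4-byte big-endian length prefixes.
function splitAccessUnit(chunkData) {
  const view = new DataView(
    chunkData.buffer, chunkData.byteOffset, chunkData.byteLength);
  const nalUnits = [];
  let off = 0;
  while (off + 4 <= chunkData.length) {
    const len = view.getUint32(off); // big-endian NAL unit length
    off += 4;
    nalUnits.push(chunkData.slice(off, off + len));
    off += len;
  }
  return nalUnits;
}
```

So "chunk" and "frame" coincide for codecs where an access unit is one coded picture, which is why the spec often uses them interchangeably.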
I would like to understand how WebCodecs supports content protection. In WebRTC-NV Use Cases, we initially had a use case where Javascript could be trusted with keys used to encrypt or decrypt protected content. That use case was removed after the IESG objected. So the question is how WebCodecs can address the only remaining use case (untrusted Javascript).