SEI (Supplemental Enhancement Information) is defined as a user data message in the video bitstream, so we can use it to transmit data alongside the video content.
In recent years, interactions have become common in live activities, such as shop coupons and livestreaming quizzes. The most convenient way to keep these activities in sync with the livestream is to use the SEI of H.264.
When we use MSE to play a livestream, it is easy to get access to the H.264 NAL units by demuxing the live video format. But when we use iOS Safari or a WebView on iOS, we can only use <video src="{HLS_URL}"> to play the live stream, so there is no chance to access the raw content of the live stream (for example, the SEI). Therefore, we propose a new video sei event to solve this problem.
Sometimes during a live activity we want to interact with the audience, for example with subtitles, face-recognition based stickers, question forms, or goods for sale. The interaction information and the SEI can both be produced with an NTP timestamp, so that the web app knows when to render the interaction.
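For instance, the SEI payload could be a small JSON object carrying that shared NTP timestamp. The shape below is only an illustration of the idea, not part of the proposal:

const seiPayload = {
  ntp: 3928675200.5,        // NTP timestamp shared with the interaction backend
  kind: 'quiz',             // which interaction to render (subtitle, sticker, form, ...)
  data: { questionId: 42 }  // interaction-specific fields (hypothetical)
};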
End-to-end delay is an important measure of the live experience, especially for outdoor live streaming and e-commerce livestreaming, where broadcasters care about feedback from the audience. End-to-end delay is therefore a key indicator of CDN quality.
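A rough sketch of how the delay could be measured, assuming the broadcaster writes its NTP send time into the SEI payload as above, and that the viewer's clock is reasonably synchronized:

// The NTP epoch (1900-01-01) is 2208988800 seconds ahead of the Unix epoch,
// so convert before comparing with the viewer's wall clock.
const NTP_UNIX_OFFSET_MS = 2208988800 * 1000;

function endToEndDelayMs(seiNtpSeconds) {
  const sentUnixMs = seiNtpSeconds * 1000 - NTP_UNIX_OFFSET_MS;
  return Date.now() - sentUnixMs; // broadcaster-to-viewer latency in milliseconds
}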
In a video conference scenario there can be more than one speaker. When the conference is recorded as a video file and replayed later, we want to know who is talking, along with other user information about the speaker. We can use the SEI to carry that information and render it synchronously with the video track.
Sometimes real-time body or face recognition is not efficient in web browsers, in particular on older devices or mobile phones. So we need the server side to run the algorithm and put the results into the SEI.
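On the producer side, one possible way to package such results is an H.264 user_data_unregistered SEI message. The sketch below is illustrative only: APP_UUID is a hypothetical application identifier, and emulation-prevention bytes are omitted for brevity.

// Wrap recognition results in a user_data_unregistered SEI NAL unit:
// NAL header (type 6), payload type 5, ff-coded payload size,
// a 16-byte UUID, the payload bytes, then rbsp_trailing_bits.
const APP_UUID = new Uint8Array(16); // fill with your application's UUID

function buildSeiNal(result) {
  const body = new TextEncoder().encode(JSON.stringify(result));
  const payloadSize = APP_UUID.length + body.length;
  const bytes = [0x06, 0x05];                             // SEI NAL header, payload type 5
  let size = payloadSize;
  while (size >= 255) { bytes.push(0xff); size -= 255; }  // ff-coded payload size
  bytes.push(size);
  bytes.push(...APP_UUID, ...body, 0x80);                 // UUID, payload, trailing bits
  return new Uint8Array(bytes);
}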
Get SEI information from web video, with a loosely accurate timestamp, so that we can use it to sync with video.currentTime.
Can be used with Media Source Extensions™ (w3.org): parse the livestream with a JavaScript demuxer and remuxer to get an fMP4 stream that keeps the SEI data, and the sei event would be triggered when the web video pipeline parses the H.264 NAL units.
Can be used with both the WebCodecs API and Media Source Extensions™ (w3.org): demux the live stream, generate EncodedVideoChunks, and pass them to a SourceBuffer directly.
We can also use WebCodecs to process coded video frames to get the SEI information; a relevant issue is: here
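Today, without the proposed event, an application that already has access to the coded bytes (via a JavaScript demuxer in the MSE path, or via an EncodedVideoChunk in the WebCodecs path) has to scan the NAL units itself. A rough sketch, assuming AVCC (length-prefixed) framing with a 4-byte NAL length field:

// Collect the SEI NAL units (nal_unit_type === 6) from one coded H.264 sample.
// The NAL length size really comes from the avcC box; 4 bytes is assumed here.
function extractSeiNals(sample) {
  const view = new DataView(sample.buffer, sample.byteOffset, sample.byteLength);
  const seiNals = [];
  let offset = 0;
  while (offset + 4 <= sample.byteLength) {
    const nalLength = view.getUint32(offset);
    offset += 4;
    const nalType = sample[offset] & 0x1f;  // low 5 bits of the NAL unit header
    if (nalType === 6) {
      seiNals.push(sample.subarray(offset, offset + nalLength));
    }
    offset += nalLength;
  }
  return seiNals;
}

// With WebCodecs, the sample bytes can be copied out of a chunk first:
// const sample = new Uint8Array(chunk.byteLength); chunk.copyTo(sample);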
A new SEI event structure is defined as follows:
interface SEIEvent extends Event {
  type: 'sei';
  mediaTime: number;
  byteLength: number;
  copyTo(dest: Uint8Array): void;
}
We receive this event when the video element has parsed SEI information from the video bitstream, and we can handle it through the event's attributes and methods.
- mediaTime: the media presentation timestamp (PTS), in seconds, of the presented frame the SEI belongs to (i.e. its timestamp on the video.currentTime timeline)
- byteLength: the length of the SEI payload data in bytes
- copyTo: copies the SEI data into a typed array so it can be processed
let seiList = [];

function parseSEI(data) {
  // parse the pre-defined SEI structure from the raw payload bytes
}

function renderSEI(currentTime, list) {
  // render the SEI content whose timestamp has been reached
}

video.addEventListener('sei', (e) => {
  // copy the SEI payload out of the event and keep it with its timestamp
  const seiData = new Uint8Array(e.byteLength);
  e.copyTo(seiData);
  seiList.push({
    data: parseSEI(seiData),
    timestamp: e.mediaTime
  });
});

video.addEventListener('timeupdate', (e) => {
  const curTime = e.target.currentTime;
  renderSEI(curTime, seiList);
});
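As a concrete but hypothetical example of what parseSEI above could look like, assuming copyTo() yields the raw SEI RBSP and the payload is a user_data_unregistered message (payload type 5) whose body, after the 16-byte UUID, is UTF-8 encoded JSON:

function parseSEI(rbsp) {
  let offset = 0;
  // payload_type and payload_size use 0xFF extension coding
  const readFFCoded = () => {
    let value = 0;
    while (rbsp[offset] === 0xff) {
      value += 255;
      offset++;
    }
    return value + rbsp[offset++];
  };
  const payloadType = readFFCoded();
  const payloadSize = readFFCoded();
  if (payloadType !== 5) return null;               // not user_data_unregistered
  const uuid = rbsp.subarray(offset, offset + 16);  // application identifier
  const body = rbsp.subarray(offset + 16, offset + payloadSize);
  return { uuid, data: JSON.parse(new TextDecoder().decode(body)) };
}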
requestVideoFrameCallback (rvfc) is a callback triggered when the video element has rendered a frame. From its metadata we can get the mediaTime of that frame and render the matching SEI data accurately.
function draw(now, metadata) {
  // mediaTime is the presentation timestamp of the frame that was just rendered
  const mediaTime = metadata.mediaTime;
  renderSEI(mediaTime, seiList);
  video.requestVideoFrameCallback(draw);
}
video.requestVideoFrameCallback(draw);
If you are using EME for encrypted media playback, the SEI data may not be accessible, because the EME module only emits decoded video frames, not AVC samples.
We get the SEI timestamp when parsing the AVC bitstream, but at rendering time it is difficult to sync the SEI with the exact frame. So if you want to render SEI information and care about the synchronization between video frames and SEI, we suggest carrying the SEI data only on keyframes.
Since some applications inject SEI into every frame, an event or callback is not always a good way to access SEI, because it could block the JavaScript main thread.
DataCue is a proposed web API to allow support for timed metadata, i.e. metadata that is synchronized to audio or video media.
We can use DataCue to handle the SEI information: when the video element parses an SEI NAL unit, it generates a DataCue and adds it to a metadata text track.
Here is an example of how an application can deal with the SEI cues from a video element.
const cueEnterHandler = (event) => {
  const cue = event.target;
  console.log('cueEnter', cue.startTime, cue.endTime);
};

const cueExitHandler = (event) => {
  const cue = event.target;
  console.log('cueExit', cue.startTime, cue.endTime);
};

const addCueHandler = (event) => {
  const cue = event.cue;
  cue.onenter = cueEnterHandler;
  cue.onexit = cueExitHandler;
};

const video = document.getElementById('video');
video.textTracks.addEventListener('addtrack', (event) => {
  const textTrack = event.track;
  if (textTrack.kind === 'metadata') {
    textTrack.mode = 'hidden'; // keep the track active without rendering it
    textTrack.addEventListener('addcue', addCueHandler);
  }
});
When you want to access the SEI cues that are currently active on the video timeline, you can use activeCues:

const metadataTrack = getMetaDataTrack(video); // helper that finds the metadata text track
const activeCues = metadataTrack.activeCues;
const seiCues = Array.from(activeCues).filter(cue => cue.type === 'org.mpeg.sei'); // Type used here needs to be updated
A cue is normally generated with a non-zero duration, but when it is used for SEI information the startTime and the endTime may be the same. As far as I have tested on Safari, that does not cause an error, but it needs to be considered when other platforms implement the DataCue API.
Which proposal to use depends on the frequency of the SEI messages and the accuracy you need.
If SEI messages arrive at a per-frame frequency, DataCue is a good fit; there is no need to listen to the addcue event, just read textTrack.activeCues.
If you want to render SEI with the exact frame, you can use WebCodecs so that you control the rendering yourself, or use the sei event with rvfc to reduce the desynchronization to one or two frames. If you can accept an error of 200 to 300 ms, the sei event with timeupdate is enough.