Feature Request: Blur background of presenter's camera #740
Like on MS Teams or Zoom, users want to blur their camera background (for privacy reasons when they record lessons at home).

Comments
That's an interesting idea! However, implementing this is not trivial at all. It would also likely make Studio use even more resources (as in CPU & GPU), which is already a problem on some less powerful notebooks. We will definitely keep this in mind, but due to time constraints, funding and other reasons, we won't be able to implement this anytime soon.
That is what I guessed. Thanks for keeping it in mind for future development.
A successful example of a possible implementation of virtual camera backgrounds or the blur effect is BigBlueButton, I guess.
For building something like this, see "Edit live video background with WebRTC and TensorFlow.js". This probably also means that we would need to record the camera differently, though. We can no longer just record the media source.
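To illustrate what "recording the camera differently" could mean, here is a minimal sketch (assuming a plain `getUserMedia` camera stream; this is not the article's or Studio's actual code): render each frame onto a canvas and record the canvas' `captureStream()` instead of the media source itself.

```ts
// Minimal sketch: record a processed canvas instead of the raw camera stream.
const camera = await navigator.mediaDevices.getUserMedia({ video: true });

const video = document.createElement("video");
video.srcObject = camera;
await video.play();

const canvas = document.createElement("canvas");
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
const ctx = canvas.getContext("2d")!;

const drawFrame = () => {
  // The background blur / compositing would happen here; for now we
  // just copy the camera frame verbatim.
  ctx.drawImage(video, 0, 0);
  requestAnimationFrame(drawFrame);
};
drawFrame();

// Record the canvas' stream instead of the media source.
const processed = canvas.captureStream(30); // target 30 FPS
const recorder = new MediaRecorder(processed, { mimeType: "video/webm" });
recorder.start();
```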
OK so I looked into this a bit now. Wall of text incoming.

First the good news: There is now a prototype of this feature deployed here (branch (permalink); commit c5203de). This uses the Shiguredo library.

Which brings me to the bad news: The package ecosystem regarding this stuff is still rather barren, unfortunately. As far as sustainable open source turnkey solutions go, I could really only find two. Apart from the Shiguredo stuff, there is also SkyWay's offering. Apart from that, there are smaller projects by individual developers that are either unstable or long dead (the projects, not the devs (or so I hope)). The other thing you find when you research the relevant keyword ("virtual background") is SDKs that integrate with third party services like Twilio or EffectsSDK. In particular, these use server side processing, as far as I could glean from looking at their stuff.

So to me that basically means we have to do some of the things required ourselves, and if you look at existing solutions, like the mentioned BBB, that's what people generally do, apparently. So the question becomes "how?" I researched that a bit as well and want to collect my findings here, but while doing so I also got an increasing sense of the question actually being "whether": The reason why the SkyWay and Shiguredo stuff doesn't work in all browsers is that they use (different) new but not widely supported web APIs.

Anyhow, if we were to build it ourselves, here's roughly what it would take. The problem basically has two parts: a) image segmentation, i.e. recognizing what's in the foreground vs. the background and creating a mask from that, and b) blurring the camera stream according to that mask.

Problem a) is a hard computer vision problem that we just won't solve ourselves from scratch. State-of-the-art algorithms employ "AI" (read: machine learning, specifically convolutional neural networks, AFAIK), and luckily there are pretrained models for it and even JavaScript packages that make it work in the browser (given a fast enough CPU and/or GPU). The most used solution here is Google's MediaPipe, which offers several image segmentation models, most notably/notoriously the so-called selfie segmentation. MediaPipe is a rather high level framework for applying ML stuff to media; a bit more low-level you find TensorFlow.js, which "just" gives you the generic ML framework TensorFlow in the browser. With this, a few more models are (more or less easily) accessible. One that you also find a lot of references to is BodyPix (which seems to be superseded by this, though). Now, while these solutions solve the hard CV problem, they still leave a lot of engineering to us: These frameworks can run using different backends (on the client, not server backends), like the CPU vs. the GPU, there are WASM versions, etc., so integrating them and tuning them properly is still a bit of work.

Problem b) on the other hand is a rather basic computer graphics problem. (Well, you can certainly go down the rabbit hole of fast, high quality blur algorithms, but a masked Gaussian blur on a 60 FPS video is something that most consumer grade devices should be able to do these days, and it looks just fine.) There aren't really any libraries specifically for it because it is such a basic task. Higher level frameworks might have helpers for it, but I wouldn't want to pull in a 3D engine just to blur a video frame.
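To make problem a) a bit more concrete, here is a minimal sketch using the `@tensorflow-models/body-segmentation` package (presumably the BodyPix successor referenced above). The API names are taken from its documentation as I remember them, so double-check before relying on them:

```ts
// Hedged sketch of problem a): person segmentation in the browser.
import * as bodySegmentation from "@tensorflow-models/body-segmentation";

const segmenter = await bodySegmentation.createSegmenter(
  bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation,
  {
    runtime: "mediapipe", // alternatively "tfjs" (CPU/WebGL/WASM backends)
    solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation",
  },
);

// `video` is the <video> element playing the camera stream (see above).
const people = await segmenter.segmentPeople(video);

// Turn the segmentation into an ImageData mask: person opaque,
// background transparent (colors are RGBA).
const mask = await bodySegmentation.toBinaryMask(
  people,
  { r: 0, g: 0, b: 0, a: 255 }, // foreground (person)
  { r: 0, g: 0, b: 0, a: 0 },   // background
);
```

If I remember its docs correctly, the package even ships a `drawBokehEffect` helper that composites the blur directly onto a canvas, which would cover much of problem b) as well.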
The devil is again in the details, though: There is some research to be done as to what's the best way to a) extract the individual frames from the browser media APIs, probably rendering them to some canvas, b) apply the actual blur calculation, hopefully in real time, to then c) create a media stream from said canvas again. The most interesting step is b), where you again have the choice of using WebGL and/or WASM, or even built-in browser features for blurring and composition if they are supported.

I'll leave it at that for now, because our current funding for this sadly ran out anyway. Maybe this helps us down the line when people want to invest more in this; at the very least it gives us a basis to gauge how much further funding it would even take, and, again, whether or not it is even worth it. 🤷‍♀️
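For step b), one possible starting point that needs neither WebGL nor WASM is built-in Canvas 2D filtering and compositing (note that `ctx.filter` has historically been missing in Safari, hence the "if they are supported" caveat). A rough, untested sketch, reusing `video`, `canvas`/`ctx` and the `mask` ImageData from the sketches above:

```ts
// Hedged sketch of step b): masked blur via built-in Canvas 2D compositing.
const maskCanvas = document.createElement("canvas");
maskCanvas.width = canvas.width;
maskCanvas.height = canvas.height;
maskCanvas.getContext("2d")!.putImageData(mask, 0, 0);

ctx.clearRect(0, 0, canvas.width, canvas.height);
// 1. Start from the person mask (opaque where the person is) …
ctx.drawImage(maskCanvas, 0, 0);
// 2. … keep the sharp camera frame only where the mask is opaque …
ctx.globalCompositeOperation = "source-in";
ctx.drawImage(video, 0, 0);
// 3. … and draw a blurred copy of the frame behind it.
ctx.globalCompositeOperation = "destination-over";
ctx.filter = "blur(10px)";
ctx.drawImage(video, 0, 0);
// Reset state for the next frame.
ctx.filter = "none";
ctx.globalCompositeOperation = "source-over";
```

Step c) would then just be the `canvas.captureStream()` call from the first sketch; the three draw steps above have to run once per frame.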