How does the model work? Why are freckles and moles lost? #135
-
The extension internally leverages InsightFace. The model uses a face embedding to represent the target face; this embedding is built from points on the face (see image), so it does not account for blemishes or wrinkles. The embedding is then fed to the model alongside the original image, and the model computes how to transform the original image to match the embedding.
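For concreteness, here is a minimal sketch of that pipeline using InsightFace's Python API, the way swap extensions typically drive it. The `buffalo_l` model pack and the `inswapper_128.onnx` file name are assumptions based on common setups, not details taken from this thread:

```python
import cv2
import insightface
from insightface.app import FaceAnalysis

# Detect faces and compute their identity embeddings.
app = FaceAnalysis(name="buffalo_l")  # assumed model pack
app.prepare(ctx_id=0, det_size=(640, 640))

source_img = cv2.imread("source.jpg")  # face to copy
target_img = cv2.imread("target.jpg")  # image to modify
source_face = app.get(source_img)[0]
target_face = app.get(target_img)[0]

# The identity is a single 512-d vector -- no room for freckles or moles.
print(source_face.normed_embedding.shape)  # (512,)

# The swapper redraws the target face to match that embedding.
swapper = insightface.model_zoo.get_model("inswapper_128.onnx")  # assumed file
result = swapper.get(target_img, target_face, source_face, paste_back=True)
cv2.imwrite("swapped.jpg", result)
```

Note the bottleneck: whatever the swapper draws, its only knowledge of the source identity is that single 512-dimensional vector.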
From this, we can deduce two things:
1. Freckles and moles are lost because the embedding simply never encodes them.
2. The fidelity of the result is capped by the embedding, not by the resolution of the input images.
In other words, one might attempt to retrain the model on larger images, but as long as it doesn't take higher-resolution face embeddings as input, the outcome will not be more accurate, just higher in resolution. And the same result can be obtained by upscaling, which is what this extension does.

The question we should ask is: why does InsightFace operate this way? One reason, I believe, is that they already have efficient models for computing embeddings (RetinaFace, ...). These models were originally designed for facial recognition, not for face swapping, which is why they do not take every facial feature into account. In exchange, embeddings are fast to compute and don't depend on a large number of reference photos; often a single reference photo is enough.

The model they've made public is, in essence, more of a toy than anything else. It's a toy that performs well and can be enhanced with upscaling, but it's still a toy that doesn't produce truly credible results. I say that without taking any merit away from the work done by the InsightFace team, which is really super cool.

I might be mistaken, but I believe the model's architecture is a GAN (Generative Adversarial Network), with a generator and a discriminator. The public model is the generator (without the training code). Conceptually, this is similar to StyleGAN. Retraining such a model would be costly: one would need to determine the generator's architecture (relatively straightforward), the discriminator's (more tedious, since it isn't provided), and the training technique, which isn't supplied either. The biggest challenge, however, would be to create a face embedding model that genuinely represents the target face. That is no small feat. (A conceptual sketch of such a generator follows at the end of this comment.)

I don't know whether the models InsightFace developed for Midjourney are better in this respect, or just better at upscaling the result; I haven't tested their bot on this point. It's quite possible that other teams are working on a better representation of faces specifically for the face-swapping task. Given the very touchy nature of the field, though, I doubt such a model will be widely released before face swapping becomes commonplace (for video conferencing/compression, for example).

Another line of research would probably be to guide face generation in the manner of ControlNet, by influencing the Stable Diffusion generation process.
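To make the GAN description above concrete, here is a purely conceptual PyTorch sketch of a generator conditioned on a 512-d identity embedding. This is not the actual inswapper architecture (which is not public); the layer sizes, the 128x128 crop size, and the channel-concatenation conditioning are all assumptions for illustration:

```python
import torch
import torch.nn as nn

class EmbeddingConditionedGenerator(nn.Module):
    """Toy generator: takes a face crop plus a 512-d identity embedding
    and produces a swapped face crop. Conceptual only -- the real
    inswapper internals are not published."""

    def __init__(self, emb_dim: int = 512):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Inject the identity embedding as extra feature channels.
        self.embed = nn.Linear(emb_dim, 128)
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, img: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        feats = self.encode(img)                  # (B, 128, H/4, W/4)
        cond = self.embed(emb)[:, :, None, None]  # (B, 128, 1, 1)
        cond = cond.expand(-1, -1, feats.shape[2], feats.shape[3])
        return self.decode(torch.cat([feats, cond], dim=1))

# A 128x128 crop, as the public model's name suggests.
gen = EmbeddingConditionedGenerator()
out = gen(torch.randn(1, 3, 128, 128), torch.randn(1, 512))
print(out.shape)  # torch.Size([1, 3, 128, 128])
```

During adversarial training, a discriminator would judge realism while an identity loss compares the embedding of the output to that of the source. Since no skin-detail signal flows through the 512-d bottleneck, freckles and moles cannot survive, regardless of how the generator is trained.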
-
Thanks for the explanation. There are so many settings that I thought I was doing something wrong.
-
Thanks for the explanation. Quite detailed and clear, it's super!
-
Hi there,
Freckles and moles are an important part of a face. Unfortunately, this information is not currently considered: the faces come out smooth, with almost no skin imperfections. That can be jarring if arms or other body parts still display freckles.
Is it possible to keep such facial features when swapping faces? Maybe also scars and pimples?
By the way, kudos for your work! Best extension ever 👍
Mark