iBUG_DeepInsight #49

d4nst · 2018-02-13T19:00:59Z

I have seen that the current top algorithm in the MegaFace challenge is iBug_DeepInsight, with an accuracy that corresponds with your latest update: 2018.02.13: We achieved state-of-the-art performance on MegaFace-Challenge-1, at 98.06

After reading your paper and the README in this repo, it seems to me that this accuracy is achieved using the cleaned/refined MegaFace dataset. Is this correct?

nttstar · 2018-02-14T01:06:03Z

Right.

d4nst · 2018-02-14T09:22:37Z

In that case, I don't think it's fair to publish those results in the public results page of the MegaFace challenge. As far as I know, it's not allowed to modify the evaluation set when reporting results, as this would obviously make impossible to compare the accuracy of the different algorithms. I suggest that you report the results obtained using the original dataset instead.

nttstar · 2018-02-14T13:22:50Z

I can not agree with you.
First, it is impossible to guarantee any submitted result is fair unless it has published paper and open source code. I can have dozens of methods to cheat on it.
Second, we followed all rules of MegaFace challenges but corrected the errors they made on distractor images. What we report is the real performance of any face recognition algorithm we had experimented. The accuracy can be random if we do not remove these distractor noises as we seen in our paper, which is actually not fair to compare the performance between two models/algorithms.
Last, we made all things clear and open source, compared all published state-of-the-art algorithms then demonstrated that our approach performed the best.

ghost · 2018-02-14T18:35:28Z

It's worth to mention to be fair you should publish this cleaned data just to let other researchers to validate it.

d4nst · 2018-02-14T18:37:56Z

Don't get my wrong, I think it's great that you made everything clear and open source and as a fellow researcher I thank you for that!

As you say, it's very easy to cheat on these type of challenges. However, the assumption is that all the teams would work with the same test set and would not tamper the submitted results. Otherwise, as I said before, it would be impossible to compare the accuracy between algorithms. I agree with you that the best way to test is with a clean dataset and your paper clearly shows how this affect the accuracy of the algorithm. My problem with your submission is that nowhere in the MegaFace website says that you used a cleaned test set. Neither there is a link to this repo or to your paper, where this information is provided.

In my opinion, you should let the organisers know about this, so they can decide what to do. I think adding a note under the "Method details" section in the MegaFace website could work as well.

ghost · 2018-02-14T18:40:51Z

I think this affects the companies and the business of face technologies which cost us a huge budget, to be fair I believe you should delete this repository.

d4nst · 2018-02-14T18:42:54Z

@MartinDione I'm not sure what are you talking about but that has nothing to do with this discussion

ghost · 2018-02-14T18:47:57Z

I'm talking about the megaface objectives to develop the facial recognition. When you publish such this code you affect other companies, like Vocord they invested a lot of money on the development to achieve state-of-the-art performance on MegaFace.

d4nst · 2018-02-14T18:53:08Z

I won't even bother replying to that... Again, that has nothing to do with the issue we are discussing here.

nttstar · 2018-02-15T07:31:38Z

We will put the noise list on this repo soon. But read the notes carefully at that time.

nttstar · 2018-02-15T07:33:54Z

@MartinDione It is unbelievable if vocord spend a lot of money on a public competition like MegaFace.

ivazhu · 2018-02-21T15:16:08Z

@MartinDione Vocord didn't spend it :)
@nttstar And what have you done with errors in FaceScrub subset which is used in Megaface Challenge?

nttstar · 2018-02-22T01:59:22Z

It was described in our paper.

ivazhu · 2018-02-22T13:13:17Z

@nttstar I have read your article and now I absolutely disagree that you are playing fair game. First of all you have changed test dataset, not deleted smth wrong, but changed FaceScrub "wrong" images with another images! More over you are writing that additional artificial features was added to your feature vectors that highlight "bad" images in distractor dataset. It is the same as using manual annotation! Megaface rules deny the first and the second. Also I would like to mention that your method of "cleaning" dataset creates "clear for your algorithm" dataset. I am sure that community will find mistakes in your "error" list when you publish it.

P.S. And have you thought that Megaface team have clear correspondence list? If they will recompute the results with it your team will take the last place cause images of different people will give you very high FAR (it is about your image replacing)

nttstar · 2018-02-22T15:46:56Z

@ivazhu I'm confused about the words 'deleted' and 'changed' in your comment. Anyway what you said was almost the same with d4nst's and I don't want to clear my position again. Megaface team had checked our list and solution, otherwise the result would not be on the leaderboard.

ivazhu · 2018-02-22T21:13:22Z

@nttstar From your article: "During testing, we change the noisy face to another right face" and "During testing, we add one additional feature dimension to distinguish these noisy faces"
From Megaface Challenge: "1. Download MegaFace and FaceScrub datasets and development kit 2.Run your algorithm to produce features for both datasets"

In your article you declare that you are not using provided dataset and augmenting features with manual labels. It's obvious for all that you broke the Megaface rules.

nttstar · 2018-02-28T07:55:20Z

@ivazhu How can you achieve 91% without removing these noises? It's beyond my imagination.

chichan01 · 2018-02-28T09:43:38Z

@nttstar
I have read your paper and also your noise list and codes under https://github.com/deepinsight/insightface/tree/master/src/megaface as well. I got a bit confuse.

In Your text, "We manually clean the FaceScrub dataset and finally
find 605 noisy face images. During testing, we change the
noisy face to another right face, which can increase the identification
accuracy by about 1%. In Figure 6(b), we give the
noisy face image examples from the MegaFace distractors.
All of the four face images from the MegaFace distractors
are Alec Baldwin. We manually clean the MegaFace distractors
and finally find 707 noisy face images. During testing,
we add one additional feature dimension to distinguish
these noisy faces, which can increase the identification accuracy
by about 15%."

In your noise list, megaface_noises.txt has 719 noisy face images and Facescrub has 605 noisy face images.
In remove_noises.py,
for facescrub set, the noisy image(feature) is replaced by the subject class center with random uniform noise. Do you really need random noise there? why?

your code for remove noise in facescrub set:
center = fname2center[a]
g = np.zeros( (feature_dim+feature_ext,), dtype=np.float32)
g2 = np.random.uniform(-0.001, 0.001, (feature_dim,))
g[0:feature_dim] = g2
f = center+g
_norm=np.linalg.norm(f)
f /= norm
feature_path_out = os.path.join(args.facescrub_feature_dir_out, a, "%s%s.bin"%(b, out_algo))
write_bin(feature_path_out, f)

However, for Megaface set, I don't know what you do in there. My first reading seems that you try to fill the feature with 100 for those noise images, but after I read your load_bin function, it is not the case as you update those filled 100 feature with the original extracted feature from the noise image.

your code about noise in megaface:
feature = load_bin(feature_path, 100.0)
write_bin(feature_path_out, feature)

and load_bin function:
def load_bin(path, fill = 0.0):
with open(path, 'rb') as f:
bb = f.read(44)
#print(len(bb))
v = struct.unpack('4i', bb)
#print(v[0])
bb = f.read(v[0]4)
v = struct.unpack("%df"%(v[0]), bb)
feature = np.full( (feature_dim+feature_ext,), fill, dtype=np.float32)
feature[0:feature_dim] = v
#feature = np.array( v, dtype=np.float32)
#print(feature.shape)
#print(np.linalg.norm(feature))
return feature

@ivazhu @nttstar Is there something I misunderstand? please give an advice. (it seems that your code is not what you describe in the paper.!)
did you use your codes and list to reproduce your result for megaface or it is typo error? if you have update your code or lists, would you like to tell us what is your updated result on megaface. Please verify it on your pretrained model.

ivazhu · 2018-02-28T10:31:56Z

@nttstar First of all - WE DIDN'T CHANGE THE DATASET as you did. There are some secrets :) For instance, think what to do if you see more than one face on "error" distractor image.

And also, as I promised, take a look on Alley_Mills_52029, Lindsay_Hartley_33188, Michael_Landes_43643, ... These are not ERRORS. These are errors of your algorithm. In your "work" you simply deleted all samples which your alg was not working correctly on.

Any more questions?

chichan01 · 2018-02-28T10:41:37Z

@ivazhu ,
What do mean "more than one face on "error" distractor image"? In verification, it is a pair matching. also you do not know where image is from distractor or gallery(facescrub).

ivazhu · 2018-02-28T10:47:03Z

It can be more than one face on a image ср, 28 февр. 2018 г. в 11:41, Chi Ho CHAN <notifications@github.com>:

…

@ivazhu <https://github.com/ivazhu> , What do mean "more than one face on "error" distractor image"? In verification, it is a pair matching, how can it be more than a face? Also what do you mean "one face on "error" distractor image"? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHEGNeCkA4gdC53pVDicmjULn-fPGks5tZS1jgaJpZM4SEPY7> .

chichan01 · 2018-02-28T10:50:03Z

Do you mean that there can be more than a face in the image, no matter it is in facescrub subset or distractor set?
So you do not use their provided json as a landmark reference in your case.

ivazhu · 2018-02-28T10:55:15Z

We didn't anything with Facescrub dataset cause there is not FAIR method to correct this type of errors. What about distractors - yes, there are some samples with more than one face - one face from Facescrub and one another. About megaface json - read megaface docs - you should use megaface json in the case you can't detect a face only ср, 28 февр. 2018 г. в 11:50, Chi Ho CHAN <notifications@github.com>:

…

Do you mean that there can be more than a face in the image, no matter it is in facescrub subset or distractor set? So you do not use their provided json as a landmark reference in your case. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHCu6Z7Q4ZS3izuVLAuj6mhg0nCPfks5tZS9dgaJpZM4SEPY7> .

chichan01 · 2018-02-28T11:06:10Z

I see. I think this can be a discussion. Originally, I thought json files is to tell participant where is a face in an image, so that alg. computes the similarity score of those faces (not images). Perhaps, some face locations in json files are incorrect, but this is the groundtruth of this challenge and therefore we should based on these error to provide our score.

Anyway, your trick may also not be good as you only do on mageface dataset, which means that any images more than one face can be denoted as disaster. In other words, you have already known the side information that one of your pair is disaster. so you can do whatever to make it as a good score for mismatch pair. I think this also violates the verification protocol as alg. should make a score for a testing pair without having any side information.
Finally, is it what you did on your submitted work on megaface?

nttstar · 2018-02-28T11:48:26Z

@ivazhu Choosing one face from multiple faces in one image according to your own knowledge is also a data trick. It is the same as data noises cleaning.

nttstar · 2018-02-28T11:50:19Z

@chichan01 Adding random noises to centre vectors can avoid identical feature vectors. The result will not change if no noise applied.

chichan01 · 2018-02-28T11:51:08Z

@nttstar
so would you like to tell me what you want to do on the disaster set..?

ivazhu · 2018-02-28T15:31:44Z

I wonder how then images from facescrub would be compared! P.S. It is very funny to hear about 100% from the man who added a dimension with labels to his features ср, 28 февр. 2018 г. в 16:25, Jia Guo <notifications@github.com>:

…

@ivazhu <https://github.com/ivazhu> I can get 100% on Megaface challenge by using your approach - to detect dozens of faces from each megaface distractor image and choose the one with lowest similarity to facescrub dataset, even sometimes it's not a real face. You can't use the knowledge of facescrub dataset when you're doing something with megaface. I'm very surprised that you and vocord do think this is a normal procedure. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHBpwgLiIYSIXOBxpv8hF9ChKuarrks5tZW8QgaJpZM4SEPY7> .

d4nst · 2018-02-28T18:37:40Z

I think this whole discussion proves my point. Both of your teams (Deepinsight and Vocord) have not strictly followed the MegaFace protocol, so it is pointless to compare the performance of your algorithms with the rest of participants.

ivazhu · 2018-02-28T18:42:35Z

Daniel, please show where I didn't follow the protocol? ср, 28 февр. 2018 г. в 19:37, Daniel Saez <notifications@github.com>:

…

I think this whole discussion proves my point. Both of your teams (Deepinsight and Vocord) have not strictly followed the MegaFace protocol, so it is pointless to compare the performance of your algorithms with the rest of participants. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHK46ermo9v2k_Z5y6JwTy-dfXu4_ks5tZZz3gaJpZM4SEPY7> .

d4nst · 2018-02-28T19:06:24Z

@nttstar and @chichan01 have already explained this, but I'll try to make it more clear...

As I understand, this is what you do: in the FaceScrub set, you always crop the correct face (using the provided landmarks or bounding box as a reference). In the MegaFace distractor set, you are cropping all the detected faces, comparing them against the probe face and selecting the lowest score as the "valid" score.

The problem with your approach is that you are using your knowledge about the origin of the image (probe set or distractor set) to make a decision. You know that a probe and a distractor image shouldn't match, so you just take the lowest score. As others have pointed out, you could use a poor face detector that doesn't even detect faces and this approach would give you a very high accuracy.

If you really wanted to take the lowest score, you should do it all the time, not just for distractor images, i.e. when you add the matching face from the probe set to the distractor set, you should also compare against all the detected faces and take the lowest score. If you do that, your performance will probably be much worse.

Lastly, just think about a real identification system in which you don't know anything about the origin of the faces. I'm sure that you would agree with me that always selecting the lowest score from all the detected faces would be a very poor design.

Please let me know if my assumptions about your approach are wrong.

ivazhu · 2018-02-28T19:52:16Z

Not quite. First of all I compute detector settings so that there is no fail detections in FaceScrub dataset. It let me hope that there is no such detections in Megaface dataset. To tell the truth I checked it manually - there is no fail detections in Megaface with computed settings. So I just choose the real distractor from two real face. There is another more important problem - there are images without faces in Megaface dataset. Nevertheless we had to compute "features" by json-coordinate. These "features" are something completely different from face feature. I'd like to mention that the better learned algorithm than worse results would be on such samples. P.S. Daniel, we don't talk about real identification system here. There are very different problem there :) 2018-02-28 22:06 GMT+03:00 Daniel Saez <notifications@github.com>:

…

@nttstar <https://github.com/nttstar> and @chichan01 <https://github.com/chichan01> have already explained this, but I'll try to make it more clear... As I understand, this is what you do: in the FaceScrub set, you always crop the correct face (using the provided landmarks or bounding box as a reference). In the MegaFace distractor set, you are cropping all the detected faces, comparing them against the probe face and selecting the lowest score as the "valid" score. The problem with your approach is that you are using your knowledge about the origin of the image (probe set or distractor set) to make a decision. You know that a probe and a distractor image shouldn't match, so you just take the lowest score. As others have pointed out, you could use a poor face detector that doesn't even detect faces and this approach would give you a very high accuracy. If you really wanted to take the lowest score, you should do it all the time, not just for distractor images, i.e. when you add the matching face from the probe set to the distractor set, you should also compare against all the detected faces and take the lowest score. If you do that, your performance will probably be much worse. Lastly, just think about a real identification system in which you don't know anything about the origin of the faces. I'm sure that you would agree with me that always selecting the lowest score from all the detected faces would be a very poor design. Please let me know if my assumptions about your approach are wrong. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHF5Aa6xp95x8XK4_QjrfEl11I9pKks5tZaO2gaJpZM4SEPY7> .

chichan01 · 2018-02-28T23:41:19Z

@nttstar I have just found that you also released your result on FGNet megaface challenge as well. I have questions regarding on your result.

Do you only use the same training set (i.e. your provided MS-1M-celeb) to train the deep network and test on both Facescrub and FGNet megaface challenges?
If yes, it means that I can verify your pretrained model on both challenges. would you mind to tell me which pretrained model you released for testing on FGNet.?
did you also clean the testing dataset as well? if yes, would you mind to release the list?
will you update your paper by including the results of FGNet?

chichan01 · 2018-02-28T23:56:08Z

@ivazhu
Would you mind to address what different problems between the megaface challenge and the real identification and verification here??

My point and others @d4nst and @nttstar agree that you only treat the disaster set specially and this is not the case in the real scenario as we don't have any side information of the image pair.
I agree that @nttstar result cannot be comparable with other works because other participants did not do that, but they give the list, some codes and pretrained models, so that we can regard that they propose a new protocol and also verify their work. Therefore, the former and the later participants can do a bit extra work to follow this new protocol to produce the result if they want to.

On the other hand, your work will be much more difficult to follow and reproduce as you do not release things. Luckily, you pop up here so that I understand a bit of your work. Also, the most important is that your proposed trick violates the fundamental principle of biometric verification and identification.

happynear · 2018-03-01T01:30:18Z

It is a common knowledge in face recognition community that the absolute performance of MegaFace is meaningless. Lots of cheating tricks can be applied to achieve very high scores and MegaFace don't have the mechanism to prevent them.

What we can trust are the relative scores only, which means, one can only compare with himself on MegaFace. As in this issue, models evaluated on the cleaned list should only be compared with the ones evaluated on the same list. The authors have these experiments in their paper. That's already enough, not to mention that they released their codes and will release the cleaned list.

The only problem is that the official MegaFace organizer should make two leaderboards after they got aware of the existence of the "cleaned list". However, they didn't have the willings to do so and they chose to put them on the same leaderboard. This is the problem. The authors of InsightFace did nothing wrong.

HaoLiuHust · 2018-03-01T02:12:41Z

appreciate Vocoord let us know what they have done to the dataset

d4nst · 2018-03-01T08:22:01Z

@ivazhu the point of my comment was not about the failed detections. It was about treating probe and distractor images in a different way when in a real setting you wouldn't have this information.

nttstar · 2018-03-01T08:28:27Z

@chichan01

Yes. The refined MS1M
Yes. By using resnet100 model. You can get similar performance.
Some mis-aligned faces are rectified and some potential wrong labels are checked. However, on FGNET, the lablel noise has smaller affect on the performance compared to that on FaceScrub. Due to the age span, noisy label selection is hard, and J Deng is confirming some of these noise labels. You can send email to enquiry with him.
Maybe. We may update the results on FGNET into the paper.

nttstar · 2018-03-01T08:34:44Z

@d4nst
As you can see from this thread, I believe lots of submitted results were using cleaned list implicitly or explicitly, especially those >85% acc. Some teams even do not realize it.

d4nst · 2018-03-01T08:58:51Z

@nttstar Yes, I have realised that. However, that doesn't make things any better. The MegaFace leaderboard seems pretty meaningless to me now. We need a reliable and standard benchmark similar to ImageNet in the face recognition community

ivazhu · 2018-03-01T09:32:49Z

Daniel, see the megaface forum - I have suggested it yet. Moreover we offered the megaface team to clean up their dataset two years ago. Also guy from google wrote that he informed them about the errors at the beginning of the challenge. I suggest to discuss new challenge at some big conference. There are a lot of problems to create fair for all competitions. чт, 1 марта 2018 г. в 9:58, Daniel Saez <notifications@github.com>:

…

@nttstar <https://github.com/nttstar> Yes, I have realised that. However, that doesn't make things any better. The MegaFace leaderboard seems pretty meaningless to me now. We need a reliable benchmark similar to ImageNet in the face recognition community — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHC--_Eq_3vkKqVH59-Su_YrJ9X_sks5tZ7hOgaJpZM4SEPY7> .

nttstar · 2018-03-01T13:31:36Z

Another big issue I want to say here is: Test set(megaface distractor set) cleaning is a MUST-DO work to test your algorithms/models, otherwise the results will not be solid even you just compare the relative scores. Take an example from our paper. SphereFace(m=4, lambda=5) achieves 82.95%, 97.43% on 'before-cleaning' and 'after-cleaning' protocols respectively, while ArcFace(m=0.4) gets 82.29% and 98.10%(ArcFace(m=0.5) gets 98.36% for reference). Watching the scores before refinement we may judge that SphereFace is better than ArcFace under m=0.4 setting but actually not.

ivazhu · 2018-03-01T13:36:36Z

Jia, it may be due to your cleaning process and its errors. Except errors in your error list there are some "bad" (not corresponding) images that you didn't catch. For example: Peggy_McCay_48889 Adrienne_Barbeau_4246 Alley_Mills_52003 So I can say that you didn't prove the statement above 2018-03-01 16:31 GMT+03:00 Jia Guo <notifications@github.com>:

…

Another big issue I want to say here is: Test set(megaface distractor set) cleaning is a MUST-DO work to test your algorithms/models, otherwise the results will not be solid even you just compare the relative scores. Take an example from our paper. SphereFace(m=4, lambda=5) achieves 82.95%, 97.43% on 'before-cleaning' and 'after-cleaning' protocols respectively, while ArcFace(m=0.4) gets 82.29% and 98.10%(ArcFace(m=0.5) gets 98.36% for reference). Watching the scores before refinement we may judge that SphereFace is better than ArcFace under m=0.4 setting but actually not. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHHvscaBNce3qwc-oTKNa6MYdWSzsks5tZ_g6gaJpZM4SEPY7> .

nttstar · 2018-03-01T13:41:30Z

FaceScrub errors are all fixed in my experiments, it does not affect. Also, we have only about 1% acc improvement after fixing FaceScrub identity errors, much less than megaface's. It is not worth to mention here.

ivazhu · 2018-03-01T13:46:48Z

*Peggy_McCay_48889* *Adrienne_Barbeau_4246* *Alley_Mills_52003* These are FaceScrub errors. Did you fix them? If yes, why your error list doesn't contain them? 2018-03-01 16:41 GMT+03:00 Jia Guo <notifications@github.com>:

…

FaceScrub errors are all fixed in my experiments, it does not affect. Also, we have only about 1% acc improvement after fixing FaceScrub identity errors, much less than megaface's. It is not worth to mention here. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHMZS3_zIxB52s2lp1hWMgdH38X5dks5tZ_qMgaJpZM4SEPY7> .

nttstar · 2018-03-01T13:54:20Z

You can open a standalone issue to describe it. Noisy list must be refined over and over, not just from our team.
And this three items will not lead to much differences(<0.1% I believe, as 600 facescrub noises just increase the acc by about 1%). I don't think it will break my argument above.

ivazhu · 2018-03-01T14:04:36Z

1. These three names are only examples of your errors 2. There are only 3530 images from FaceScrub in Megaface protocol. And I think (I know the real number) )the intersection with 600 "facescrub noises" is about 20-30 images. So even three noises (and I repeat that there are more such cases) are very important. 3. Jia, you deleted several not noise images in your experiments. After that your "big issue" can be explained by these mistakenly deleted images. 2018-03-01 16:54 GMT+03:00 Jia Guo <notifications@github.com>:

…

You can open a standalone issue to describe it. Noisy list must be refined over and over, not just from our team. And this three items will not lead to much differences(<0.1% I believe, as 600 facescrub noises just increase the acc by about 1%). I don't think it will break my argument above. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHB3hjOeAzciIua5t4a3yCIlppI4dks5tZ_2PgaJpZM4SEPY7> .

nttstar · 2018-03-01T14:11:43Z

@ivazhu I'm wasting my time to reply to you. From +0.66% to -0.67%, so you mean that your three items can affect the acc by larger than 1% in one direction? Stop please.

ivazhu · 2018-03-01T14:21:46Z

Jia, I think you are in the subject but it seems you are not. Think how many positives pairs (pairs of images of the same person) you have. After that look how many images of *Peggy_McCay *there are in the Facescrub subset from Megaface. Even if there are only one not detected by you error then there are N positives pairs which are not positive. But you managed to miss more than one error *Peggy_McCay *image! 2018-03-01 17:11 GMT+03:00 Jia Guo <notifications@github.com>:

…

@ivazhu <https://github.com/ivazhu> I'm wasting my time to reply to you. From +0.66% to -0.67%, so you mean that your three items can affect the acc by larger than 1% in one direction? Stop please. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHDNJFJT4zUEYVNhW6LOQWeXVJKtFks5taAGhgaJpZM4SEPY7> .

nttstar · 2018-03-01T14:42:59Z

I will give you the results soon.
EDIT:
SphereFace got 97.60% while ArcFace(m=0.4) got 98.28% after adding these three.
The conclusion is still the same. Thanks for sharing this.
What I said above is my point may be false iff SphereFace can obtain more than 1% improvement while ArcFace keeps the same by adding that three, which is hardly to happen.

terencezl · 2018-03-18T19:03:14Z

I can attest to the statement by @nttstar that denoising the facescrub probe set only changes the final score 1%. Out of the 3530 probes, 24 show up on the noise list from @nttstar, and 24 is a small portion of 3530. More importantly, if you check those pictures, they are mostly still celebrities, and should not find matches with any distractors. In other words, you end up with less score increase than what 24/3530 might indicate.

The reason why 707 mislabeld distractors play such a huge influence on the final score, is that you are not comparing 707 with 1M distractors, you are comparing them with 3530 probes. If any of the mislabeled distractors gets into a higher similarity than the probe's similarity, your rank 1 score is ruined. In that case, rank 5, 10 might be a better metric, because that allows for some noises. But megaface's devkit only prints out rank 1 score. Oh well.

terencezl · 2018-03-18T19:07:16Z

As you can see from this thread, I believe lots of submitted results were using cleaned list implicitly or explicitly, especially those >85% acc. Some teams even do not realize it.

It's a very good point. Having a public noise list and letting everyone examine, and follow suit is the right way to go. If someone is organizing a new contest, I think the organizer should have the following rules:

Ideally each pic contains 1 face. Remove wrongly labeled ones. For pics containing more than 1 face, make sure the provided face bbox corresponds to the correct face, and urge participants to use that bbox to generate landmarks, align and generate reps.

Urge participants to use the exact same procedure for images from probe and distractor sets.

Liuftvafas · 2018-04-11T07:39:44Z

What do you guys think of NIST's FRVT Ongoing https://www.nist.gov/programs-projects/face-recognition-vendor-test-frvt-ongoing for a benchmark? Many commercial vendors use it for quite a fair comparison of their face recognition algorithms. FRVT requires to use face detection and does not provide landmarks but I guess MTCNN does a decent job in solving this issue.

ivazhu · 2018-04-11T08:11:21Z

Hi, Justas The problem is that FRVT requires your algorithm without any kind of protection. You give your alg to american government and they use it whereever they want. And another one for you - the recognition error for best algs on FRVT is several times less than MTCNN detection error. Ivan 2018-04-11 10:39 GMT+03:00 Justas Kranauskas <notifications@github.com>:

…

What do you guys think of NIST's FRVT Ongoing https://www.nist.gov/programs-projects/face-recognition- vendor-test-frvt-ongoing for a benchmark? Many commercial vendors use it for quite a fair comparison of their face recognition algorithms. FRVT requires to use face detection and does not provide landmarks but I guess MTCNN does a decent job in solving this issue. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#49 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFqfHHWtxNa411O2ERKNtTlczFeD0Cv4ks5tnbNEgaJpZM4SEPY7> .

delveintodetail · 2018-04-27T03:06:51Z

With the non-cleaned megaface and facescrub, all results above > ~85-86% are cheating in some ways(willful or unmeant).

With the cleaned megaface, evaluations are unfair to other methods that are conducted on previous non-cleaned protocol, such as sphereface, tencentface, and papers before Mar.

Angular Margins are very tricky, you can always mess up methods from others and pretend your method performs better. If possible, do not introduce more parameters, because parameters are always fit to the training datasets, if small datasets, no strong constraints used, if larger datasets, maybe stronger constraints, who knows??

Report all your results trained on CASIA, MS-Celebrities, VGGFace2, because most previous methods are trained on CASIA.

That is no fair in face recognition area. Probably, blind test and protocol may be ok.

Megaface protocol sucks.......... Let us try something else...........

nttstar closed this as completed Feb 14, 2018

nttstar reopened this Feb 14, 2018

nttstar closed this as completed Feb 16, 2018

evamaxfield mentioned this issue Feb 15, 2022

Start Lit Review for "Controversial Datasets" WeberLab-UW/project-tracking#7

Closed

FlyingAle mentioned this issue Jun 2, 2023

2d106det convert to ncnn model,run on android will be crushed #2322

Open

iBUG_DeepInsight #49

iBUG_DeepInsight #49

Comments

d4nst commented Feb 13, 2018 • edited Loading

nttstar commented Feb 14, 2018

d4nst commented Feb 14, 2018

nttstar commented Feb 14, 2018

ghost commented Feb 14, 2018

d4nst commented Feb 14, 2018 • edited Loading

ghost commented Feb 14, 2018

d4nst commented Feb 14, 2018

ghost commented Feb 14, 2018

d4nst commented Feb 14, 2018

nttstar commented Feb 15, 2018

nttstar commented Feb 15, 2018

ivazhu commented Feb 21, 2018

nttstar commented Feb 22, 2018

ivazhu commented Feb 22, 2018

nttstar commented Feb 22, 2018

ivazhu commented Feb 22, 2018

nttstar commented Feb 28, 2018

chichan01 commented Feb 28, 2018 • edited Loading

your code about noise in megaface: feature = load_bin(feature_path, 100.0) write_bin(feature_path_out, feature)

ivazhu commented Feb 28, 2018

chichan01 commented Feb 28, 2018 • edited Loading

ivazhu commented Feb 28, 2018 via email

chichan01 commented Feb 28, 2018

ivazhu commented Feb 28, 2018 via email

chichan01 commented Feb 28, 2018 • edited Loading

nttstar commented Feb 28, 2018

nttstar commented Feb 28, 2018

chichan01 commented Feb 28, 2018

ivazhu commented Feb 28, 2018 via email

d4nst commented Feb 28, 2018

ivazhu commented Feb 28, 2018 via email

d4nst commented Feb 28, 2018

ivazhu commented Feb 28, 2018 via email

chichan01 commented Feb 28, 2018 • edited Loading

chichan01 commented Feb 28, 2018 • edited Loading

happynear commented Mar 1, 2018 • edited Loading

HaoLiuHust commented Mar 1, 2018

d4nst commented Mar 1, 2018

nttstar commented Mar 1, 2018

nttstar commented Mar 1, 2018

d4nst commented Mar 1, 2018 • edited Loading

ivazhu commented Mar 1, 2018 via email

nttstar commented Mar 1, 2018

ivazhu commented Mar 1, 2018 via email

nttstar commented Mar 1, 2018

ivazhu commented Mar 1, 2018 via email

nttstar commented Mar 1, 2018

ivazhu commented Mar 1, 2018 via email

nttstar commented Mar 1, 2018

ivazhu commented Mar 1, 2018 via email

nttstar commented Mar 1, 2018 • edited Loading

terencezl commented Mar 18, 2018

terencezl commented Mar 18, 2018

Liuftvafas commented Apr 11, 2018

ivazhu commented Apr 11, 2018 via email

delveintodetail commented Apr 27, 2018 • edited Loading

d4nst commented Feb 13, 2018 •

edited

Loading

d4nst commented Feb 14, 2018 •

edited

Loading

chichan01 commented Feb 28, 2018 •

edited

Loading

your code about noise in megaface:
feature = load_bin(feature_path, 100.0)
write_bin(feature_path_out, feature)

chichan01 commented Feb 28, 2018 •

edited

Loading

chichan01 commented Feb 28, 2018 •

edited

Loading

chichan01 commented Feb 28, 2018 •

edited

Loading

chichan01 commented Feb 28, 2018 •

edited

Loading

happynear commented Mar 1, 2018 •

edited

Loading

d4nst commented Mar 1, 2018 •

edited

Loading

nttstar commented Mar 1, 2018 •

edited

Loading

delveintodetail commented Apr 27, 2018 •

edited

Loading