the overlapped identities between LFW and ms1m #24

zhenglaizhang · 2018-01-30T09:39:03Z

Awesome work!
As far as I know, some identities overlap between LFW and ms1m. Has the clean list had these overlapping identities removed? If not, this may affect the reported performance on LFW.

azat-d · 2018-01-30T09:55:30Z

There are also some overlapping identities between facescrub and ms1m. I downloaded the mapping between MIDs and real names from Freebase. Please check the attachment:
mid_to_name.txt.zip
UPD: Aaron Eckhart has the identifier m.03t4cz. This person is present in both the test and the training sets. Obviously, there are other such persons.
UPD2: m.04wp3s:Sam Rockwell, m.014zfs:Bill Cosby, m.02h3tp:Patrick Swayze, etc. All of these identities are in both the training and test sets. (I've only checked this manually so far; I believe the overlap is more than 50%.)

nttstar · 2018-01-30T10:19:34Z

We're running that experiment and the results will be available in our paper soon. I expect them to be slightly worse (<0.1).
We have already removed 500+ identities from ms1m by checking the similarity between facescrub and ms1m. Please see src/data/dataset_merge.py if you want to know how we remove overlaps.
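For readers who want a feel for what similarity-based overlap removal can look like, here is a minimal sketch. It is not the code in src/data/dataset_merge.py, which is not shown in this thread. It assumes each identity already has a list of face embeddings, averages them into a centroid, and flags any training identity whose centroid has cosine similarity above a threshold to some test identity. The function names, the dictionary layout, and the threshold value are all hypothetical.

```python
import math


def centroid(vectors):
    """Average a non-empty list of equal-length embedding vectors."""
    dim = len(vectors[0])
    sums = [0.0] * dim
    for vec in vectors:
        for i, value in enumerate(vec):
            sums[i] += value
    return [s / len(vectors) for s in sums]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_identities(train_embeddings, test_embeddings, threshold=0.5):
    """Return the training identities that look like some test identity.

    Both arguments map an identity id to a list of embedding vectors.
    The threshold of 0.5 is an arbitrary placeholder, not a tuned value.
    """
    test_centroids = [centroid(vecs) for vecs in test_embeddings.values()]
    flagged = set()
    for train_id, vecs in train_embeddings.items():
        train_centroid = centroid(vecs)
        if any(cosine(train_centroid, tc) >= threshold for tc in test_centroids):
            flagged.add(train_id)
    return flagged
```

Unlike name matching, a check of this kind does not depend on the two datasets spelling names the same way, but its result depends on the quality of the embeddings and on the chosen threshold.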

azat-d · 2018-01-30T10:48:55Z

I just wrote a script that checks for matches between the test persons (the subset of facescrub that is used in the MegaFace challenge) and the persons in the training set (your cleaned ms1m list). 54 of the 80 test persons are in both the training and test sets:
Stana_Katic m.0fd6sd
Farrah_Fawcett m.01j851
Sam_Rockwell m.04wp3s
Alec_Baldwin m.018ygt
Christopher_Reeve m.0jrny
James_Remar m.05mlqj
Brendan_Fraser m.0227tr
Brianna_Brown m.0gdvdh
Andrea_Bowen m.05dxl5
Tempestt_Bledsoe m.014yqb
Paul_Bettany m.01chc7
Robert_Redford m.0gs1_
Mark_Wahlberg m.0gy6z9
Sarah_Hyland m.0523pz4
Alley_Mills m.0d_3hq
Kit_Harington m.09v4hnq
Victoria_Justice m.07w71b
Robert_Duvall m.015c4g
Edie_Falco m.01dy7j
Peggy_McCay m.05j0x1
Jeremy_Irons m.016ywr
Rebecca_Budig m.03jtgb
Brad_Garrett m.01rcmg
Bill_Cosby m.014zfs
Christel_Khalil m.0719hb
Lindsay_Hartley m.04w9ky
Joanna_Kerns m.0403xb
Emile_Hirsch m.05mkhs
Christine_Lakin m.06wr68
Marilu_Henner m.02pzx7
James_Marsden m.042ly5
Justin_Timberlake m.0j1yf
Adam_Brody m.0214df
Patrick_Swayze m.02h3tp
John_Malkovich m.017r13
Melina_Kanakaredes m.02pbhg
Nadia_Bjorlin m.04vpr3
Ryan_Phillippe m.01ksr1
Fran_Drescher m.01s3kv
Norman_Reedus m.0bs6hr
Robert_Knepper m.07v7p6
Didi_Conn m.04tvm2
Bobbie_Eakes m.03s_t9
Heath_Ledger m.0237fw
Summer_Glau m.039g0_
Emily_Deschanel m.03vd_l
Orlando_Bloom m.09wj5
Daniel_Day-Lewis m.016yvw
Shia_LaBeouf m.04w391
Kimberlin_Brown m.03ff8f
Adrienne_Barbeau m.01z7nj
Dean_Cain m.02qjj7
Erin_Cummings m.063z0nr
Joaquin_Phoenix m.018db8
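The script itself was not posted, so the following is only a rough sketch of how such a name-based check could be written. It assumes the MID-to-name mapping is available as lines of the form `MID<TAB>Name` (the actual format of mid_to_name.txt is not shown in this thread), and that test names use underscores in place of spaces, as in the list above. All function names are hypothetical.

```python
def normalize(name):
    """Lowercase a name and treat underscores as spaces for comparison."""
    return " ".join(name.replace("_", " ").lower().split())


def parse_mid_to_name(lines, sep="\t"):
    """Parse 'MID<sep>Name' lines into a dict; malformed lines are skipped."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or sep not in line:
            continue
        mid, name = line.split(sep, 1)
        mapping[mid.strip()] = name.strip()
    return mapping


def find_overlap(test_names, train_mids, mid_to_name):
    """Return (test_name, mid) pairs for test persons found in the training set."""
    train_by_name = {}
    for mid in train_mids:
        name = mid_to_name.get(mid)
        if name is not None:
            # Keep the first MID seen for a given normalized name.
            train_by_name.setdefault(normalize(name), mid)
    return [
        (name, train_by_name[normalize(name)])
        for name in test_names
        if normalize(name) in train_by_name
    ]


def overlap_ratio(num_overlapping, num_test):
    """Fraction of test identities that also appear in the training set."""
    return num_overlapping / num_test
```

With the counts reported above, `overlap_ratio(54, 80)` gives 0.675, i.e. 67.5%. Exact name matching can only give a lower bound on the overlap, since spelling variants and aliases are missed.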

nttstar · 2018-01-30T11:07:46Z

@azat-d I think it is also very difficult to find ALL overlaps by name matching.

azat-d · 2018-01-30T11:14:55Z

Agreed. But according to my test, the overlap is at least 67.5% (54 of 80 identities). I don't trust any results that are based on celebrity datasets. The most reliable test is the NIST FRVT test, which is free for all researchers.

nttstar · 2018-01-30T11:17:44Z

@azat-d I have removed 500+ identities from MS1M by comparing it with the facescrub dataset, in order to test on MegaFace. For reference, facescrub has only 530 identities in total. I believe our result is quite reliable.

azat-d · 2018-01-30T11:20:00Z

The MegaFace test uses only 80 identities from facescrub, and I checked YOUR training list against those identities.

azat-d · 2018-01-30T11:21:31Z

And I found that 54/80 identities are in both the test set and your training set.

azat-d · 2018-01-30T11:26:34Z

I'm talking about this training set: https://pan.baidu.com/s/1eTn6O62

azat-d · 2018-01-30T11:32:21Z

Do you mean that there was additional cleaning of this list?

nttstar · 2018-01-30T11:32:42Z

The 500+ identities were removed from my binary-packed dataset, not from this clean list. You can check it in our paper; there's a performance drop of about 0.3% (98.3% -> 98.0%).
You need to generate features for all 530 identities if you want to upload the result; the 80 identities are only what set-1 requires.

azat-d · 2018-01-30T11:41:57Z

Ok, thank you!

zhenglaizhang · 2018-01-31T01:48:27Z

Great to hear the results on removing the overlapping identities. Thank you, guys. I will also take a look at this and may post an update here if I get any new results.

zhenglaizhang · 2018-02-01T03:49:59Z

Closing, as this has been discussed thoroughly here.

zhenglaizhang closed this as completed Feb 1, 2018

glorioushedgehog mentioned this issue Aug 13, 2018

Is there any training / verification overlap? #331

Closed

lymanblue mentioned this issue Feb 27, 2019

Does MS1M Arcface or iBug exclude the overlapped data from LFW #566

Closed

FlyingAle mentioned this issue Jun 2, 2023

2d106det convert to ncnn model,run on android will be crushed #2322

Open
