Reproducing CEDR-KNRM results on ANTIQUE #20
Hi Stéphane, Unfortunately the
Hello Sean,
Is that very different from your setup?
Our setup for running the experiment was:
So there are differences there. To rule out other possibilities, do you get the same results as reported for BM25? The version of Anserini in the repository was updated since OpenNIR was originally released.
Executing the following commands:
I obtain the following results:

Published results for BM25 are as follows:
So a couple of differences here too.
It should be the initial commit: ca14dfa5e7... Note that you'll need to clear the
Hello Sean,

Today I cleaned up my

The BM25 baseline produced the following results:

Fine-tuning BERT produced the following results:

Training the CEDR-KNRM model (initialised using the newly fine-tuned BERT weights) produced the following results:

Here I'm surprised to find that CEDR-KNRM's performance is lower than the fine-tuned BERT's.

On another subject, is there any way to produce a human-readable version of the models' output?

Thank you for all your help!
Hmmm, fascinating! Thanks for running these tests. The BM25 discrepancies are puzzling, as are the performance differences between Vanilla BERT and CEDR-KNRM. I'm out of ideas about what could cause these differences. The pipeline saves run files under
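Once a saved run file is located, it can be turned into a readable ranking with a short script. A minimal sketch, assuming the standard six-column TREC run format (`qid Q0 docid rank score tag`); the helper names here are illustrative, not part of OpenNIR:

```python
from collections import defaultdict

def read_trec_run(path):
    """Parse a TREC run file (columns: qid Q0 docid rank score tag)
    into {qid: [(rank, score, docid), ...]} sorted by rank."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _q0, docid, rank, score, _tag = line.split()
            run[qid].append((int(rank), float(score), docid))
    for ranking in run.values():
        ranking.sort()  # ascending rank = best document first
    return dict(run)

def format_top_k(run, k=5):
    """Return a human-readable string of the top-k documents per query."""
    lines = []
    for qid in sorted(run):
        lines.append(f"query {qid}")
        for rank, score, docid in run[qid][:k]:
            lines.append(f"  {rank:>3}  {score:10.4f}  {docid}")
    return "\n".join(lines)
```

To map the docids back to actual ANTIQUE answer text, you would join against the dataset's collection file, which this sketch does not cover.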
Hello,
I'm trying to reproduce results from the OpenNIR paper using the Vanilla BERT and CEDR-KNRM models on the ANTIQUE dataset.
Taking my cues from the wsdm2020_demo.sh script, I trained my models as follows:
Which produced the following results:
test epoch=60 judged@10=0.6110 map_rel-3=0.2540 [mrr_rel-3=0.7288] p_rel-3@1=0.6450 p_rel-3@3=0.4917
However, published results for Vanilla BERT are as follows:
Which produced the following results:
test epoch=30 judged@10=0.6030 map_rel-3=0.2563 [mrr_rel-3=0.7302] p_rel-3@1=0.6400 p_rel-3@3=0.5083
However, published results for CEDR-KNRM are as follows:
According to the logs, I understand that the inference is deterministic (`[trainer:pairwise][DEBUG] using GPU (deterministic)`).

Could anyone let me know what I am doing wrong?
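One thing worth ruling out is RNG state: the deterministic flag covers inference kernels, but training still consumes random state for weight initialisation and batch sampling, so two runs can legitimately differ unless every seed is pinned. A minimal PyTorch sketch; `pin_all_seeds` is a hypothetical helper, not an OpenNIR API:

```python
import random

import numpy as np
import torch

def pin_all_seeds(seed: int = 42):
    """Hypothetical helper: seed every RNG that touches training and
    force cuDNN onto deterministic (but slower) kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # CPU generator
    torch.cuda.manual_seed_all(seed)  # all CUDA devices (no-op without a GPU)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

With the same seed pinned before each run, repeated training should produce identical weights on the same hardware and library versions; across different versions, results can still drift.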
Where do the differences come from (especially w.r.t. MAP)?
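To localise a MAP discrepancy, the metric can be recomputed by hand from a run and the qrels. A minimal sketch of MAP with a relevance cutoff, mirroring the `map_rel-3` convention (only judgments with grade >= 3 count as relevant); the in-memory `run`/`qrels` structures are assumptions of this sketch:

```python
def average_precision(ranked_docids, relevant):
    """AP for one query; `relevant` is the set of docids judged relevant."""
    hits, precision_sum = 0, 0.0
    for i, docid in enumerate(ranked_docids, start=1):
        if docid in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant) if relevant else 0.0

def map_at_rel(run, qrels, threshold=3):
    """Mean AP over queries, treating grade >= threshold as relevant.
    run:   {qid: [docid, ...]} in ranked order
    qrels: {qid: {docid: grade}}"""
    aps = []
    for qid, ranking in run.items():
        relevant = {d for d, g in qrels.get(qid, {}).items() if g >= threshold}
        aps.append(average_precision(ranking, relevant))
    return sum(aps) / len(aps) if aps else 0.0
```

Comparing this against the pipeline's reported number (or against `trec_eval` output) can show whether the gap comes from the ranking itself or from differing evaluation settings.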