Which f1 should we report? #25

soodeh-nilforoushan · 2021-12-11T00:43:41Z

When I run the code, I get three F1 scores at each epoch. Which one should we report as the final F1 score, following the paper?
Here is an example of the output: epoch 5: dev_f1=0.8317046688382194, f1=0.818146568437379, best_f1=0.8185719859539602

rinkstiekema · 2023-01-02T16:08:34Z

dev_f1 is evaluated on the validation dataset, while f1 is evaluated on the test dataset. best_f1 is the best f1 score seen so far on the test dataset.

Ultimately, the model that is written to disk is simply the last checkpoint. Since you should report the f1 of this model, it's best to use the f1 from the final epoch.
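
If it helps, here is a minimal sketch for pulling that number out of a saved training log. It assumes every epoch prints a line in exactly the format quoted in the question; the helper name `final_f1` and the regular expression are my own illustration and are not part of the repository.

```python
import re

# Matches the log line format quoted in the question (an assumption:
# the script may print other lines, which are simply ignored here).
NUM = r"(\d+(?:\.\d+)?)"
LINE_RE = re.compile(
    rf"epoch (\d+): dev_f1={NUM}, f1={NUM}, best_f1={NUM}"
)


def final_f1(log_text):
    """Return (epoch, f1) from the last epoch line found, or None."""
    last = None
    for match in LINE_RE.finditer(log_text):
        # group(3) is the f1 field, i.e. the test-set score for that epoch.
        last = (int(match.group(1)), float(match.group(3)))
    return last


log = (
    "epoch 5: dev_f1=0.8317046688382194, "
    "f1=0.818146568437379, best_f1=0.8185719859539602"
)
print(final_f1(log))
```

With a full log, the function returns the f1 of the highest-numbered epoch printed last, which corresponds to the last checkpoint described above.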
