Reproducibility with baal active learning | Hugging Face | Classification #250
Unanswered
nitish1295
asked this question in Q&A
Replies: 1 comment 1 reply
-
I don't think it's due to #247; my hunch would be that the seed is not the same for both trainings. If you have a quick unit test for this, I could debug more easily.
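For reference, a minimal sketch of what such a unit test could look like, assuming a hypothetical `train_and_evaluate` helper that re-initializes the model, fine-tunes it on the current train set, and returns the evaluation metrics (the helper and its arguments are illustrative, not part of Baal's API):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    # Seed every RNG the training pipeline may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


def test_same_seed_gives_same_metrics(train_set, eval_set):
    # Two runs with identical seeds, data, and initial weights
    # should produce identical metrics.
    seed_everything(42)
    metrics_a = train_and_evaluate(train_set, eval_set)

    seed_everything(42)
    metrics_b = train_and_evaluate(train_set, eval_set)

    assert metrics_a == metrics_b
```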
-
Consider the following scenario
Experiment Setting:
The 2 experiments are as follows:
Exp_A: You start your active learning loop, similar to what we have in the PR Baal in Production Notebook | Classification | NLP | Hugging Face #245, and you label 5 samples, so your train set is now 10 samples and the pool is 15 samples. You reinitialize the weights, train/fine-tune the model on those 10 samples, and evaluate the model on the 20 evaluation samples to get metrics_A. You also save the updated pool and train sets somewhere on the system; this is relevant for the next experiment.
Exp_B: Now suppose you read the updated training data from the system at the start of this experiment, which means we train the model on 10 samples (the same set of labelled samples we have at the end of Exp_A) and evaluate on the same 20 evaluation samples, before we get to the active learning part. With the same weights for the initial model, we get evaluation metrics metrics_B. (A rough sketch of both experiments follows below.)
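Roughly, the two experiments look like this; the `label_samples`, `fine_tune`, and `evaluate` helpers, the pickle-based save/load, and the variable names are all placeholders for what the notebook / API setup actually does:

```python
import pickle

# ---- Exp_A: inside the long-running notebook session ----
initial_weights = {k: v.clone() for k, v in model.state_dict().items()}

label_samples(pool, train, n=5)          # train: 10 samples, pool: 15 samples
model.load_state_dict(initial_weights)   # re-initialize the weights
fine_tune(model, train)
metrics_a = evaluate(model, eval_set)    # the 20 evaluation samples

with open("train.pkl", "wb") as f:       # persist the updated train set
    pickle.dump(train, f)

# ---- Exp_B: a fresh process, e.g. behind an API call ----
with open("train.pkl", "rb") as f:
    train = pickle.load(f)

model.load_state_dict(initial_weights)   # same initial weights
fine_tune(model, train)
metrics_b = evaluate(model, eval_set)    # same 20 evaluation samples
```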
Ideally
assert metrics_A == metrics_B
should pass, but based on what I have tried so far, this does not happen.
I am reading some more about reproducibility in PyTorch at pytorch/pytorch#7068 and will try to add a gist to this discussion as well to show what is happening.
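For reference, the knobs that issue usually points at are roughly the following; this is a generic PyTorch reproducibility sketch, not something specific to Baal or the notebook:

```python
import torch
from torch.utils.data import DataLoader

# Force deterministic kernels where PyTorch supports them
# (this may raise an error for ops without a deterministic implementation).
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Make DataLoader shuffling reproducible across runs.
train_set = list(range(100))  # placeholder dataset
g = torch.Generator()
g.manual_seed(42)
loader = DataLoader(train_set, batch_size=8, shuffle=True, generator=g)
```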
Q/A
I wanted to check this since it might impact #247: if we cannot get this reproducibility, there are questions around the active learning process, as suggested in #247.
To be more precise, the model we evaluate (model.eval()) in our Jupyter notebook setup (which runs continuously; this is Exp_A) might be different from the one we get after running the API call setup (since we re-initialize the setup again, but this time with the updated train data; this is Exp_B).
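One way to narrow this down could be to check whether the two setups really start from bit-identical weights before fine-tuning; a small comparison helper (hypothetical, not from Baal) might look like this:

```python
import torch


def state_dicts_match(model_a, model_b) -> bool:
    # True only if both models have identical parameter/buffer names and values.
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    if sd_a.keys() != sd_b.keys():
        return False
    return all(torch.equal(sd_a[k], sd_b[k]) for k in sd_a)


# e.g. compare the notebook model (Exp_A) against the freshly created
# API model (Exp_B) before training starts:
# assert state_dicts_match(notebook_model, api_model)
```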
FYI, this is also one of the reasons I have not merged my PR yet. I am not sure why this is happening, though.
I will try to solve this on my end (maybe I am doing something incorrectly). Please let me know if you have any suggestions.