AccelerateMixin error with RandomizedSearchCV #944
Thanks for this great summary and code example. I don't have a multi-GPU setup that I can test this on; with a single GPU I couldn't reproduce the error. But I have a suspicion that it could be due to caching. Could you please check whether turning off caching solves the issue for you? To do that, initialize your net like this:
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
callbacks__valid_acc__use_caching=False, # <= added line
)
Of course, if you have more scoring callbacks than the default ones, turn off caching for those too (a small sketch follows after the next snippet). If that doesn't help, please test disabling callbacks completely, using:
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
callbacks="disable",
) Please report your findings back.
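Regarding the note about additional scoring callbacks: turning off caching for an extra scorer could look like this (just a sketch; the "valid_ap" callback is hypothetical and not part of your setup):
from skorch.callbacks import EpochScoring

model = AcceleratedNeuralNetClassifier(
    MyModule,
    accelerator=accelerator,
    callbacks=[
        # hypothetical extra scorer with caching disabled directly
        ("valid_ap", EpochScoring("average_precision", lower_is_better=False, use_caching=False)),
    ],
    callbacks__valid_acc__use_caching=False,  # default accuracy scorer
)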
Always happy to hear that :) |
Thanks for the quick reply. Unfortunately it did not solve the issue. However, the error trace changed.
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
callbacks__valid_acc__use_caching=False, # <= added line
) Error:
As you can see, the score is still NaN. At this point execution freezes and the RandomizedSearchCV fit does not terminate. Note that the second fold fit time is 0.0s.
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
callbacks="disable", # <= added line
) Error:
As you can see, the score is still NaN. At this point execution freezes and the RandomizedSearchCV fit does not terminate. Note that the second fold fit time is 0.0s.
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
) Error:
As you can see, the score is NaN but only the first fold completed. At this point execution freezes and the RandomizedSearchCV fit does not terminate. Thanks a lot in advance for your guidance |
Hmm, this does not look good. Whether the search fails early or runs for a while and fails later is probably not related to the specific conditions you posted but is caused by some combination of random hyper-parameters. I'm sorry that I have to ask you to try a few more things, but as mentioned I cannot replicate this locally:
|
Thanks again for the reply. I implemented your suggestions (reproducibility by setting seeds & Softmax). I also changed the default …, and I also fitted the net without … Full code:
import torch
import numpy as np
from skorch import NeuralNetClassifier
from skorch.hf import AccelerateMixin
from accelerate import Accelerator
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
import torch.nn as nn
import random
# FYI: Accelerate also requires the `transformers` package from HuggingFace
# Reproducibility
SEED = 42
torch.manual_seed(SEED)
np.random.seed(SEED)
random.seed(SEED)
# Generate data
X, y = make_classification(10_000, 100, n_informative=5, random_state=0)
X = X.astype(np.float32)
y = y.astype(np.int64)
# PyTorch module
class MyModule(torch.nn.Module):
def __init__(self):
super().__init__()
self.dense0 = nn.Linear(100, 2)
self.nonlin = nn.Softmax(dim=-1)
def forward(self, X):
X = self.dense0(X)
X = self.nonlin(X)
return X
# Skorch wrapper
class AcceleratedNeuralNetClassifier(
AccelerateMixin,
NeuralNetClassifier
):
"""NeuralNetClassifier with HuggingFace Accelerate support"""
accelerator = Accelerator()
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
)
# HPO
rs = RandomizedSearchCV(
estimator=model,
param_distributions={
"lr": [0.0001, 0.001, 0.01, 0.1],
"batch_size": [10, 20, 30, 40],
},
n_iter=10,
scoring="average_precision",
n_jobs=1,
refit=False,
cv=2,
verbose=3,
random_state=SEED,
error_score="raise"
)
rs.fit(X, y)
print(f"{rs.cv_results_}")
Same trace:
Thanks a lot in advance for your time. |
Thanks for your detailed experiments. IIUC, all the conditions work, except for using accelerate together with RandomizedSearchCV. Could you please do some more tests:
# check cross_validate
from sklearn.model_selection import cross_validate
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
)
cross_validate(model, X, y)
# check cloning
from sklearn.base import clone
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
# also test with different hyper-parameter settings, esp. batch size
)
model_cloned = clone(model)
model_cloned.fit(X, y)
# checking joblib
from joblib import parallel_backend
backend = 'loky' # also test 'threading' and 'multiprocessing'
with parallel_backend(backend, n_jobs=1):
    model = ...  # check different hyper params
    model.fit(X, y)
Can any of those conditions reproduce the error? I suspect it could be a weird interaction with joblib. I could ask the accelerate devs if they have ever seen anything like this. To do so, could you please give detailed info about your environment (hardware, OS, versions of all packages, Python, etc.)? |
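In case it helps, here is a small snippet to collect most of that information (just a sketch; I believe `accelerate env` on the command line prints similar details):
import platform, sys
import torch, sklearn, skorch, accelerate, joblib

print("Python     :", sys.version)
print("OS         :", platform.platform())
print("torch      :", torch.__version__)
print("sklearn    :", sklearn.__version__)
print("skorch     :", skorch.__version__)
print("accelerate :", accelerate.__version__)
print("joblib     :", joblib.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count  :", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} ->", torch.cuda.get_device_name(i))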
Thanks for your reply.
Indeed.
from sklearn.model_selection import cross_validate
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
)
cross_validate(
model, X, y,
cv=2, scoring="average_precision", error_score="raise"
)
Interestingly, it reproduces (almost) the same error. I am saying almost because the "inconsistent numbers of samples" are slightly different (…).
from sklearn.base import clone
for b_size in [10, 20, 30, 40]:
accelerator = Accelerator()
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
batch_size=b_size
)
model_cloned = clone(model)
model_cloned.fit(X, y) Training OK.
from joblib import parallel_backend
for backend in ['loky', 'threading', 'multiprocessing']:
print(f"\nUsing backend {backend}")
with parallel_backend(backend, n_jobs=1):
for b_size in [10, 20, 30, 40]:
accelerator = Accelerator()
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
batch_size=b_size
)
model.fit(X, y) Training OK
Many thanks in advance. |
Great, I'm asking colleagues, let's see if anything comes up. Meanwhile, two more things to test:
Probably you already tested that, but using, say,
The issue is almost certainly related to the two GPUs, since the same code runs fine with 1 GPU. Also, we have 10000 samples, with …
accelerator = Accelerator(device_placement=False)
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
device='cuda:0', # or 'cuda:1'
)
cross_validate(
model,
X,
y,
cv=2,
error_score="raise"
) |
No progress yet, but something more to test:
import copy
import torch
import torch.nn as nn
from accelerate import Accelerator
from sklearn.model_selection import KFold
class MyModule(torch.nn.Module):
def __init__(self):
super().__init__()
self.dense0 = nn.Linear(100, 2)
self.nonlin = nn.LogSoftmax(dim=-1)
def forward(self, X):
X = self.dense0(X)
X = self.nonlin(X)
return X
X = torch.rand((10000, 100))
y = torch.randint(0, 2, size=(10000,))
model = MyModule()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accelerator = Accelerator()
def accuracy(y_true, y_pred):
assert len(y_true) == len(y_pred)
return (y_true.cpu() == y_pred.cpu()).float().mean().item()
def _fit_and_score(model, accelerator, X_train, y_train, X_test, y_test, max_epochs=10):
model = copy.deepcopy(model)
accelerator = copy.deepcopy(accelerator)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
dataset_train = torch.utils.data.TensorDataset(X_train, y_train)
dataloader_train = torch.utils.data.DataLoader(dataset_train, batch_size=10)
dataset_test = torch.utils.data.TensorDataset(X_test, y_test)
dataloader_test = torch.utils.data.DataLoader(dataset_test, batch_size=10)
model, optimizer = accelerator.prepare(model, optimizer)
dataloader_train, dataloader_test = accelerator.prepare(dataloader_train, dataloader_test)
# training
model.train()
for epoch in range(max_epochs):
for source, targets in dataloader_train:
optimizer.zero_grad()
output = model(source)
loss = nn.functional.nll_loss(output, targets)
accelerator.backward(loss)
optimizer.step()
# validation
model.eval()
y_proba = []
losses = []
for source, targets in dataloader_test:
output = model(source)
loss = nn.functional.nll_loss(output, targets)
y_proba.append(output)
losses.append(loss)
print(len(y_proba), {len(batch) for batch in y_proba})
y_proba = torch.vstack(y_proba)
y_pred = y_proba.argmax(1)
print("test loss", (sum(losses) / len(losses)).item())
print("accuracy:", accuracy(y_test, y_pred))
# training without joblib
for idx_train, idx_test in KFold(2).split(X, y):
X_train, y_train = X[idx_train], y[idx_train]
X_test, y_test = X[idx_test], y[idx_test]
_fit_and_score(model, accelerator, X_train, y_train, X_test, y_test)
# training with joblib
from joblib import Parallel, delayed
parallel = Parallel(n_jobs=None, verbose=0, pre_dispatch='2*n_jobs')
parallel(
delayed(_fit_and_score)(
model,
accelerator,
X[idx_train], y[idx_train],
X[idx_test], y[idx_test],
)
for idx_train, idx_test in KFold(2).split(X, y)
)
# training with sklearn joblib
from sklearn.utils.parallel import Parallel, delayed
parallel = Parallel(n_jobs=None, verbose=0, pre_dispatch='2*n_jobs')
parallel(
delayed(_fit_and_score)(
model,
accelerator,
X[idx_train], y[idx_train],
X[idx_test], y[idx_test],
)
for idx_train, idx_test in KFold(2).split(X, y)
)
The idea here is to try to remove as much "fluff" as possible in order to isolate the problem. So skorch is completely removed, and from sklearn only the KFold splitter is kept. |
Thanks for your reply.
Yes.
accelerator = Accelerator(device_placement=False)
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
device='cuda:0',
)
cross_validate(
model,
X,
y,
cv=2,
error_score="raise"
) I get a new error:
Also, this launches two processes on the same GPU, as if constraining
accelerator = Accelerator(device_placement=False)
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
device='cuda:0',
)
cross_validate(
model,
X,
y,
cv=2,
error_score="raise"
) Training OK. Note that this launches a single process on the GPU.
I reply in the next comment. |
# training without joblib
for idx_train, idx_test in KFold(2).split(X, y):
X_train, y_train = X[idx_train], y[idx_train]
X_test, y_test = X[idx_test], y[idx_test]
_fit_and_score(model, accelerator, X_train, y_train, X_test, y_test) Error is:
FYI
# training with joblib
from joblib import Parallel, delayed
parallel = Parallel(n_jobs=None, verbose=0, pre_dispatch='2*n_jobs')
parallel(
delayed(_fit_and_score)(
model,
accelerator,
X[idx_train], y[idx_train],
X[idx_test], y[idx_test],
)
for idx_train, idx_test in KFold(2).split(X, y)
) Error is the same:
FYI
# training with sklearn joblib
from sklearn.utils.parallel import Parallel, delayed
parallel = Parallel(n_jobs=None, verbose=0, pre_dispatch='2*n_jobs')
parallel(
delayed(_fit_and_score)(
model,
accelerator,
X[idx_train], y[idx_train],
X[idx_test], y[idx_test],
)
for idx_train, idx_test in KFold(2).split(X, y)
) Error is the same:
FYI
Finally, when I change your assert to `assert len(y_true.cpu()) == len(y_pred.cpu())`, I still get the same error. Many thanks in advance for your feedback. |
Thanks again, this is really helpful. In particular, since the first example without joblib already fails, joblib can't be the reason. This prompted me to look a bit more into the accelerate docs and I would like to test one more thing (sorry for the back and forth), namely calling gather_for_metrics during evaluation: https://huggingface.co/docs/accelerate/quicktour#distributed-evaluation
So IIUC, that means that in the evaluation part, the predictions produced on each process need to be gathered, which skorch currently does not do. In case this solves the issue, I would consider it a skorch bug. To quickly try a fix, you would need to subclass AccelerateMixin:
class MyAccelerateMixin(AccelerateMixin):
    def evaluation_step(self, batch, training=False):
        output = super().evaluation_step(batch, training=training)
        return self.accelerator.gather_for_metrics(output)
(or add this method directly to your AcceleratedNeuralNetClassifier). This would be more of a quick and dirty hack; I would need to investigate further how to do this most efficiently. So if it works, do check that your code actually runs faster with accelerate than without. |
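Putting the subclass to use would look roughly like this (a sketch using the names from this thread; MyModule and the data are as in your script, and I have not tested this on multiple GPUs myself):
from skorch import NeuralNetClassifier
from accelerate import Accelerator

# MyAccelerateMixin as defined above
class MyAcceleratedClassifier(MyAccelerateMixin, NeuralNetClassifier):
    """NeuralNetClassifier that gathers predictions across processes."""

accelerator = Accelerator()
model = MyAcceleratedClassifier(MyModule, accelerator=accelerator)
# then run the same cross_validate / RandomizedSearchCV call as before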
Good news! With a slight adaptation:
for source, targets in dataloader_test:
outputs = model(source)
# outputs = accelerator.gather_for_metrics(outputs) <= initial suggestion
all_outputs, all_targets = accelerator.gather_for_metrics((outputs, targets)) # <= corrected
loss = nn.functional.nll_loss(all_outputs, all_targets)
y_proba.append(all_outputs)
    losses.append(loss)
Output:
Note that I also tried the skorch adaptation you mentioned, but I think I am incorrectly implementing it. Full code:
import torch
import numpy as np
from skorch import NeuralNetClassifier
from skorch.hf import AccelerateMixin
from accelerate import Accelerator
from sklearn.datasets import make_classification
import torch.nn as nn
import random
from sklearn.model_selection import cross_validate
from skorch.dataset import unpack_data
# Reproducibility
SEED = 42
torch.manual_seed(SEED)
np.random.seed(SEED)
random.seed(SEED)
# Generate data
X, y = make_classification(10_000, 100, n_informative=5, random_state=SEED)
X = X.astype(np.float32)
y = y.astype(np.int64)
# PyTorch module
class MyModule(torch.nn.Module):
def __init__(self):
super().__init__()
self.dense0 = nn.Linear(100, 2)
self.nonlin = nn.Softmax(dim=-1)
def forward(self, X):
X = self.dense0(X)
X = self.nonlin(X)
return X
# Skorch wrapper
class AcceleratedNeuralNetClassifier(
AccelerateMixin,
NeuralNetClassifier
):
"""NeuralNetClassifier with HuggingFace Accelerate support"""
# First attempt
# def evaluation_step(self, batch, training=False):
# output = super().evaluation_step(batch, training=training)
# return self.accelerator.gather_for_metrics(output)
# Second attempt
def evaluation_step(self, batch, training=False):
"""Perform a forward step to produce the output used for
prediction and scoring.
Preds and targets are gathered by the accelerator before return
"""
self.check_is_fitted()
Xi, targets = unpack_data(batch)
with torch.set_grad_enabled(training):
self._set_training(training)
y_infer = self.infer(Xi)
all_y_infer, all_targets = self.accelerator.gather_for_metrics((
y_infer,
targets
))
return all_y_infer
accelerator = Accelerator()
model = AcceleratedNeuralNetClassifier(
MyModule,
accelerator=accelerator,
)
cross_validate(
model, X, y,
cv=2, scoring="average_precision", error_score="raise"
) Both attempts produce the same error:
Thanks a lot in advance for your ideas. |
Good progress, I think we're getting close. Maybe I'll be able to get a multi-GPU setup to test soon. The error now seems to be:
I believe the reason is that … Btw., the reason why in my code snippet I only gathered the predictions, not the targets, is that the targets should not come from skorch: sklearn splits the data and keeps the test targets itself for scoring (see the short sketch after the list of suggestions below). I'll think more about it or hopefully get to test it, but meanwhile, here are some suggested solutions:
Instead of overriding evaluation_step, override infer:
def infer(self, x, **fit_params):
y_infer = super().infer(x, **fit_params)
    return self.accelerator.gather_for_metrics(y_infer)
So try adding this method instead of overriding evaluation_step.
Use a batch size that divides the data evenly. That way, accelerate should not need to create dummy samples. E.g. for 10000 samples, a batch size of 100 should work. However, this is quite annoying, especially if data is split into train/valid etc. (by default, skorch uses an 80/20 split). Depending on the size of the dataset, batching without remainder might not be possible (except for a batch size of 1). This might also require passing …
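To make the "no remainder" requirement concrete, here is a rough back-of-the-envelope check for the numbers in this thread (illustrative only; the 80/20 split is skorch's default and cv=2 comes from your search):
n_samples = 10_000
cv = 2                                        # folds used in the search
n_train_fold = n_samples - n_samples // cv    # 5000 samples to fit on per fold
n_fit = int(n_train_fold * 0.8)               # 4000 (skorch's default 80/20 train/valid split)
n_valid = n_train_fold - n_fit                # 1000
n_test_fold = n_samples // cv                 # 5000 scored by sklearn

batch_size = 100
# every partition must be divisible by the batch size, otherwise the last
# batch gets padded with dummy samples during multi-GPU inference
for n in (n_fit, n_valid, n_test_fold):
    assert n % batch_size == 0, (n, batch_size)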
Truncate the excess samples in forward. This could be unsafe, i.e. it could mean that the wrong samples are truncated, but maybe it works. Add this method to the custom neural net class:
def forward(self, X, *args, **kwargs):
y_infer = super().forward(X, *args, **kwargs)
n = len(X)
is_multioutput = len(y_infer) > 0 and isinstance(y_infer[0], tuple)
if is_multioutput:
return tuple(yi[:n] for yi in y_infer)
    return y_infer[:n]
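A quick way to sanity-check this workaround (a sketch; it assumes the net class has the forward() above and the X, y from your script):
model = AcceleratedNeuralNetClassifier(MyModule, accelerator=accelerator)
model.fit(X, y)
proba = model.predict_proba(X)
# with the truncation in place, no dummy rows should remain
assert proba.shape[0] == len(X), proba.shape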
Only use accelerate for training, not for validation/prediction. This is of course not nice because you want to make use of those GPUs, but at least training still seems to work fine. For this, it should be sufficient to not prepare the validation iterator:
def get_iterator(self, dataset, training=False):
iterator = super().get_iterator(dataset, training=training)
if not training:
return iterator
iterator = self.accelerator.prepare(iterator)
return iterator |
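As mentioned above, here is a simplified sketch of what sklearn's CV scoring loop does conceptually (not the actual sklearn code); it shows why only the predictions, not the targets, need to be gathered:
from sklearn.base import clone
from sklearn.metrics import average_precision_score
from sklearn.model_selection import KFold

def manual_cv_scores(model, X, y, n_splits=2):
    scores = []
    for train_idx, test_idx in KFold(n_splits).split(X):
        est = clone(model).fit(X[train_idx], y[train_idx])
        # the predictions come from the estimator (i.e. from skorch) ...
        y_proba = est.predict_proba(X[test_idx])[:, 1]
        # ... while the targets are sklearn's own slice of y
        scores.append(average_precision_score(y[test_idx], y_proba))
    return scores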
I have a multi-GPU instance now and can reproduce the error. Unfortunately, the solution does not work, and it appears that the issue is that, for some reason, accelerate does not detect that it should truncate excess samples. I'm investigating. |
Great to hear that you can try it for yourself. Thanks a lot for your time. |
Okay, so I managed to kind of track down the problem. To keep it quick, the GradientState of the accelerator instance and of the prepared dataloader get out of sync, so the accelerator does not "know" when the last batch is reached and never removes the padded samples. Of course, it is still necessary to add the gather_for_metrics call from before as well:
def evaluation_step(self, batch, training=False):
output = super().evaluation_step(batch, training=training)
return self.accelerator.gather_for_metrics(output)
def get_iterator(self, dataset, training=False):
iterator = super().get_iterator(dataset, training=training)
self.accelerator.gradient_state = iterator.gradient_state
return iterator Could you please check that this solves your problem? |
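For clarity, a sketch of the two overrides combined into one class (the class name is mine; everything else is taken from the snippets above):
from skorch import NeuralNetClassifier
from skorch.hf import AccelerateMixin

class PatchedAcceleratedClassifier(AccelerateMixin, NeuralNetClassifier):
    def evaluation_step(self, batch, training=False):
        # gather predictions from all processes so y_pred has the expected length
        output = super().evaluation_step(batch, training=training)
        return self.accelerator.gather_for_metrics(output)

    def get_iterator(self, dataset, training=False):
        # re-sync the gradient state so the accelerator notices the last batch
        iterator = super().get_iterator(dataset, training=training)
        self.accelerator.gradient_state = iterator.gradient_state
        return iterator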
Update: I spoke to an accelerate dev and the issue is most likely that sklearn sometimes creates a copy of the estimator (via clone/deepcopy), which messes up the references held by the accelerator. In the "no fluff" example I posted, I did add a copy.deepcopy of the model and accelerator, which would explain why it fails in the same way. So what does it mean for this specific issue? Unfortunately, there is no guarantee that you will get correct results, even if the hack I posted above removes the error. I would recommend not using accelerate in this context. Still, if you have 2 GPUs and the model is small enough that it can fit on each of them, it is possible to use grid search with skorch while leveraging both GPUs. This is documented here. Maybe that's a solution that can work for you. |
Many thanks for the analysis and suggested alternative. It is a pity. Do you think it is also unsafe to use accelerate this way when only a single GPU is involved? Finally, do you think this deserves opening an issue on the accelerate repository? Anyway, thanks a lot for your help getting to the bottom of this and keep up the great work with this tool :) |
Potentially it's the same issue because of the copy being created. Whether this can still cause reference issues when only one GPU is involved, I don't know. The answer is probably "it depends". Interestingly, I did manage to find a potential solution by simply adding a __deepcopy__ method to a custom Accelerator subclass:
class MyAccelerator(Accelerator):
def __deepcopy__(self, memo):
cls = type(self)
instance = cls() # <= add more arguments here if needed
return instance
# calling gather_for_metrics is still required
class MyNet(NeuralNetClassifier):
def evaluation_step(self, batch, training=False):
output = super().evaluation_step(batch, training=training)
return self.accelerator.gather_for_metrics(output)
accelerator = MyAccelerator()
net = MyNet(..., accelerator=accelerator)
cross_validate(net, ...)
For my example, it worked. Maybe you can give it a spin for your real use case and report if the results look correct. I'll consult with the accelerate devs if this could be a viable solution.
EDIT: Alternatively, __deepcopy__ can simply return self:
class MyAccelerator(Accelerator):
def __deepcopy__(self, memo):
        return self
Not sure if this can lead to trouble elsewhere down the line, but it works in my tests. |
Partly resolves #944
There is an issue with using skorch in a multi-GPU setting with accelerate. After some searching, it turns out there were two problems:
1. skorch did not call `accelerator.gather_for_metrics`, which resulted in `y_pred` not having the correct size. For more on this, consult the [accelerate docs](https://huggingface.co/docs/accelerate/quicktour#distributed-evaluation).
2. accelerate has an issue with being deepcopied, which happens for instance when using GridSearchCV. The problem is that some references get messed up, causing the GradientState of the accelerator instance and of the dataloader to diverge. Therefore, the accelerator did not "know" when the last batch was encountered and was thus unable to remove the dummy samples added for multi-GPU inference.
The fix for 1. is provided in this PR. For 2., there is no solution in skorch, but a possible (maybe hacky) fix is suggested in the docs. The fix consists of writing a custom Accelerator class that overrides __deepcopy__ to just return self. I don't know enough about accelerate internals to determine if this is a safe solution or if it can cause more issues down the line, but it resolves the issue.
Since reproducing this bug requires a multi-GPU setup and running the scripts with the accelerate launcher, it cannot be covered by normal unit tests. Instead, this PR adds two scripts to reproduce the issue. With the appropriate hardware, they can be used to check the solution.
EDIT: changed …
Thanks for your reply. I conducted some tests that seem conclusive. I compared running the following script on:
import torch
import torch.nn as nn
import numpy as np
import random
from skorch import NeuralNetClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from accelerate import Accelerator
from skorch.hf import AccelerateMixin
from sklearn.metrics import average_precision_score
# Reproducibility
SEED = 42
def seed_everything(seed=42):
torch.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
class MyModule(torch.nn.Module):
def __init__(self):
super().__init__()
self.dense0 = nn.Linear(100, 2)
self.nonlin = nn.Softmax(dim=-1)
def forward(self, X):
X = self.dense0(X)
X = self.nonlin(X)
return X
class AcceleratedNeuralNetClassifier(AccelerateMixin, NeuralNetClassifier):
def evaluation_step(self, batch, training=False):
output = super().evaluation_step(batch, training=training)
return self.accelerator.gather_for_metrics(output)
class SkorchAccelerator(Accelerator):
def __deepcopy__(self, memo):
return self
seed_everything()
X, y = make_classification(
1_000, 100,
n_informative=5, random_state=SEED, flip_y=0.1
)
X = X.astype(np.float32)
y = y.astype(np.int64)
accelerator = SkorchAccelerator()
for i in range(3):
seed_everything()
model_skorch = AcceleratedNeuralNetClassifier(
accelerator=accelerator, module=MyModule,
max_epochs=1, verbose=False, batch_size=10, callbacks="disable"
)
gs = GridSearchCV(
estimator=model_skorch,
param_grid={
"lr": [0.1, 0.001],
},
scoring="average_precision",
n_jobs=1,
cv=2,
verbose=0,
refit=False,
)
gs.fit(X, y)
if accelerator.is_local_main_process:
print(f"{gs.cv_results_['params']=}")
print(f"{gs.cv_results_['mean_test_score']=}")
# Manual refit
best_model_skorch = AcceleratedNeuralNetClassifier(
accelerator=accelerator, module=MyModule,
max_epochs=1, verbose=False, batch_size=10, callbacks="disable",
**gs.best_params_
)
best_model_skorch.fit(X, y)
preds = best_model_skorch.predict_proba(X)[:, 1]
score = average_precision_score(y, preds)
if accelerator.is_local_main_process:
print(f"{score=}")
print("-"*10)
Running ...
# accelerator = SkorchAccelerator()
...
# model_skorch = AcceleratedNeuralNetClassifier(
model_skorch = NeuralNetClassifier(
# accelerator=accelerator,
module=MyModule,
max_epochs=1, verbose=False, batch_size=10, callbacks="disable"
)
...
# if accelerator.is_local_main_process:
print(f"{gs.cv_results_['params']=}")
print(f"{gs.cv_results_['mean_test_score']=}")
...
# best_model_skorch = AcceleratedNeuralNetClassifier(
best_model_skorch = NeuralNetClassifier(
# accelerator=accelerator,
module=MyModule,
max_epochs=1, verbose=False, batch_size=10, callbacks="disable",
**gs.best_params_
)
...
# if accelerator.is_local_main_process:
print(f"{score=}")
print("-"*10) Output:
Running
Running
Seems reasonable to me. Thanks a lot. |
Thanks a lot for testing, the results look very reasonable. They're not 100% the same for 3 GPUs, but I think that's to be expected. I will update this thread if I get more feedback from accelerate devs. For now, I think we can close the issue but if you encounter new problems, feel free to re-open. |
Hi,
Thanks a lot for the great tool!
I tried the recently added HuggingFace Accelerate integration. I want to perform hyper-parameters optimization using Skorch with Accelerate + ScikitLearn RandomizedSearchCV.
However, it seems that they do not play nicely at scoring time by the RandomizedSearchCV.
Reproducible example named skorch_accelerate_issue.py:
Accelerate config to run this script on 2 GPUs on the same machine:
I ran the code using:
accelerate launch skorch_accelerate_issue.py
And here is the error:
FYI, when training starts, I can see that the two GPUs are indeed occupied. Also, when I get rid of the RandomizedSearchCV and just perform model.fit(X, y), training occurs as expected on 2 GPUs.
Many thanks in advance for your help.