Memory usage (RAM) grows too fast #395
Comments
adding backend.clear_session() worked for me:
|
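(For anyone looking for the missing snippet: here is a minimal sketch of that clear_session() approach, assuming tf.keras and keras_tuner; the model architecture and hyperparameter ranges below are illustrative only, not taken from the comment above.)

import keras_tuner
import tensorflow as tf

def build_model(hp):
    # Drop the previous trial's graph so its layers and optimizer state
    # can be garbage-collected before the next model is built.
    tf.keras.backend.clear_session()
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

tuner = keras_tuner.RandomSearch(build_model, objective="val_loss", max_trials=30)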
@Derdin-datascience It does not work under MirroredStrategy. |
I am having this same issue 3 years later. |
Having the same problem. Kaggle offers only 13 GB of RAM with their GPUs, so I am hitting the limit with WaveNet. |
Running into the same issue. I am currently using a rather inelegant workaround: a main script that repeatedly launches the tuning script as a subprocess, like so:

import subprocess

for i in range(5, 150, 3):
    print(f"Running tuner.py with argument {i}")
    subprocess.run(["python", "model/tuning/tuner.py", str(i)])

The argument is then used as the max_trials parameter where the tuning happens. Hoping there is a solution for the memory leak in the tuner! |
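A rough sketch of what the tuner.py side of that workaround might look like; the argument handling, tuner type, directory names, and data variables here are assumptions for illustration, not details from the comment. The idea is that each subprocess re-opens the same tuner directory with overwrite=False, so completed trials are kept and only the max_trials ceiling moves:

import sys
import keras_tuner

max_trials = int(sys.argv[1])  # passed in by the outer loop above

tuner = keras_tuner.RandomSearch(
    build_model,                 # assumed to be defined elsewhere in tuner.py
    objective="val_loss",
    max_trials=max_trials,
    directory="tuning",          # shared across subprocess runs
    project_name="model_search",
    overwrite=False,             # resume from the trials of previous runs
)
tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=10)

Because each run is a fresh Python process, whatever memory the previous batch of trials leaked is released by the operating system when that process exits.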
I cannot find a feasible solution right now either. Anyone who has an idea of how to fix it is welcome to share. Thanks! |
Has anyone found a working solution to this? Neither backend.clear_session() nor gc.collect() worked for me... |
I was training 16 models in a loop and together they'd devour 500 GB of memory. I went from:

for i in range(y_train.shape[1]):
    model = _build_model()
    logger.info(f"Training model {i+1}")
    model.fit(
        x_train,
        y_train[:, i],
        validation_data=(x_test, y_test[:, i]),
        epochs=7,
        batch_size=16,
        callbacks=callbacks,
    )

to:

def _data_generator(x, y, batch_size):
    # Yield one mini-batch at a time instead of handing fit() the full arrays.
    for i in range(0, len(x), batch_size):
        yield x[i:i + batch_size], y[i:i + batch_size]

for i in range(y_train.shape[1]):
    logger.info(f"Creating model {i+1}")
    model = _build_model()
    model.fit(
        _data_generator(x_train, y_train[:, i], 16),
        validation_data=(x_test, y_test[:, i]),
        epochs=7,
        callbacks=callbacks,
        steps_per_epoch=int(np.ceil(len(x_train) / 16)),
    ) |
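For what it's worth, the same batched streaming can also be expressed with tf.data instead of a hand-rolled generator; this is only a sketch, assuming x_train, y_train, x_test, y_test are NumPy arrays as above and i indexes the current target column:

import tensorflow as tf

train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train[:, i]))
    .batch(16)
    .prefetch(tf.data.AUTOTUNE)
)
# With a tf.data dataset, steps_per_epoch is inferred automatically.
model.fit(train_ds, validation_data=(x_test, y_test[:, i]), epochs=7, callbacks=callbacks)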
I did some memory profiling, and if you look at the fit function in keras_tuner/src/engine/hypermodel.py, this is where memory starts leaking:

def fit(self, hp, model, *args, **kwargs):
    """Train the model.

    Args:
        hp: HyperParameters.
        model: `keras.Model` built in the `build()` function.
        **kwargs: All arguments passed to `Tuner.search()` are in the
            `kwargs` here. It always contains a `callbacks` argument, which
            is a list of default Keras callback functions for model
            checkpointing, tensorboard configuration, and other tuning
            utilities. If `callbacks` is passed by the user from
            `Tuner.search()`, these default callbacks will be appended to
            the user provided list.

    Returns:
        A `History` object, which is the return value of `model.fit()`, a
        dictionary, or a float.

        If return a dictionary, it should be a dictionary of the metrics to
        track. The keys are the metric names, which contains the
        `objective` name. The values should be the metric values.

        If return a float, it should be the `objective` value.
    """
    return model.fit(*args, **kwargs)

The model passed in here is the one you define with TensorFlow. As such, I do not think this is an issue with keras_tuner; I think it might actually be a memory leak in TensorFlow's model.fit. I tried deleting the model and that did not solve the issue, so I suspect there is leakage somewhere in this function. It may also be related to a custom train step? |
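For reference, the per-trial cleanup that several comments above tried can also be wired into the tuner itself by overriding run_trial. This is only a sketch of that idea (the exact run_trial signature and return value vary between keras_tuner versions), not a confirmed fix for the leak described here:

import gc
import keras_tuner
import tensorflow as tf

class ClearingTuner(keras_tuner.RandomSearch):
    def run_trial(self, trial, *args, **kwargs):
        # Run the standard trial, then try to release graph and Python
        # memory before the next trial builds a new model.
        results = super().run_trial(trial, *args, **kwargs)
        tf.keras.backend.clear_session()
        gc.collect()
        return results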
Until a proper fix is found: I described a very rough workaround I made in #873.
|
Hello,
I see that Hyperband search is eating about 1 GB of memory every 20 trials or so. Is there anything I can do?
Regards,
[Attachments: memory usage at Trial 2 and at Trial 20]
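One way to quantify that growth is to log resident memory after each trial's training run; here is a small sketch using psutil in a Keras callback (the callback and its wiring are my own illustration, not something from this thread):

import psutil
import tensorflow as tf

class MemoryLogger(tf.keras.callbacks.Callback):
    def on_train_end(self, logs=None):
        # Resident set size of the current process, in GB.
        rss_gb = psutil.Process().memory_info().rss / 1e9
        print(f"Resident memory after this trial: {rss_gb:.2f} GB")

# Pass callbacks=[MemoryLogger()] to tuner.search(...); it fires once per trial's fit.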