memory usage (RAM) grows too fast #395

Open
viotemp1 opened this issue Aug 23, 2020 · 10 comments

@viotemp1

Hello,

I see that the Hyperband search is eating about 1 GB of memory every 20 trials or so. Is there anything I can do?
Regards,

Trial 2: (screenshot of memory usage, 2020-08-23 22:03)

Trial 20: (screenshot of memory usage, 2020-08-23 22:15)

@Derdin-datascience

Adding backend.clear_session() at the start of the model-building function worked for me:

import tensorflow as tf
from tensorflow.keras import backend
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def model_builder(hp):
    # Clear the global Keras state left over from the previous trial
    # before building a new model.
    backend.clear_session()
    model = Sequential()
    hp_drop = hp.Float('drop', min_value=0, max_value=0.2, step=0.025)
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(hp_drop))
    model.add(Dense(1, activation="relu"))

    model.compile(
        loss='mean_absolute_error',
        optimizer=tf.keras.optimizers.Adam(0.001),
        metrics=["mean_absolute_percentage_error"]
    )
    return model
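
For context, a minimal sketch of how such a builder is usually handed to the tuner; the Hyperband settings, objective, and directory names below are illustrative placeholders, not taken from this thread:

import keras_tuner as kt

# `model_builder` is the function above; clear_session() runs at the
# start of every trial's build.
tuner = kt.Hyperband(
    model_builder,
    objective="val_loss",
    max_epochs=30,
    directory="tuning_dir",
    project_name="memory_issue_395",
)
# tuner.search(x_train, y_train, validation_split=0.2)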

@summelon

@Derdin-datascience This does not work under MirroredStrategy.

@JLPiper

JLPiper commented Mar 8, 2023

I am having this same issue three years later.
As the tuner search progresses through the trials, more and more RAM is consumed until either an OOM error occurs or the computer freezes entirely.
I have tried adding a clean-up function that runs at the start of every build_model call, consisting of clear_session(), del model, and gc.collect(), to no avail.
Has anyone found a reasonable fix for this yet?
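
For reference, a reconstruction of that clean-up attempt as described (the exact code was not posted; this is only a sketch, and as noted it did not stop the RAM growth):

import gc
import tensorflow as tf
from tensorflow.keras import backend as K

def clean_up(model=None):
    # Drop the reference to the previous trial's model, reset Keras'
    # global state, and force a garbage-collection pass.
    if model is not None:
        del model
    K.clear_session()
    gc.collect()

def build_model(hp):
    clean_up()  # runs at the start of every trial, as described above
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model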

@h4ck4l1

h4ck4l1 commented Sep 4, 2023

Having the same problem. Kaggle offers only 13 GB of RAM with their GPUs, so I am hitting the limit with WaveNet.
Has anyone found any solution?

@Furkan-rgb

Running into the same issue. I am currently using a very inelegant workaround: a main script that loops over the tuner script like so:

    import subprocess

    for i in range(5, 150, 3):
        print(f"Running tuner.py with argument {i}")
        subprocess.run(["python", "model/tuning/tuner.py", str(i)])

The tuner script then takes this argument as its max_trials parameter. Hoping there is a fix for the memory leak in the tuner itself!
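
For what it's worth, this is roughly what the tuner.py side of that loop could look like; the builder, data, and directory names are placeholders, and the key point is that keeping the same directory with overwrite=False lets every subprocess resume the same search and only run a few new trials before exiting:

import sys
import keras_tuner as kt

max_trials = int(sys.argv[1])  # the argument passed in by the outer loop

tuner = kt.BayesianOptimization(
    build_model,                 # placeholder: whatever hypermodel you tune
    objective="val_loss",
    max_trials=max_trials,
    directory="tuning_results",  # shared across runs so progress is resumed
    project_name="my_model",
    overwrite=False,             # keep the trials from earlier subprocesses
)
tuner.search(x_train, y_train, validation_split=0.2)  # placeholder data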

@haifeng-jin
Collaborator

I cannot find a feasible solution right now either. Anyone who has an idea of how to fix it is welcome to share it.

Thanks!

@OliverWeitman

Has anyone found a working solution to this? Neither backend.clear_session() nor gc.collect() worked for me...

@farhanhubble

farhanhubble commented Jan 17, 2024

I was training 16 models in a loop, and together they would devour 500 GB of memory on tensorflow = "2.5.0"! There was definitely a leak somewhere. Memory profiling did not help, and I was fairly sure the leak was somewhere outside the Python code. I switched to using a generator that produces one mini-batch of data at a time, and that seems to have completely plugged the leak.

I went from:

for i in range(y_train.shape[1]):
    model = _build_model()
    logger.info(f"Training model {i+1}")
    model.fit(
        x_train,
        y_train[:, i],
        validation_data=(x_test, y_test[:, i]),
        epochs=7,
        batch_size=16,
        callbacks=callbacks,
    )

to

def _data_generator(x, y, batch_size):
    for i in range(0, len(x), batch_size):
        yield x[i:i + batch_size], y[i:i + batch_size]


for i in range(y_train.shape[1]):
    logger.info(f"Creating model {i+1}")
    model = _build_model()
    model.fit(
        _data_generator(x_train, y_train[:, i], 16),
        validation_data=(x_test, y_test[:, i]),
        epochs=7,
        callbacks=callbacks,
        steps_per_epoch=int(np.ceil(len(x_train) / 16)),
    )
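
One caveat worth noting: a plain Python generator like _data_generator is exhausted after a single pass, so with epochs=7 Keras may warn that the input ran out of data after the first epoch. A tf.data pipeline (sketch below, assuming NumPy arrays as input) streams one batch at a time while remaining repeatable across epochs:

import tensorflow as tf

def make_dataset(x, y, batch_size=16):
    # Streams one mini-batch at a time; Keras re-iterates it for every epoch.
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)

# model.fit(make_dataset(x_train, y_train[:, i]),
#           validation_data=make_dataset(x_test, y_test[:, i]),
#           epochs=7, callbacks=callbacks)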

@jdkern11

jdkern11 commented Feb 22, 2024

I did some memory profiling, and if you look at the fit function in keras_tuner/src/engine/hypermodel.py, this is where memory starts leaking.

def fit(self, hp, model, *args, **kwargs):
    """Train the model.

    Args:
        hp: HyperParameters.
        model: `keras.Model` built in the `build()` function.
        **kwargs: All arguments passed to `Tuner.search()` are in the
            `kwargs` here. It always contains a `callbacks` argument, which
            is a list of default Keras callback functions for model
            checkpointing, tensorboard configuration, and other tuning
            utilities. If `callbacks` is passed by the user from
            `Tuner.search()`, these default callbacks will be appended to
            the user provided list.

    Returns:
        A `History` object, which is the return value of `model.fit()`, a
        dictionary, or a float.

        If return a dictionary, it should be a dictionary of the metrics to
        track. The keys are the metric names, which contains the
        `objective` name. The values should be the metric values.

        If return a float, it should be the `objective` value.
    """
    return model.fit(*args, **kwargs)

The model passed in here is the one you define with TensorFlow. As such, I do not think this is an issue with keras_tuner; I think it might actually be a memory leak in TensorFlow's model.fit.

I tried deleting the model, and that did not solve the issue, so I suspect the leakage happens somewhere inside this function. It may also be related to a custom train step?
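
In case it helps others reproduce the measurement, a small callback along these lines can be used to watch resident memory grow across trials (it assumes the third-party psutil package, which is not part of keras_tuner):

import os
import psutil
import tensorflow as tf

class MemoryLogger(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Resident set size of the current process, in megabytes.
        rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1e6
        print(f"epoch {epoch}: resident memory {rss_mb:.0f} MB")

# Passed through the search, it reaches every model.fit() call:
# tuner.search(x_train, y_train, callbacks=[MemoryLogger()])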

@JLPiper

JLPiper commented Feb 22, 2024

Until a proper fix is found: on #873, I described how I made a very rough workaround:

A. Use a less intensive hyperparameter search option that can feasibly complete its search before the memory consumption becomes too much. I found that switching from Bayesian optimization to Hyperband gave me a lot more leeway, at the cost of the benefits of Bayesian optimization.

B. Use a separate program to launch, monitor, and kill the main tuner program. A handful of libraries let you track resource usage relatively easily. Simply have that program launch the tuner, wait until its resource usage passes a certain threshold, and then kill the tuner (see the sketch below).

Keras Tuner naturally saves its progress, so it will pick up right where it left off. A word of warning, though: even then I have run into occasions where a single tuning step consumes too much memory by itself and gets stuck, because the watchdog kills the tuner before that step can finish.
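
A rough sketch of option B; the script name, memory threshold, and the use of psutil are my own illustrative choices, not a prescribed setup:

import subprocess
import time
import psutil

MEMORY_LIMIT_MB = 12_000            # restart the tuner once it exceeds this
TUNER_CMD = ["python", "tuner.py"]  # placeholder for the actual tuner script

while True:
    proc = subprocess.Popen(TUNER_CMD)
    ps = psutil.Process(proc.pid)
    killed = False
    while proc.poll() is None:       # tuner is still running
        try:
            rss_mb = ps.memory_info().rss / 1e6
        except psutil.NoSuchProcess:  # tuner exited between checks
            break
        if rss_mb > MEMORY_LIMIT_MB:
            proc.kill()              # Keras Tuner resumes from its saved trials
            proc.wait()
            killed = True
            break
        time.sleep(10)
    if not killed:                   # tuner exited on its own: search finished
        break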
