
Stop tensorflow from eating all GPU memory #473

Merged
1 commit merged on Jul 16, 2023

Conversation

mattdangerw
Member

By default, tensorflow will consume all available GPU memory: https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth

Say you are running with KERAS_BACKEND=jax, and jax and tf have both been configured with GPU support:

  • keras_core will always import and initialize tensorflow
  • tensorflow will use all available GPU memory
  • jax will then fail immediately on any GPU allocation it attempts

Note this does not happen in colab because colab automatically exports the environment variable:
TF_FORCE_GPU_ALLOW_GROWTH=true

From keras-core, we can attempt to work around it by limiting tensorflow GPU growth. Long term we should work around it by not importing tensorflow on jax and torch backends.
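The short-term workaround described above can be sketched as follows. This is a minimal illustration, not keras-core's actual code; the function name is mine, and it assumes TensorFlow may or may not be installed:

```python
def limit_tf_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once.

    Minimal sketch; must run before TensorFlow initializes any GPU.
    Returns True if at least one GPU was configured.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow not installed; nothing to configure.
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # See https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
        tf.config.experimental.set_memory_growth(gpu, True)
    return bool(gpus)
```

Note that `set_memory_growth` raises a `RuntimeError` if TensorFlow has already initialized the GPUs, which is why the call has to happen as early as possible.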

@mattdangerw mattdangerw requested a review from fchollet July 13, 2023 23:26
@mattdangerw mattdangerw force-pushed the memory-growth-tf branch 2 times, most recently from a2b17b2 to fa85eb3 on July 14, 2023 00:20
@fchollet
Contributor

From keras-core, we can attempt to work around it by limiting tensorflow GPU growth. Long term we should work around it by not importing tensorflow on jax and torch backends.

This is not quite what we are doing. We are making it possible to use Keras without having installed TF -- but if it is installed, it is likely to get imported during the normal course of operations with another backend, because:

  1. We still use gfile when available, e.g. when saving
  2. KPLs (Keras preprocessing layers) still rely on TF

Hypothetically we could fix 1 (not easy though) but not 2.


gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Stop tensorflow from using all avilable GPU memory. See
Contributor


Typo: available

@@ -6,6 +6,21 @@
# upon import.
import torch

if backend() != "tensorflow":
    import tensorflow as tf
Contributor


As a general policy we should only import TF when requested. Right now it gets imported lazily the first time it's needed, in utils/module_utils.py. We should customize the initialize() method for TF to insert this routine.
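A sketch of that idea follows. The class and hook names here are illustrative, not the actual utils/module_utils.py API: a lazy module proxy that runs a one-time initialize() hook right after the real import happens.

```python
import importlib


class LazyModule:
    """Minimal sketch of a lazily imported module with an initialize() hook,
    loosely modeled on keras-core's utils/module_utils.py (names illustrative).
    """

    def __init__(self, name, initialize=None):
        self.name = name
        self._module = None
        self._initialize = initialize  # Hook run once, right after import.

    def _load(self):
        if self._module is None:
            self._module = importlib.import_module(self.name)
            if self._initialize is not None:
                self._initialize(self._module)
        return self._module

    def __getattr__(self, attr):
        # Triggers the real import on first attribute access.
        return getattr(self._load(), attr)


def _init_tf(tf_module):
    # Example initialize() hook: limit GPU memory growth on first import.
    for gpu in tf_module.config.list_physical_devices("GPU"):
        tf_module.config.experimental.set_memory_growth(gpu, True)


# Nothing is imported until the first attribute access on `tf`.
tf = LazyModule("tensorflow", initialize=_init_tf)
```

The key property is that constructing the proxy is free; TensorFlow (and the GPU configuration hook) only runs the first time an attribute is actually used.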

Member Author


Hmm, I think in this case, we may be better off switching this to just setting os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true" in config.py. This is essentially how colab handles the problem.

The issue with making tf a delayed import is that a user script might import tensorflow first, bypass the lazy module entirely, and hit OOMs outside of our control. Setting the environment variable is lightweight, we can do it first thing, and it will not affect tf until it's imported. What do you think?
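The environment-variable approach amounts to a couple of lines. A sketch (using setdefault so any value the user, or Colab, has already exported wins):

```python
import os

# Must happen before `import tensorflow`; the variable is read at TF startup.
# setdefault() keeps any value the user (or Colab) already exported.
os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
```

Because it only sets an environment variable, this costs nothing on backends where TensorFlow is never imported at all.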

Contributor


Hmm, I think in this case, we may be better off switching this to just setting

SGTM. Lightweight indeed and the Colab precedent shows it's fine.

To note, I have made the changes described above -- now we only import TF if using KPL or if saving to GCS. But many users will likely import it anyway (e.g. tf.data + JAX)
