Is it possible to use MODNet with GPU acceleration? #226

Open
naik-aakash opened this issue Oct 10, 2024 · 8 comments

Comments

@naik-aakash
Contributor

Hi @ml-evs, @ppdebreuck, I have been trying to use MODNet with a GPU but cannot get it to work. I had to use a different TensorFlow version than the one pinned by modnet; I am using tensorflow==2.15.0 (with 2.11.0, GPUs are not detected at all on my system).

The system has CUDA 12.4 installed.

It always fails with the following error:

"CUDA-capable device(s) is/are busy or unavailable", or it fails to set the CUDA device.

I ran the following to check whether TensorFlow is installed correctly, and it seems to be the case: TensorFlow itself works fine.

import os

# Restrict TensorFlow to the first GPU; this must be set before TF initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.debugging.set_log_device_placement(True)  # log the device each op is placed on

# Place a small matrix multiplication explicitly on the GPU.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))  # expected output: [[22. 28.] [49. 64.]]

I'm not able to figure out what the problem could be here. Any help in this regard would be great!

@ml-evs
Collaborator

ml-evs commented Oct 10, 2024

Could you post a minimal failing MODNet script too? I've never had issues trying it out on my GPU with TF 2.11 (I doubt much of MODNet works with a more recent TF, though we don't really have the resources to update it).

@naik-aakash
Contributor Author

Hi @ml-evs, I will post it later today; I need to wait for the GPUs to be free again.

Could you also share your environment file? I can test with that too, if that helps.

@ml-evs
Collaborator

ml-evs commented Oct 10, 2024

I haven't tried for years, as I never saw any worthwhile speed-up for the size of networks I was creating, so unfortunately I don't have one to hand. But I can at least try to reproduce your issue locally.

@naik-aakash
Contributor Author

Hi @ml-evs, I realized the example script is almost the same as one of the example notebooks, just with additional features appended to the dataframe.

I just have these additional lines at the top to limit the run to one GPU:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU
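
A quick sanity check for this restriction (a minimal sketch using the standard TF 2.x device-listing API):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must be set before TensorFlow is imported

import tensorflow as tf

# With the restriction in place, exactly one GPU should be listed here.
print(tf.config.list_physical_devices("GPU"))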

https://github.com/ppdebreuck/modnet/blob/master/example_notebooks/training_multi_data.ipynb

On checking a bit further, I find that this error occurs especially when I try to use the GA. In the same notebook, everything seems to run fine up to the following line:

# fitting
model.fit(train_data)

But when I try to run the genetic-algorithm-based hyperparameter optimization, I get a "cudaSetDevice() on GPU:0 failed. Status: CUDA-capable device(s) is/are busy or unavailable" error. So it seems the GPU somehow gets registered once the model is initialized and is not released for the next iterations.

I also tried model.fit_preset(train_data); same error.

Also, I just want to mention that when I ran these tests, I ran the fit commands independently after restarting the kernel, so the GPU was free in every case. It worked only when not using presets or the genetic algorithm.
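
For reference, the GA run that fails is invoked roughly as below. This is a sketch assuming the FitGenetic interface from modnet.hyper_opt; the exact constructor and parameter names may differ between modnet versions:

from modnet.hyper_opt import FitGenetic  # assumed import path

# train_data is the same MODData used for model.fit above.
# n_jobs > 1 spawns worker processes, each of which tries to
# initialize the GPU that the parent process already holds.
ga = FitGenetic(train_data)                       # assumed constructor
model = ga.run(size_pop=20, num_generations=10,   # assumed parameter names
               n_jobs=4)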

@ml-evs
Collaborator

ml-evs commented Oct 10, 2024

Also, I just want to mention that when I ran these tests, I ran the fit commands independently after restarting the kernel, so the GPU was free in every case. It worked only when not using presets or the genetic algorithm.

I don't have time to look into this fully now, but my guess is that the first preset is running and the rest see that the device is busy: we use Python multiprocessing, and typically TF will allocate the entire GPU to the first process. You could try e.g. tf.debugging.set_log_device_placement(True) to confirm. If this is the case, you might be able to fiddle around with set_memory_growth to allow each process to use a small amount of memory initially. I'm not an expert on this, but in the past the gains we saw from using a GPU for small MODNet models were very small (or non-existent), which is to say I'd be interested in your results if you can get this working!
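
A minimal sketch of that memory-growth approach, using the standard TF 2.x API (whether it actually resolves the multiprocessing clash here is untested):

import tensorflow as tf

# Ask TF to allocate GPU memory on demand instead of reserving the whole
# device up front; this must run in each worker process before any op
# touches the GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)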

@ppdebreuck
Owner

Hey all! Sorry for the late reply. I would not spend time on this, @naik-aakash: (i) MODNet uses small networks (intended for small datasets), with no benefit from GPU training (unless you force big architectures, but that would be a very specific use case); (ii) we are not fully happy with TF and would like to migrate to Torch or JAX. Matthew and I don't have time for this, but we might have a student doing it next semester (fingers crossed) ;p

@ml-evs
Collaborator

ml-evs commented Oct 21, 2024

Just to add that I'd still love to find time to update MODNet to the latest Keras core, which has Torch/JAX and TF backends (and hopefully should be quite an easy translation job!) -- see #158 for some details.
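
For context, a minimal sketch of how backend selection works in multi-backend Keras 3 (illustrative only; this is not MODNet code):

import os

# Keras 3 picks its backend from this environment variable at import time;
# valid values include "tensorflow", "torch", and "jax".
os.environ["KERAS_BACKEND"] = "torch"

import keras

# The same keras.layers code then runs unchanged on the chosen backend.
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])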

@ppdebreuck
Owner

Great idea, and probably easier than translating to PyTorch directly, given our frequent usage of tf.keras.
