Is it possible to use Modnet with gpu acceleration ? #226
Comments
Could you post a minimal failing MODNet script too? I've never had issues trying it out on my GPU with TF 2.11 (I doubt much of MODNet works with a more recent TF, though we don't really have the resources to spend to update it).
Hi @ml-evs, I will post it today sometime; I need to wait for the GPUs to be free again. Can you also share your environment file? I can test with that too if that helps.
I haven't tried for years, as I never saw any worthwhile speed-up for the size of networks I was creating, so I don't have one to hand, unfortunately. But I can try to reproduce your issue locally at least.
Hi @ml-evs, I realized the example script is almost the same as one of the example notebooks, just with additional features appended to the dataframe: https://github.com/ppdebreuck/modnet/blob/master/example_notebooks/training_multi_data.ipynb

The only addition at the top is these lines to limit it to one GPU:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

On further checking, I find this error occurs specifically when I try to use the GA. The same notebook runs fine up to the following line:

```python
# fitting
model.fit(train_data)
```

But when I try to run the genetic-algorithm-based hyperparameter optimization, I get the "busy or unavailable" error. I also tried with … Also, just to mention: when I ran these tests, I ran each fit command independently after restarting the kernel, so the GPU was free in all cases. It only worked when not using a preset or the genetic algorithm.
I don't have time to look into this fully now, but my guess is that the first preset is running and the rest see that the device is busy: we are using Python multiprocessing, and typically TF will allocate the entire GPU to the first process. You could try e.g.
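The suggestion above is cut off in the scrape; one common workaround along these lines (an assumption on my part, not a verified fix for this issue) is to let TF allocate GPU memory on demand rather than reserving the whole device, so that multiple processes can share one GPU:

```python
import os

# Hypothetical workaround: ask TF to grow GPU memory on demand instead of
# grabbing the entire device for the first process. Must be set before TF
# initializes the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# The equivalent per-device setting via the API, guarded in case
# tensorflow is not installed in this environment:
try:
    import tensorflow as tf

    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
except ImportError:
    pass
```

Whether this resolves the multiprocessing contention here is untested; it only changes TF's allocation behaviour, not how MODNet spawns workers.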
Hey all! Sorry for the late reply. I would not spend time on this, @naik-aakash: (i) MODNet uses small networks (intended for small datasets), with no benefit from GPU training (unless you insist on big architectures, but that would be a very specific use case). (ii) We are not fully happy with TF and would like to migrate to Torch or JAX. Matthew and I don't have time for this, but we might have a student doing it next semester (fingers crossed) ;p
Just to add, I'd still love to find time to update MODNet to the latest Keras core, which has Torch/JAX and TF backends (and hopefully should be quite an easy translation job!) -- see #158 for some details.
Great idea, and probably easier than translating to PyTorch given our frequent usage of tf.keras.
Hi @ml-evs, @ppdebreuck, I have been trying to use MODNet with a GPU, but cannot get it to work. I had to use a different TensorFlow version than the one pinned by MODNet: I am using tensorflow==2.15.0 (with 2.11.0, GPUs are not detected at all on my system). The system has CUDA 12.4 installed.

It always fails with the following error: "CUDA-capable device(s) is/are busy or unavailable", or it fails to set the CUDA device.

I tested with this to see if TensorFlow is installed correctly; it seems TensorFlow itself is working fine. I am not able to figure out what the problem could be here. Any help in this regard would be great!
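The check referred to above is not included in the scrape; a minimal sanity check of this kind (a sketch, guarded so it also runs where tensorflow is absent) would be:

```python
def check_tf_gpu():
    """Report TF installation and GPU visibility (illustrative sketch)."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow not installed"
    # Lists GPUs that TF can see; empty if drivers/CUDA are mismatched
    # or the device is hidden via CUDA_VISIBLE_DEVICES.
    gpus = tf.config.list_physical_devices("GPU")
    return f"TF {tf.__version__}, {len(gpus)} GPU(s) visible"


print(check_tf_gpu())
```

An empty GPU list here would point to a CUDA/driver mismatch rather than a MODNet problem; a non-empty list with the "busy or unavailable" error points to device contention between processes.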