Hello. First, this is an awesome project. Second, my project partner and I are attempting to replicate your work and re-train the model. Unfortunately, we have run into hardware limitations when running escape prediction for SARS-CoV-2 with the command from the README: we run out of memory on our GPU machine, and the job exceeds 90 hours on our CPU machine. The time estimate given for the SARS-CoV-2 escape run is 10 hours. Do you remember how long it took to train the model? We want to re-train it on the Delta variant. Do you have any suggestions for overcoming the time/memory issues? We are considering training on only the RBD, but I am not certain that provides enough information for training, and it looks like it might require significant code changes. I appreciate any help you can offer.
Hi @achigbrow, thanks for reaching out! Training the model so that it fits in smaller GPU memory can be done by lowering the minibatch size, though this could influence training dynamics. Training the model on full-length Spike with the current code is also quite slow even on a GPU; if I recall correctly, it took a week or so. Escape inference takes about 10 hours and can also be made to fit in memory by lowering the inference batch size. Restricting to the RBD could be a good option if you are extremely resource constrained, and should largely just require pointing the script to a different FASTA file. Hope that helps!
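If you do go the RBD route, something like the following stdlib-only sketch could carve the RBD out of an existing full-length Spike FASTA so the scripts can be pointed at the smaller file. This is my own hypothetical helper, not part of the repo; the residue range (319-541, the commonly cited SARS-CoV-2 RBD span) is an assumption you should check against your reference sequence and alignment.

```python
# Hypothetical helper: slice the RBD out of each Spike sequence in a FASTA.
# RBD_START/RBD_END are 1-indexed, inclusive residue positions (an assumed
# range; verify against your reference coordinates before training).
RBD_START, RBD_END = 319, 541

def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as f:
        for line in f:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            else:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def write_rbd_fasta(src, dst, start=RBD_START, end=RBD_END):
    """Write only the RBD slice of each sequence to a new FASTA."""
    with open(dst, "w") as out:
        for header, seq in read_fasta(src):
            out.write(f"{header}|RBD_{start}-{end}\n")
            out.write(seq[start - 1:end] + "\n")
```

Note this only works cleanly if your input sequences share the reference numbering (i.e. no unhandled insertions/deletions upstream of the RBD); for aligned or variant sequences you would want to slice alignment columns instead.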