-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fall back to CPU if GPU is not available #1517
Conversation
Would you mind removing those merge commits and cleanly rebasing? |
@vmx can we add a ci config that executes this on CI? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally didn't add it on CI as I'd consider it to be one more specialized test run that just takes time and doesn't add value once it was confirmed it works manually. Also even if it doesn't work, you'd get a segfault, which isn't nice but easy to file a bug against. So far we also haven't seen this reported yet, it only came up as I don't have a supported CPU on my laptop.
But I can add a test if needed.
@cryptonemo IIRC, a squash merge will remove all those merges. |
Rebased and added a printout of the actual error. @dignifiedquire Please let me know if you still want a CI job for testing this. |
makes sense, I think we are okay with this for now, and we can add this as a case once we have the larger ci setup going |
i + 1, | ||
tree_count | ||
); | ||
info!("building base tree_r_last {}/{}", i + 1, tree_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the with GPU
logging here, as people grep for that in logs to know it's attempted (or which code path is run).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it says "with GPU", I'd consider "working" being that it actually runs on the GPU. That may no longer be the case.
I propose logging "GPU" or "CPU" depending on whether it's running on the CPU or the GPU, so if people grep for it, they immediately see "oh it's running on the CPU, I need to check the logs why" and then see the previous warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, it's important to tell which method is run (i.e. the codepath attempted). It doesn't matter if it later fails (or rather, that's something else to look for in logs), which already can happen with lock issues, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message before https://github.com/filecoin-project/rust-fil-proofs/pull/1517/files/afff4ee118b45ddb399f0da19e1c9734ce3a3feb#diff-2b9ef9ad01858f1c45194ca4dd5926d25f4baf9df408e76fc680a8cd2e49ff1bL1442 will indicate which code path we are on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm requesting we add it back -- I have specifically dug through logs with miners and we use that as a starting point for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deep-links don't work. There's a message info!("generating tree r last using the GPU");
which will be logged. So you would know the code path you are on.
Anyway, we disagree. I add it back as probably you are the one need to deal with bug reports (which will never come as this code path will practically be hit anyway :)
The current code expected that if the GPU-based tree building is enabled, that there also is a working GPU available. When it is not, it panics. This commit adds a fallback to CPU in case selecting a working GPU fails. It can be verified by running FIL_PROOFS_USE_GPU_TREE_BUILDER=1 cargo test --test stacked_vanilla test_stacked_porep_prove_verify if the machine has no working Nvidia or AMD GPU. Fixes #1516.
The current code expected that if the GPU-based tree building is enabled,
that there also is a working GPU available. When it is not, it panics.
This commit adds a fallback to CPU in case selecting a working GPU fails.
It can be verified by running
if the machine has no working Nvidia or AMD GPU.
Fixes #1516.