Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fall back to CPU if GPU is not available #1517

Merged
merged 1 commit into from
Oct 14, 2021
Merged

Conversation

vmx
Copy link
Contributor

@vmx vmx commented Oct 6, 2021

The current code expected that if the GPU-based tree building is enabled,
that there also is a working GPU available. When it is not, it panics.
This commit adds a fallback to CPU in case selecting a working GPU fails.

It can be verified by running

FIL_PROOFS_USE_GPU_TREE_BUILDER=1 cargo test --test stacked_vanilla test_stacked_porep_prove_verify

if the machine has no working Nvidia or AMD GPU.

Fixes #1516.

cryptonemo
cryptonemo previously approved these changes Oct 6, 2021
@cryptonemo
Copy link
Collaborator

Would you mind removing those merge commits and cleanly rebasing?

@dignifiedquire
Copy link
Contributor

@vmx can we add a ci config that executes this on CI?

Copy link
Contributor Author

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally didn't add it on CI as I'd consider it to be one more specialized test run that just takes time and doesn't add value once it was confirmed it works manually. Also even if it doesn't work, you'd get a segfault, which isn't nice but easy to file a bug against. So far we also haven't seen this reported yet, it only came up as I don't have a supported CPU on my laptop.

But I can add a test if needed.

@vmx
Copy link
Contributor Author

vmx commented Oct 13, 2021

@cryptonemo IIRC, a squash merge will remove all those merges.

@vmx
Copy link
Contributor Author

vmx commented Oct 13, 2021

Rebased and added a printout of the actual error.

@dignifiedquire Please let me know if you still want a CI job for testing this.

@dignifiedquire
Copy link
Contributor

makes sense, I think we are okay with this for now, and we can add this as a case once we have the larger ci setup going

i + 1,
tree_count
);
info!("building base tree_r_last {}/{}", i + 1, tree_count);
Copy link
Collaborator

@cryptonemo cryptonemo Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the with GPU logging here, as people grep for that in logs to know it's attempted (or which code path is run).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it says "with GPU", I'd consider "working" being that it actually runs on the GPU. That may no longer be the case.

I propose logging "GPU" or "CPU" depending on whether it's running on the CPU or the GPU, so if people grep for it, they immediately see "oh it's running on the CPU, I need to check the logs why" and then see the previous warning.

Copy link
Collaborator

@cryptonemo cryptonemo Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, it's important to tell which method is run (i.e. the codepath attempted). It doesn't matter if it later fails (or rather, that's something else to look for in logs), which already can happen with lock issues, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm requesting we add it back -- I have specifically dug through logs with miners and we use that as a starting point for debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep-links don't work. There's a message info!("generating tree r last using the GPU"); which will be logged. So you would know the code path you are on.

Anyway, we disagree. I add it back as probably you are the one need to deal with bug reports (which will never come as this code path will practically be hit anyway :)

The current code expected that if the GPU-based tree building is enabled,
that there also is a working GPU available. When it is not, it panics.
This commit adds a fallback to CPU in case selecting a working GPU fails.

It can be verified by running

    FIL_PROOFS_USE_GPU_TREE_BUILDER=1 cargo test --test stacked_vanilla test_stacked_porep_prove_verify

if the machine has no working Nvidia or AMD GPU.

Fixes #1516.
@cryptonemo cryptonemo merged commit fd769f0 into master Oct 14, 2021
@cryptonemo cryptonemo deleted the add-cpu-fallback branch October 15, 2021 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test segfaults when no GPU can be found
3 participants