Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: if GPU is missing, discard app versions and results that refer to it. #5577

Merged
merged 3 commits into from
Apr 15, 2024

Conversation

davidpanderson
Copy link
Contributor

This addresses an issue introduced when we changed GPU names from (e.g.) 'Apple M3 Pro' to 'apple_gpu'.

Note: this means that if a host has in-progress jobs using a discrete GPU, and you remove that GPU, those jobs will be discarded. This is a change, which I think is good because otherwise you'd get error messages forever.

…r to it.

This addresses an issue introduced when we changed GPU names from
(e.g.) 'Apple M3 Pro' to 'apple_gpu'.

Note: this means that if a host has in-progress jobs using a discrete GPU,
and you remove that GPU, those jobs will be discarded.
This is a change, which I think is good because otherwise
you'd get error messages forever.
@AenBleidd
Copy link
Member

@davidpanderson, I think this is quite a dangerous change. Please see my email for more details.

@RichardHaselgrove
Copy link
Contributor

I agree with Vitalii - this change needs, at the minimum, more thought. One significant problem: with modern versions of Windows, an operating system update and reboot can take place "outside working hours" without the explicit approval of the user. The sequence of component restarts can be ill-defined: it is possible for BOINC to restart before the video drivers have loaded and are ready for use. In that case, the GPUs are logged as being missing - but it's only a transient situation. Shutting down the client and restarting it resolves the problem.

Discarding the task is wasteful of project resources and the user's internet bandwidth.

@AenBleidd
Copy link
Member

I'm making this PR a draft to discuss this change.

@CharlieFenton
Copy link
Contributor

Instead of this why not just allow the old Apple GPU (or whatever it was) as an alias for the new name? That would provide backward compatibility.

@CharlieFenton
Copy link
Contributor

… or accept any GPU name containing Apple

@davidpanderson
Copy link
Contributor Author

It doesn't matter if a few jobs get discarded.

@@ -294,9 +294,13 @@ int CLIENT_STATE::parse_state_file_aux(const char* fname) {
}
if (avp->missing_coproc) {
msg_printf(project, MSG_INFO,
"Application uses missing %s GPU",
"App version uses missing GPU '%s' - discarding",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really needed to show discarding for all cases, not only when Apple GPU is missing?
It might be a little bit misleading for the users.

@davidpanderson
Copy link
Contributor Author

Oops! I fixed the messages in each case

@AenBleidd AenBleidd marked this pull request as ready for review April 15, 2024 10:07
@AenBleidd AenBleidd merged commit 627637d into master Apr 15, 2024
91 of 92 checks passed
@AenBleidd AenBleidd deleted the dpa_missing_gpu branch April 15, 2024 10:08
AenBleidd added a commit that referenced this pull request May 24, 2024
client: if GPU is missing, discard app versions and results that refer to it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants