-
-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cquery-ing with cuda targets #209
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, looks good to me.
It somehow don't work with bzlmod: > bazel cquery //if_cuda/... --@rules_cuda//cuda:enable=False --enable_bzlmod
ERROR: no such package '@@[unknown repo 'platforms' requested from @@]//': The repository '@@[unknown repo 'platforms' requested from @@]' could not be resolved: No repository visible as '@platforms' from main repository
ERROR: Analysis of target '//if_cuda:kernel' failed; build aborted: no such package '@@[unknown repo 'platforms' requested from @@]//': The repository '@@[unknown repo 'platforms' requested from @@]' could not be resolved: No repository visible as '@platforms' from main repository
ERROR: Build did NOT complete successfully |
Also, handled the case when cuda is enabled but cuda toolkit cannot be found. The previous behavior was that bazel build would fail because some targets are not present in
but there is no cuda toolkit available we get
This is IMO more readable, with addition of the docs for Thinking a bit further, it makes more sense to use globally |
@cloudhan PTAL. |
@@ -0,0 +1,16 @@ | |||
"""private defs""" | |||
|
|||
def requires_cuda(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken and adapted from https://github.com/tensorflow/runtime, there used to be rules_cuda repo in there.
|
||
cuda_library( | ||
name = "kernel", | ||
srcs = ["kernel.cu"], | ||
hdrs = ["kernel.h"], | ||
target_compatible_with = requires_cuda(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a side note: users could still get non-so-obvious errors from bazel in case the target compatibility is not set.
I'd prefer not to add it. Because it somehow to me, it mixed the target and host, the enable generally means the In the case of the toolchain being mis-configured or not configured, it is a @jsharpe Any comment? |
I looked into how
The second constraint is a bit misleading because we didn't even look for it. Open to suggestions.
For the reference: ⬆️ is for the case cuda is enabled but cuda toolkit is not found. @cloudhan PTAL. |
@cloudhan friendly reminder: PTAL. I implemented changes you requested. |
Sorry for the delay, was dealing with the CI update for Bazel 7 support. |
Let me play around with it to see if we can simplify it a little bit. |
To be honest, what is dragging me down in these changes is the latter maneuver that tries to hide error info
It doesn't change the situation (that toolchain is not found), but add another layer of indirection. IIRC the initial change fixed cquery-ing error and looks good... |
I don't know what to do about your last comment. Can you propose an alternative? As-is, folks can't use cquery when there is no cuda toolchain present. This PR fixes that situation and eventually gives a more user-friendly failure message IMO. |
Are you not happy about |
The Better split it into a seperate PR? I might ask some bazel toolchain experts on what is the best practice for it... |
cuda/BUILD.bazel
Outdated
|
||
constraint_setting(name = "cuda_must_be_enabled_setting") | ||
|
||
constraint_value( | ||
name = "cuda_must_be_enabled", | ||
constraint_setting = ":cuda_must_be_enabled_setting", | ||
) | ||
|
||
constraint_setting(name = "cuda_must_be_found_setting") | ||
|
||
constraint_value( | ||
name = "cuda_must_be_found", | ||
constraint_setting = ":cuda_must_be_found_setting", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you split this PR, please also address the naming
In the cuda namespace (//cuda:), you don't repeat cuda, then you can see //cuda:must_be_enabled
and //cuda:must_be_found
are extremely confusing. I'd suggest renaming these as rules_are_enabled...
and valid_toolchain_is_configured
or something the like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed naming.
cuda/private/defs.bzl
Outdated
* CUDA is enabled and | ||
* CUDA toolchain is found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* rules are enabled and
* toolchain is found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
cuda/private/defs.bzl
Outdated
the conditions are not satisfied. Incompatible targets are excluded | ||
from bazel target wildcards and fail to build if requested explicitly. | ||
""" | ||
return _requires_is_enabled() + _requires_cuda_found() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
So, with this PR, the unresolved toolchain error is gone, because a dummy one is actually registered. If we remove
if cuda toolchain is not found. Moreover, cquery still doesn't work:
So, for cquery to work, one must use target_compatible_with (IMO, this is good practice). I made an update to this PR such that both cquery work + that we get IMO more dev-friendly errors than WDYT? Let's first sort this out and then I can address your remarks. |
OK, I see your point now. The situation is caused by the ":disabled-local-toolchain" being relying on the The Lets adopt the current change to fix the issue. Please address naming tho. |
@cloudhan PTAL, I addressed your comments. |
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [rules_cuda](https://github.com/bazel-contrib/rules_cuda) | http_archive | patch | `v0.2.1` -> `v0.2.2` | --- ### Release Notes <details> <summary>bazel-contrib/rules_cuda (rules_cuda)</summary> ### [`v0.2.2`](https://github.com/bazel-contrib/rules_cuda/releases/tag/v0.2.2) [Compare Source](https://github.com/bazel-contrib/rules_cuda/compare/v0.2.1...v0.2.2) #### `WORKSPACE` code ```starlark load("@​bazel_tools//tools/build_defs/repo:http.bzl", "http_archive") http_archive( name = "rules_cuda", sha256 = "b066750579f33e93e9dc55b8ee2067b525d863c1ddcf09b47a6332c39f0701fb", strip_prefix = "rules_cuda-v0.2.2", urls = ["https://github.com/bazel-contrib/rules_cuda/releases/download/v0.2.2/rules_cuda-v0.2.2.tar.gz"], ) load("@​rules_cuda//cuda:repositories.bzl", "register_detected_cuda_toolchains", "rules_cuda_dependencies") rules_cuda_dependencies() register_detected_cuda_toolchains() ``` #### What's Changed - Filter attrs properly for cuda_test by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/180](https://github.com/bazel-contrib/rules_cuda/pull/180) - Fix cuda_test by [@​hofbi](https://github.com/hofbi) in [https://github.com/bazel-contrib/rules_cuda/pull/181](https://github.com/bazel-contrib/rules_cuda/pull/181) - Add v0.2.1 to docs by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/184](https://github.com/bazel-contrib/rules_cuda/pull/184) - Pass through `--sysroot` to host compiler by [@​lalten](https://github.com/lalten) in [https://github.com/bazel-contrib/rules_cuda/pull/185](https://github.com/bazel-contrib/rules_cuda/pull/185) - Add cuda_binary macro by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/186](https://github.com/bazel-contrib/rules_cuda/pull/186) - Move non glob files out of glob by [@​hofbi](https://github.com/hofbi) in [https://github.com/bazel-contrib/rules_cuda/pull/192](https://github.com/bazel-contrib/rules_cuda/pull/192) - Disallow autoupdate nccl by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/193](https://github.com/bazel-contrib/rules_cuda/pull/193) - eliminate cpu architecture constraint for clang by [@​dmellosanjay](https://github.com/dmellosanjay) in [https://github.com/bazel-contrib/rules_cuda/pull/208](https://github.com/bazel-contrib/rules_cuda/pull/208) - Check nvcc version before adding `--dopt on` flags by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/212](https://github.com/bazel-contrib/rules_cuda/pull/212) - Add alwayslink to cuda_binary and cuda_test macros by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/210](https://github.com/bazel-contrib/rules_cuda/pull/210) - Add additional tests for LTS releases by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/215](https://github.com/bazel-contrib/rules_cuda/pull/215) - Fix cquery-ing with cuda targets by [@​mvukov](https://github.com/mvukov) in [https://github.com/bazel-contrib/rules_cuda/pull/209](https://github.com/bazel-contrib/rules_cuda/pull/209) - Propose new solution for know issue (nvcc filesystem race condition) by [@​hofbi](https://github.com/hofbi) in [https://github.com/bazel-contrib/rules_cuda/pull/216](https://github.com/bazel-contrib/rules_cuda/pull/216) - Fix a typo in `if_cuda` doc by [@​rygx](https://github.com/rygx) in [https://github.com/bazel-contrib/rules_cuda/pull/222](https://github.com/bazel-contrib/rules_cuda/pull/222) - Ignore MODULE.bazel.lock file by [@​rygx](https://github.com/rygx) in [https://github.com/bazel-contrib/rules_cuda/pull/224](https://github.com/bazel-contrib/rules_cuda/pull/224) - ci: avoid nvcc /tmp race condition by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/232](https://github.com/bazel-contrib/rules_cuda/pull/232) - ci: disable doc test workflow cache to avoid excessive space wasting by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/231](https://github.com/bazel-contrib/rules_cuda/pull/231) - feat: Add features for compiling with -arch=all or -arch=all-major by [@​jsharpe](https://github.com/jsharpe) in [https://github.com/bazel-contrib/rules_cuda/pull/245](https://github.com/bazel-contrib/rules_cuda/pull/245) - Fix spelling by [@​Vertexwahn](https://github.com/Vertexwahn) in [https://github.com/bazel-contrib/rules_cuda/pull/250](https://github.com/bazel-contrib/rules_cuda/pull/250) - Change example rules_cuda version by [@​Vertexwahn](https://github.com/Vertexwahn) in [https://github.com/bazel-contrib/rules_cuda/pull/249](https://github.com/bazel-contrib/rules_cuda/pull/249) - Document how to use rules_cuda with Bzlmod by [@​Vertexwahn](https://github.com/Vertexwahn) in [https://github.com/bazel-contrib/rules_cuda/pull/252](https://github.com/bazel-contrib/rules_cuda/pull/252) - Do not assume libcupti.so location by [@​tyb0807](https://github.com/tyb0807) in [https://github.com/bazel-contrib/rules_cuda/pull/253](https://github.com/bazel-contrib/rules_cuda/pull/253) - ci: use absolute path for XDG_CACHE_HOME as github actions and bazel doesn't resolve `~` automatically by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/260](https://github.com/bazel-contrib/rules_cuda/pull/260) - ci: cover major bazel releases in utilities tests by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/262](https://github.com/bazel-contrib/rules_cuda/pull/262) - test: workaround label resolving with bzlmod by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/263](https://github.com/bazel-contrib/rules_cuda/pull/263) - fix(bzlmod): allow both root module and our module to call cuda.local_toolchain by [@​cloudhan](https://github.com/cloudhan) in [https://github.com/bazel-contrib/rules_cuda/pull/264](https://github.com/bazel-contrib/rules_cuda/pull/264) #### New Contributors - [@​dmellosanjay](https://github.com/dmellosanjay) made their first contribution in [https://github.com/bazel-contrib/rules_cuda/pull/208](https://github.com/bazel-contrib/rules_cuda/pull/208) - [@​mvukov](https://github.com/mvukov) made their first contribution in [https://github.com/bazel-contrib/rules_cuda/pull/209](https://github.com/bazel-contrib/rules_cuda/pull/209) - [@​rygx](https://github.com/rygx) made their first contribution in [https://github.com/bazel-contrib/rules_cuda/pull/222](https://github.com/bazel-contrib/rules_cuda/pull/222) - [@​Vertexwahn](https://github.com/Vertexwahn) made their first contribution in [https://github.com/bazel-contrib/rules_cuda/pull/250](https://github.com/bazel-contrib/rules_cuda/pull/250) - [@​tyb0807](https://github.com/tyb0807) made their first contribution in [https://github.com/bazel-contrib/rules_cuda/pull/253](https://github.com/bazel-contrib/rules_cuda/pull/253) **Full Changelog**: bazel-contrib/rules_cuda@v0.2.1...v0.2.2 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View the [repository job log](https://developer.mend.io/github/secretflow/spu). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC4yMC4xIiwidXBkYXRlZEluVmVyIjoiMzguMjYuMSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiZGVwZW5kZW5jaWVzIl19--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This makes possible to do the following:
cd examples bazel cquery //if_cuda/... --@rules_cuda//cuda:enable=False
when one wants to run cquery (e.g. within bazel-diff) in an env without cuda (e.g. a CI).
I read the readme of if_cuda example. It's advertised that it's OK that bazel errors out if one want to build a cuda target when rules_cuda is disabled. I'd say that's a drastic measure. Folks should IMO mark their bazel targets like I did in this PR.