Add script to test nightly environments are solvable and using recent nightlies. #690
I totally support this!
I think it'll be helpful for catching more cases similar to rapidsai/build-planning#14 and rapidsai/build-planning#69, where projects might otherwise silently use different packages than we want.
```
-c rapidsai-nightly \
-c conda-forge \
-c nvidia \
rapids=${RAPIDS_VERSION} \
```
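The flags above can be wrapped in a dry-run solve whose JSON report a script can then inspect. This is a minimal Python sketch, not the PR's actual script: the function names and the placeholder environment name `rapids-nightly-check` are hypothetical, while `--dry-run`, `--json`, and `--quiet` are standard `conda create` options.

```python
from __future__ import annotations

import json
import subprocess

# Channel order mirrors the snippet above: rapidsai-nightly, conda-forge, nvidia.
CHANNELS = ["rapidsai-nightly", "conda-forge", "nvidia"]


def build_solve_command(rapids_version: str, env_name: str = "rapids-nightly-check") -> list[str]:
    """Assemble a `conda create` dry run for the full `rapids` metapackage.

    `env_name` is a hypothetical placeholder; nothing is created because of --dry-run.
    """
    cmd = ["conda", "create", "--name", env_name, "--dry-run", "--json", "--quiet"]
    for channel in CHANNELS:
        cmd += ["--channel", channel]
    cmd.append(f"rapids={rapids_version}")
    return cmd


def solve(rapids_version: str) -> dict:
    """Run the dry-run solve and return conda's parsed JSON report."""
    result = subprocess.run(build_solve_command(rapids_version), capture_output=True, text=True)
    if result.returncode != 0:
        # An unsolvable environment is exactly the failure this check exists to surface.
        raise RuntimeError(f"conda solve failed:\n{result.stdout}{result.stderr}")
    return json.loads(result.stdout)
```

Keeping the command builder separate from the subprocess call makes the channel order and flags easy to check without needing conda installed.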
(was going to put this as my review, realized a thread would probably be better)
I totally support adding something like this, based on your description offline:
... to ensure that we are able to solve the full RAPIDS conda environment with recent packages (in other words, ensure there are no recent conflicts causing fallback to older conda packages)
But I don't think it offers 100% protection against the case described in https://github.com/rapidsai/ops/issues/2947.
That issue is not about just packages building against too-old versions of dependencies... it's about packages across RAPIDS building against very different versions of dependencies.
It looks to me like the code in this PR would catch cases like these:

- "`rmm` nightlies haven't been published in the last 3 days"
- "the latest versions of `cugraph` and `pylibraft` can't be installed in the same environment"

These aren't captured by the existing nightly tests at https://github.com/rapidsai/workflows/actions/workflows/nightly-pipeline.yaml.
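The first kind of failure listed above (stale nightlies) can be detected from the solved package list alone. A sketch under two assumptions: the records are the package entries conda reports under `actions["LINK"]` in its dry-run JSON, and RAPIDS nightly build strings embed the build date as a `_YYMMDD_g<hash>` segment. The helper names are hypothetical and the 3-day threshold is taken from the example above.

```python
from __future__ import annotations

import re
from datetime import date, datetime, timedelta

# Assumption: nightly build strings contain a `_YYMMDD_g<hash>` segment,
# e.g. "cuda12_py311_240829_gabc1234_57". Adjust the pattern if the format differs.
BUILD_DATE_RE = re.compile(r"_(\d{6})_g[0-9a-f]+")


def build_date(build_string: str) -> date | None:
    """Return the date embedded in a nightly build string, or None if absent."""
    match = BUILD_DATE_RE.search(build_string)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%y%m%d").date()


def stale_packages(records: list[dict], today: date, max_age_days: int = 3) -> list[str]:
    """List packages whose nightly build is older than `max_age_days`.

    `records` are dicts with "name" and "build_string" keys. Packages whose
    build string carries no date (e.g. most conda-forge packages) are skipped.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for record in records:
        built = build_date(record["build_string"])
        if built is not None and built < cutoff:
            stale.append(f"{record['name']} (built {built.isoformat()})")
    return stale
```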
But this wouldn't be guaranteed to catch a case like this:

- "the latest `cuml` nightly built against an `rmm` from 5 days ago, but the latest `cudf` nightly built against an `rmm` from yesterday"

That's because this test, with the `rapids` package, solves across all of the packages' runtime dependencies, but the packages could have been built against older versions because of conflicts in their individual build environments, right?
And those types of conflicts might not show up here if we use `pin_compatible(max_pin="x.x")` in `run` dependencies. For example, `pylibraft==24.10.*` is going to have a runtime dependency on `rmm=24.10.*` regardless of which specific nightly of `rmm` it pulled in at build time (pylibraft nightly files).
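A simplified model of why the runtime solve cannot tell `rmm` nightlies apart: a `run` constraint like `rmm=24.10.*` is satisfied by every 24.10 pre-release. This sketch treats `.*` as a plain prefix match on dotted version components, which is a simplification of conda's real MatchSpec rules; the function name and the nightly version strings are illustrative.

```python
def matches_wildcard(spec: str, version: str) -> bool:
    """Simplified check of an `X.Y.*` version constraint.

    Not conda's full MatchSpec logic: it only compares the dotted components
    before `.*` with the leading components of `version`.
    """
    if not spec.endswith(".*"):
        return spec == version
    prefix = spec[:-2].split(".")
    return version.split(".")[: len(prefix)] == prefix


# Hypothetical rmm nightlies published on different days in the same release cycle.
rmm_nightlies = ["24.10.00a12", "24.10.00a47"]

# Both satisfy the runtime pin, so the environment solve cannot reveal which
# one pylibraft was built against.
print([matches_wildcard("24.10.*", v) for v in rmm_nightlies])  # [True, True]
print(matches_wildcard("24.10.*", "24.08.00"))  # False
```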
I think detecting that other case would have to happen at build time (or by post-processing of logs from build time). And I don't know how complex that would be, so can't say with confidence that the complexity would be worth it.
I totally support the approach this PR is pursuing, just wanted to be sure to note this other possible avenue for version mismatches to get through.
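Detecting the build-time case would mean comparing which exact build of a shared dependency each package used. A hypothetical post-processing sketch: it assumes some earlier step (for example, parsing build logs) has already produced, for each package, the exact `rmm` build it compiled against. That input format, the function name, and the version strings are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict


def find_mixed_build_deps(built_against: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group consumers by the exact build of each shared dependency they used.

    `built_against` maps a package name to {dependency: exact build it compiled
    against}. A dependency is reported only when consumers used more than one
    distinct build of it.
    """
    by_dep: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for package, deps in built_against.items():
        for dep, build in deps.items():
            by_dep[dep][build].append(package)
    return {
        dep: {build: sorted(pkgs) for build, pkgs in builds.items()}
        for dep, builds in by_dep.items()
        if len(builds) > 1
    }


# The scenario above: cuml built against an older rmm nightly than cudf.
built = {
    "cuml": {"rmm": "24.10.00a40"},
    "cudf": {"rmm": "24.10.00a44"},
    "cugraph": {"rmm": "24.10.00a44"},
}
print(find_mixed_build_deps(built))
# {'rmm': {'24.10.00a40': ['cuml'], '24.10.00a44': ['cudf', 'cugraph']}}
```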
You are correct about what this will and will not cover. I think it's worth pursuing because (1) it prevents runtime problems from being hidden and (2) sometimes runtime conflicts will also affect build environments, so this may give us a bit of signal into deeper problems happening at build time, should they arise.
Ok great! I totally support moving forward with this.
This looks great!
Although I'll note... I wasn't able to see the pretty colored output that you shared a screenshot of.
The 13,000+ lines of logs at https://github.com/rapidsai/integration/actions/runs/10622584645/job/29447212735?pr=690 were too large for my browser to load. Most of that looks like a dump of the metadata for every conda package in the environment, in pretty-printed (one-line-per-key) JSON format.
I was able to open the raw logs (link) and use Command-F to find the output from this job, but that's not the same.
If you can think of a way to reduce the amount of output or to get just the color-coded results up into the job summary, I think it'd help.
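One way to act on the suggestion above is to write only a compact results table to the job summary instead of the full log. A sketch assuming the check already knows each package's version and whether it counts as recent; the function names are hypothetical, while `GITHUB_STEP_SUMMARY` is the environment variable GitHub Actions sets to the path of the job summary file.

```python
from __future__ import annotations

import os


def render_summary(rows: list[tuple[str, str, bool]]) -> str:
    """Render (package, version, is_recent) rows as a compact Markdown table."""
    lines = ["| Package | Version | Status |", "| --- | --- | --- |"]
    for name, version, recent in rows:
        status = "recent" if recent else "stale"
        lines.append(f"| {name} | {version} | {status} |")
    return "\n".join(lines) + "\n"


def write_job_summary(markdown: str) -> bool:
    """Append Markdown to the GitHub Actions job summary, if running in Actions.

    Returns False when GITHUB_STEP_SUMMARY is unset (e.g. a local run).
    """
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True
```

The verbose per-package JSON could then stay out of the step log entirely, or be saved as an artifact for anyone who needs it.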
Companion PR to enable nightly testing for rapidsai/integration#690.
We can merge this once CI passes. I'm going to check the CI results manually so I'm avoiding a
This is failing due to a "real" problem now, I think. Possibly related to rapidsai/cuspatial#1453 (comment).
/merge
This fixes failures in the new testing workflow from #690.

**Update:** the root cause was that `rapids-conda-retry` redirects stderr into stdout with `2>&1`. The warning is being sent to stderr as intended, so the old contents below are partially incorrect. We can still solve this by providing `--quiet`, without needing to change `rapids-conda-retry`.

<details><summary>Old issue contents</summary>

Output like this is shown, even with the `--json` flag to conda:

```
==> WARNING: A newer version of conda exists. <==
  current version: 24.9.0
  latest version: 24.9.1

Please update conda by running

    $ conda update -n base -c conda-forge conda
```

The only way to make the output "proper JSON" is to pass `--quiet` as well. This seems like unintentional behavior from conda. The docs from `conda create --help` literally say:

```
--json                Report all output as json. Suitable for using conda programmatically.
```

</details>

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Mike Sarahan (https://github.com/msarahan)

URL: #729
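Because the root cause is stream handling, the same lesson applies to any script that consumes `conda ... --json` output: parse stdout only and keep stderr separate. A minimal Python sketch; the helper name `run_json` is hypothetical and not part of `rapids-conda-retry`, and the demo uses a stand-in child process rather than conda itself.

```python
from __future__ import annotations

import json
import subprocess
import sys


def run_json(cmd: list[str]) -> dict:
    """Run a command and parse its stdout as JSON, keeping stderr separate.

    Merging the streams (the shell equivalent of `2>&1`) would splice warnings
    such as conda's "A newer version of conda exists" notice into the JSON text.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, check=True)
    if result.stderr:
        # Surface warnings without letting them corrupt the JSON payload.
        print(result.stderr, file=sys.stderr, end="")
    return json.loads(result.stdout)


# Stand-in for conda: a warning on stderr, JSON on stdout.
demo = [
    sys.executable,
    "-c",
    "import sys; print('==> WARNING: A newer version of conda exists. <==', file=sys.stderr); print('{\"success\": true}')",
]
print(run_json(demo))  # {'success': True}
```

Passing `--quiet`, as the fix does, additionally keeps the warning from being emitted at all, which is useful when a wrapper merges the streams anyway.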
This PR adds tests that run on every `integration` PR and on a nightly basis to ensure that RAPIDS conda environments can be solved with recent packages. This will help us diagnose and react to problems arising from conda environment conflicts, which sometimes force conda to select older nightly builds of RAPIDS packages. A partial solution for https://github.com/rapidsai/ops/issues/2947.
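A fallback to an older nightly shows up as a solved version that is lower than the newest version available on the channel. A sketch under stated assumptions: `available` is a hypothetical mapping of package name to published versions (for instance, distilled from `conda search --json` output), and the sort key only handles the `X.Y.ZZaN` nightly pattern rather than conda's full version ordering.

```python
from __future__ import annotations

import re


def nightly_key(version: str) -> tuple[int, ...]:
    """Crude sort key for nightly versions like "24.10.00a44".

    Assumption: only valid within one nightly series; it does not implement
    conda's full version ordering (e.g. it ranks "24.10.00" below "24.10.00a44").
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def find_fallbacks(solved: dict[str, str], available: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Map each package solved below its newest nightly to (solved, newest)."""
    fallbacks = {}
    for name, version in solved.items():
        candidates = available.get(name)
        if not candidates:
            continue
        newest = max(candidates, key=nightly_key)
        if nightly_key(version) < nightly_key(newest):
            fallbacks[name] = (version, newest)
    return fallbacks


solved = {"cudf": "24.10.00a44", "cuml": "24.10.00a40"}
available = {"cudf": ["24.10.00a40", "24.10.00a44"], "cuml": ["24.10.00a40", "24.10.00a47"]}
print(find_fallbacks(solved, available))  # {'cuml': ('24.10.00a40', '24.10.00a47')}
```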
Next steps: