Add script to test nightly environments are solvable and using recent nightlies. #690

Merged
merged 25 commits into from
Sep 10, 2024

Conversation

Contributor

@bdice bdice commented Nov 18, 2023

This PR adds tests that run on every integration PR and on a nightly basis to ensure that RAPIDS conda environments can be solved with recent packages. This will help us diagnose and react to problems arising from conda environment conflicts, which sometimes force conda to select older nightly builds of RAPIDS packages.

A partial solution for https://github.com/rapidsai/ops/issues/2947.

Next steps:

Member

@jameslamb jameslamb left a comment

I totally support this!

Think it'll be helpful for catching more cases similar to rapidsai/build-planning#14 and rapidsai/build-planning#69 where projects might otherwise silently be using different packages than we want.

-c rapidsai-nightly \
-c conda-forge \
-c nvidia \
rapids=${RAPIDS_VERSION} \

Member

(was going to put this as my review, realized a thread would probably be better)

I totally support adding something like this, based on your description offline:

... to ensure that we are able to solve the full RAPIDS conda environment with recent packages (in other words, ensure there are no recent conflicts causing fallback to older conda packages)

But I don't think it offers 100% protection against the case described in https://github.com/rapidsai/ops/issues/2947.

That issue is not about just packages building against too-old versions of dependencies... it's about packages across RAPIDS building against very different versions of dependencies.

It looks to me that the code in this PR would catch cases like these:

  • "rmm nightlies haven't been published in the last 3 days"
  • "the latest versions of cugraph and pylibraft can't be installed in the same environment"

These aren't captured by the existing nightly tests at https://github.com/rapidsai/workflows/actions/workflows/nightly-pipeline.yaml.

But this wouldn't be guaranteed to catch a case like this:

  • "the latest cuml nightly built against an rmm from 5 days ago, but the latest cudf nightly built against an rmm from yesterday"

Because this test with the rapids package is solving across all of the packages' runtime dependencies, but they could have ended up building against older versions based on conflicts in their individual build environments, right?

And those types of conflicts might not show up here if we use pin_compatible(max_pin="x.x") in run dependencies, e.g. pylibraft==24.10.* is going to have a runtime dependency on rmm=24.10.* regardless of which specific nightly of rmm it pulled in at build time. (pylibraft nightly files)

I think detecting that other case would have to happen at build time (or by post-processing of logs from build time). And I don't know how complex that would be, so can't say with confidence that the complexity would be worth it.

I totally support the approach this PR is pursuing, just wanted to be sure to note this other possible avenue for version mismatches to get through.
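To make the `pin_compatible(max_pin="x.x")` point concrete, here is a minimal Python sketch. The version strings and the `matches_run_pin` helper are hypothetical, and the shell-style glob only approximates how conda would match a spec like `rmm=24.10.*`. It shows that two different nightlies satisfy the same run constraint, so a runtime solve alone cannot reveal which one a package was built against.

```python
from fnmatch import fnmatch


def matches_run_pin(version: str, pin: str = "24.10.*") -> bool:
    """Approximate a conda-style glob pin with a shell-style glob (an assumption)."""
    return fnmatch(version, pin)


# Hypothetical nightly version strings, for illustration only.
rmm_five_days_ago = "24.10.00a40"
rmm_yesterday = "24.10.00a52"

# Both nightlies satisfy the same run constraint.
both_allowed = matches_run_pin(rmm_five_days_ago) and matches_run_pin(rmm_yesterday)
print(both_allowed)
```

Since both builds pass the same pin, telling them apart would need information recorded at build time, as noted above.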

Contributor Author

@bdice bdice Aug 5, 2024

You are correct about what this will and will not cover. I think it's worth pursuing because (1) it prevents runtime problems from being hidden and (2) sometimes runtime conflicts will also affect build environments, so this may give us a bit of signal into deeper problems happening at build time, should they arise.

Member


Ok great! I totally support moving forward with this.

@bdice bdice changed the base branch from branch-23.12 to branch-24.10 August 29, 2024 17:09
Contributor Author

bdice commented Aug 29, 2024

Yay! I'm happy with this now. It looks like this:
image

Packages built at least 1 day ago but less than 3 days ago are shown in yellow, as a warning that something might be broken in the nightly builds.
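A rough sketch of that age-based coloring, in Python. The yellow band comes from the description above. The "green" label for builds under 1 day old, the "red" label for builds 3 or more days old, and the `classify_build_age` name are all assumptions made for illustration, not the script's actual code.

```python
from datetime import datetime, timedelta, timezone


def classify_build_age(built_at: datetime, now: datetime) -> str:
    """Return a status color for a package build timestamp.

    Builds at least 1 day but less than 3 days old are "yellow".
    The "green" and "red" labels on either side are assumptions.
    """
    age = now - built_at
    if age < timedelta(days=1):
        return "green"
    if age < timedelta(days=3):
        return "yellow"
    return "red"


# Fixed reference time so the example is reproducible.
now = datetime(2024, 8, 29, tzinfo=timezone.utc)
for days in (0, 2, 4):
    print(days, classify_build_age(now - timedelta(days=days), now))
```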

@bdice bdice marked this pull request as ready for review August 29, 2024 21:00
@bdice bdice requested a review from a team as a code owner August 29, 2024 21:00
@bdice bdice requested a review from msarahan August 29, 2024 21:00
Member

@jameslamb jameslamb left a comment

This looks great!

Although I'll note... I wasn't able to see the pretty colored output that you shared a screenshot of.

The 13,000+ lines of logs at https://github.com/rapidsai/integration/actions/runs/10622584645/job/29447212735?pr=690 were too much for my browser to load. Most of that looks like a dump of the metadata for every conda package in the environment, in pretty-printed (one-line-per-key) JSON format.

image

I was able to open the raw logs (link) and command-F to find the output from this job, but that's not the same.

image

If you can think of a way to reduce the amount of output or to get just the color-coded results up into the job summary, I think it'd help.

vyasr pushed a commit to rapidsai/workflows that referenced this pull request Sep 3, 2024
Companion PR to enable nightly testing for rapidsai/integration#690.
Contributor Author

bdice commented Sep 6, 2024

We can merge this once CI passes. I'm going to check the CI results manually so I'm avoiding a /merge for the moment.

Contributor Author

bdice commented Sep 7, 2024

This is failing due to a "real" problem now, I think.
image

Possibly related to rapidsai/cuspatial#1453 (comment).

Contributor Author

bdice commented Sep 10, 2024

/merge

@rapids-bot rapids-bot bot merged commit 1e59a93 into rapidsai:branch-24.10 Sep 10, 2024
21 checks passed
@bdice bdice mentioned this pull request Oct 3, 2024
rapids-bot bot pushed a commit that referenced this pull request Oct 3, 2024
This fixes failures in the new testing workflow from #690.

**Update:** the root cause is that `rapids-conda-retry` redirects stderr into stdout (`2>&1`). conda sends the warning to stderr as intended, so the old issue contents below are partially incorrect. We can still fix this by passing `--quiet`, without needing to change `rapids-conda-retry`.

<details><summary>Old issue contents</summary>

Output like this is shown, even with the `--json` flag to conda:
```
==> WARNING: A newer version of conda exists. <==
    current version: 24.9.0
    latest version: 24.9.1

Please update conda by running

    $ conda update -n base -c conda-forge conda
```

The only way to make the output "proper JSON" is to pass `--quiet` as well.


This seems like unintentional behavior from conda. The docs from `conda create --help` literally say:
```
--json                Report all output as json. Suitable for using conda programmatically.
```

</details>
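The effect of the `2>&1` redirect can be reproduced without conda. In this minimal Python sketch, a child process stands in for the conda call: it writes a warning to stderr and JSON to stdout. The child script, the `run_child` and `is_valid_json` helpers, and the `success` key are all hypothetical. The captured output parses as JSON only when the two streams are kept apart.

```python
import json
import subprocess
import sys

# Stand-in for a conda call: a child process that writes a warning to stderr
# and a JSON document to stdout. This simulates the behavior; it is not conda.
CHILD = (
    "import sys, json;"
    "sys.stderr.write('==> WARNING: A newer version of conda exists. <==\\n');"
    "print(json.dumps({'success': True}))"
)


def run_child(merge_stderr: bool) -> str:
    """Run the stand-in command and return whatever lands on stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        # STDOUT here mimics a wrapper that applies 2>&1.
        stderr=subprocess.STDOUT if merge_stderr else subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout


def is_valid_json(text: str) -> bool:
    """Report whether the whole captured text parses as one JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(run_child(merge_stderr=False)))
print(is_valid_json(run_child(merge_stderr=True)))
```

Passing `--quiet` avoids the problem from the other side, by keeping the warning from being emitted in the first place.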

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Mike Sarahan (https://github.com/msarahan)

URL: #729