-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider consolidating conda solves in CI tests #22
Comments
I don't fully understand how much you need to be rebuilding, but if the answer is "not much" then maybe you could use https://conda.github.io/conda-pack/ or consider building a docker image instead of re-creating the environment each time? |
I am proposing a flag |
This is a possible solution, but we'd need to maintain a separate Docker image for every repository and every test matrix configuration (e.g. a container containing the cuDF Python test environment on ARM using Python 3.9 and CUDA 12.2). I don't think we have the infrastructure to manage that while also ensuring that the conda environments are regularly updated (every night?) for all the possible containers so that we are aware of upstream changes that affect us. The workflows we have today would require an explosively large number of large containers (many of the containers would need to contain an entire CTK installed by conda). This would also still require us to have a conda environment in that container that is compatible with the CI artifacts that we are installing. This doesn't avoid the "can't downgrade Arrow 15" problem because we must install the CI artifacts (conda packages built from the current PR) into the container. |
The docker repo suffers from a similar problem and it was (partly) mitigated by dynamically editing the environment file: https://github.com/rapidsai/docker/blob/branch-24.04/context/notebooks.sh#L31, so just throwing that out there. |
Once rapidsai/cuml#5781 is merged, do we plan to roll out the same changes to the rest of RAPIDS? IIUC that's what it would take to close this issue. |
Seems reasonable We may want ensure that we are always getting cached CI packages too, which would close out issue: #14 |
@vyasr Yes, my plan is to convert all of RAPIDS to this single-solve approach. @jakirkham The approach I'm taking in rapidsai/cuml#5781 will pin the
|
Ok it sounds like that is mostly an issue when we are getting close to (or in the midst of) a release. Is that right? Agree that could be an issue, but that is a much smaller issue than picking up the prior release (especially well after it went out) |
Yeah, I think the problematic case would be something like changing the dependencies of a package in a way that the conda packages from CI artifacts are not compatible with the rest of the testing environment, and the solver chooses recent nightlies instead of the PR artifacts. @charlesbluca proposed we could pin the exact channels like |
This PR creates a conda environment containing both test dependencies and cuml packages. This is a workaround for some issues seen with conda being unable to downgrade from Arrow 15 (in the initial environment with test dependencies) to Arrow 14 (currently pinned by cudf, which is a dependency of cuml). This is a partial solution for rapidsai/build-planning#22. (More work is needed for other RAPIDS repos.) Depends on rapidsai/dependency-file-generator#67. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Charles Blackmon-Luca (https://github.com/charlesbluca) - GALI PREM SAGAR (https://github.com/galipremsagar) - Vyas Ramasubramani (https://github.com/vyasr) - Jake Awe (https://github.com/AyodeAwe) - https://github.com/jakirkham - Dante Gama Dessavre (https://github.com/dantegd) URL: #5781
An additional minor benefit is that this will speed up CI by reducing the number of solves. |
This might not have a large impact on CI runtime. See rapidsai/cudf#15201 (comment). tl;dr, test startup time is dominated by network activity (downloading multiple GB of packages from RAPIDS and CTK dependencies) and the potential conda solve time savings are comparatively small. |
Taking this on with high priority because cuspatial CI is blocked, and this might help there. |
This is a port of rapidsai/cuml#5781 and part of rapidsai/build-planning#22 Authors: - Mike Sarahan (https://github.com/msarahan) - Paul Taylor (https://github.com/trxcllnt) Approvers: - Paul Taylor (https://github.com/trxcllnt) - https://github.com/jakirkham - Ray Douglass (https://github.com/raydouglass) URL: #1387
@msarahan Is there a plan to continue on this work? We have implemented it for cuml and cuspatial but nothing else. It would be good to have alignment across RAPIDS to prevent future issues. |
Added task list to parent issue. We'll work on this as time allows. |
…apids-dask-dependency` (#513) Explicitly include dependencies on `libcufile*` and `*cudart*` in the Python Conda package. These seem to be pulled in implicitly by the C++ package. However as they are dependencies of the Python built shared objects, they should also be listed here. Also consolidate the Conda environment creation and installation of nightly dependencies into one step. By doing this in one step, we guarantee that all of our constraints are taken into account during environment creation. Lastly switch from using `dask` directly to `rapids-dask-dependency` to align Dask installation and pinning with the rest of RAPIDS. xref: rapidsai/build-planning#22 xef: #378 Authors: - https://github.com/jakirkham - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Bradley Dice (https://github.com/bdice) - Mads R. B. Kristensen (https://github.com/madsbk) URL: #513
Currently (using cuML as an example here), the conda test environment initialization for most CI jobs looks something like creating the test environment:
And then downloading and installing build artifacts from previous jobs on top of this environment:
In addition to forcing us to eat the cost of a second conda environment solve, in many cases this can cause some pretty drastic changes to the environment which can be blocking - for example, consider this cuML run which fails because conda is unable to solve a downgrade from Arrow 15.0.0 (build 5) to 14.0.1.
Our current workaround for this is to manually add pinnings to the testing dependencies initially solved such that the artifact installation can be solved, but this can introduce a lot of burden in needing to:
Would it be possible to consolidate some (or all) of these conda environment solves by instead:
In my mind, the main blocker I could see to this working would be if
rapids-download-conda-from-s3
requires some conda packages contained in the testing environment to work.EDIT: Updating to capture state of other projects:
The text was updated successfully, but these errors were encountered: