Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CUDA libs in Python Conda, Consolidate Conda CI installs & use rapids-dask-dependency #513

Merged

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Oct 23, 2024

Explicitly include dependencies on libcufile* and *cudart* in the Python Conda package. These seem to be pulled in implicitly by the C++ package. However as they are dependencies of the Python built shared objects, they should also be listed here.

Also consolidate the Conda environment creation and installation of nightly dependencies into one step. By doing this in one step, we guarantee that all of our constraints are taken into account during environment creation.

Lastly switch from using dask directly to rapids-dask-dependency to align Dask installation and pinning with the rest of RAPIDS.

xref: rapidsai/build-planning#22
xef: #378

@jakirkham jakirkham requested a review from a team as a code owner October 23, 2024 18:39
@jakirkham jakirkham requested a review from AyodeAwe October 23, 2024 18:39
@jakirkham jakirkham added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Oct 23, 2024
@jakirkham jakirkham force-pushed the req_conda_py_libcufile-dev branch from fb76de8 to 3c589fb Compare October 23, 2024 21:53
@jakirkham
Copy link
Member Author

Am seeing errors like this in the C++ tests. However we really haven't changed anything on the C++ side

Performing async I/O using by-value arguments
File async write: 4096
File async read: 4096
(2759 us)
KvikIO defaults: 
  Compatibility mode: enabled
Write: 8192
Parallel POSIX read (1 threads): 8192
Test project /opt/conda/envs/test/bin/tests/libkvikio
CMake Error at CTestTestfile.cmake:5 (execute_process):
  execute_process given unknown argument "COMMAND_ERROR_IS_FATAL".
    Start 1: cpp_tests


1/1 Test #1: cpp_tests ........................***Failed    0.01 sec
CMake Error: No script specified for argument -P


0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.01 sec

The following tests FAILED:
	  1 - cpp_tests (Failed)
Errors while running CTest

@madsbk do you have thoughts on what may be happening here?

@madsbk madsbk mentioned this pull request Nov 26, 2024
@madsbk
Copy link
Member

madsbk commented Nov 26, 2024

Hmm, I cannot reproduce locally. Let me try some CI debugging :)

@madsbk
Copy link
Member

madsbk commented Nov 26, 2024

Trying to debug in CI: #562. The gtests works when running them directly, but fails when running through ctest:

 + ./cpp_tests
[==========] Running 3 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 2 tests from BasicIOTest
[ RUN      ] BasicIOTest.write_read
[       OK ] BasicIOTest.write_read (340 ms)
[ RUN      ] BasicIOTest.write_read_async
[       OK ] BasicIOTest.write_read_async (1 ms)
[----------] 2 tests from BasicIOTest (341 ms total)

[----------] 1 test from Defaults
[ RUN      ] Defaults.parse_compat_mode_str
[       OK ] Defaults.parse_compat_mode_str (0 ms)
[----------] 1 test from Defaults (0 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 2 test suites ran. (342 ms total)
[  PASSED  ] 3 tests.
+ ctest -VV --no-tests=error --output-on-failure
UpdateCTestConfiguration  from :/opt/conda/envs/test/bin/tests/libkvikio/DartConfiguration.tcl
UpdateCTestConfiguration  from :/opt/conda/envs/test/bin/tests/libkvikio/DartConfiguration.tcl
Test project /opt/conda/envs/test/bin/tests/libkvikio
Constructing a list of tests
CMake Error at CTestTestfile.cmake:5 (execute_process):
  execute_process given unknown argument "COMMAND_ERROR_IS_FATAL".


Checking test dependency graph...
Checking test dependency graph end
test 1
    Start 1: cpp_tests

1: Test command: /opt/conda/envs/test/bin/cmake "-Dcommand_to_run=./../../..//bin/tests/libkvikio/cpp_tests" "-Dcommand_args=" "-P=./run_gpu_test.cmake"
1: Test timeout computed to be: 9.99988e+06
1: CMake Error: No script specified for argument -P
1/1 Test #1: cpp_tests ........................***Failed    0.01 sec
CMake Error: No script specified for argument -P

Errors while running CTest

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.01 sec

The following tests FAILED:
	  1 - cpp_tests (Failed)

Are we running a old cmake?

execute_process given unknown argument "COMMAND_ERROR_IS_FATAL".

@jakirkham
Copy link
Member Author

jakirkham commented Dec 6, 2024

Thanks Mads! 🙏

Yep that's the issue

We have a newer cmake installed. However it gets downgraded during a later install step (please see log)

  - cmake               3.31.1  hf9cb763_0                 conda-forge          Cached
  + cmake                3.5.0  3                          conda-forge            11MB

Think this is because we install the environment and then later install KvikIO packages into the environment. Fixing this likely involves consolidating those 2 install steps on CI

xref: rapidsai/build-planning#22

@jakirkham jakirkham requested a review from a team as a code owner December 6, 2024 03:52
@jakirkham jakirkham changed the title Explicitly list CUDA libs in Python Conda package List CUDA lib in Python Conda package & Consolidate Conda CI installs Dec 6, 2024
Sometimes the progress bar and other output activity causes issues in CI
logs. It isn't necessary to include as we output the environment
produced at the end. So quiet these install steps.
Also drop the YAML anchor as this is only used in one place.
@jakirkham
Copy link
Member Author

jakirkham commented Dec 6, 2024

That seems to have cleared up those errors. Saw some new errors)

FAILED tests/test_zarr.py::test_dask_read[cupy-True] - AttributeError: `np.round_` was removed in the NumPy 2.0 release. Use `np.round` instead.

This is happening as we are getting a really old copy of Dask, which is incompatible with NumPy 2

dask                      2023.3.0           pyhd8ed1ab_0    conda-forge
dask-core                 2023.3.0           pyhd8ed1ab_0    conda-forge
distributed               2023.3.0           pyhd8ed1ab_0    conda-forge

It looks like we were just setting a lower bound on Dask and using that to test

- &dask dask>=2022.05.2

However everywhere else in RAPIDS we use rapids-dask-dependency to handle Dask (given it needs regular bumps in Dask-CUDA and other Dask-cu* libraries)

So have pushed a similar change here to address this issue

@jakirkham jakirkham changed the title List CUDA lib in Python Conda package & Consolidate Conda CI installs Add CUDA libs in Python Conda, Consolidate Conda CI installs & use rapids-dask-dependency Dec 6, 2024
@jakirkham
Copy link
Member Author

Looks like that last change fixed. CI now passes! 🥳

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine to me. Are the -q flags for conda/mamba meant to fix all the blank lines that appear from the download progress bars? We may want a tracking issue so we can apply that consistently. Perhaps something should go in the CI images’ base conda config?

@jakirkham
Copy link
Member Author

Thanks Bradley! 🙏

Yes exactly

Good ideas. Filed upstream issue: rapidsai/build-planning#126

Please add anything else needed over there

Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jakirkham

@madsbk
Copy link
Member

madsbk commented Dec 6, 2024

Do we want this is branch-24.12?

@bdice
Copy link
Contributor

bdice commented Dec 6, 2024

Do we want this is branch-24.12?

I think yes. But I am okay with it going to 25.02. It definitely improves CI correctness.

@madsbk
Copy link
Member

madsbk commented Dec 6, 2024

/merge

@rapids-bot rapids-bot bot merged commit dc6c0c1 into rapidsai:branch-24.12 Dec 6, 2024
57 checks passed
@jakirkham jakirkham deleted the req_conda_py_libcufile-dev branch December 6, 2024 18:26
@jakirkham
Copy link
Member Author

Thanks Bradley and Mads! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants