[air][tune] Aim logger #30674

ju2ez · 2022-11-26T16:25:24Z

Why are these changes needed?

The default tensorboardx logger, though very powerful has some downsides to it like poor scalability (slow with many experiments). With this PR I want to give the option for an alternative (powerful) logger in form of the aim logger.

Related issue number

Closes #30537

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- proper unit testing of the written aim metrics
- Release tests

richardliaw

Thank you so much for your contribution!

I left a couple comments.

python/ray/tune/logger/aim.py

python/ray/tune/tests/test_logger.py

richardliaw · 2022-12-14T19:09:07Z

@SGevorg can you take a look to see if the aim usage looks right?

SGevorg · 2022-12-14T19:17:54Z

@richardliaw sure, thanks for the follow up.
Also adding @gorarakelyan and @alberttorosyan to this thread to take it away from here.

ju2ez

Minor improvements to the first implementation

python/ray/tune/logger/aim.py

ju2ez

Changes are implemented.

justinvyu

Hi @ju2ez, thank you for the contribution, this is looking good!

I've left some comments, let me know if you need any clarification or some more guidance. I am also happy to help out if needed!

2 other things:

We'll need to add aim as a Tune dependency for the CI. Add it here: https://github.com/ray-project/ray/blob/master/python/requirements/ml/requirements_tune.txt
Another thing that we should add to increase the visibility of the Aim logger: add it to the docs here (I'm thinking right under the How to log to Tensorboard? section). Basically just showing the user how to add the callback and how to view logs locally with aim up at the experiment/repo directory.

python/ray/tune/logger/aim.py

justinvyu · 2022-12-15T21:53:34Z

python/ray/tune/logger/aim.py

+        Returns: Run
+        """
+        run = self._run_cls(
+            repo=self._repo_path,


It seems like aim logs of all trials within a Tune experiment should be written to a shared .aim repo (concurrent jobs writing to the same repo is supported).

This works if the user specifies a _repo_path manually, but we should default to having it written to the experiment directory. Otherwise it would get written to the user's original working directory, which may be confusing.

Ex:

~/ray_results/experiment_name/ .aim/ trial0_logdir/ trial1_logdir/ ...

We should default the logs to be written to the experiment directory, and we can also fill in a default experiment name:

experiment_dir = str(Path(trial.logdir).parent) run = self._run_cls( repo=self._repo_path or experiment_dir, experiment=self._experiment or trial.experiment_dir_name, **self._aim_run_kwargs, )

python/ray/tune/logger/aim.py

python/ray/tune/tests/test_logger.py

alberttorosyan

Approving PR since all comments from my side are addressed.

tamohannes · 2022-12-27T09:08:32Z

Hey @ju2ez, I am Hovhannes from Aim 😊. I will love to help you finalize and push the changes. 🚀
Can you please add me as a collaborator with write permissions to your ju2ez/ray repo?

tamohannes · 2022-12-28T09:07:41Z

@ju2ez also we can add a flag: as_multirun, which will allow the users to track each trial as a single Aim.Run or to keep them all in the same Aim.Run. I already have a code snippet for this functionality, will love to share it with you!

ju2ez · 2022-12-28T11:41:43Z

Hey @ju2ez, I am Hovhannes from Aim 😊. I will love to help you finalize and push the changes. 🚀 Can you please add me as a collaborator with write permissions to your ju2ez/ray repo?

Hey @tmynn,

thanks for your support!
FYI: I'm on vacation until 5th of January and will work on finalizing this PR once I'm back.

I think the as_multirun flag is a great idea!

Best,
Julian

justinvyu · 2023-01-07T00:36:47Z

@ju2ez @tmynn
Just bumping this thread to see if any assistance is needed on my end! Changes look good and I think what's mostly left is to finish writing the tests and to add an example to the docs. Let me know if there are any questions on what needs to be done, looking forward to this addition!

ju2ez · 2023-01-07T15:12:59Z

@ju2ez @tmynn Just bumping this thread to see if any assistance is needed on my end! Changes look good and I think what's mostly left is to finish writing the tests and to add an example to the docs. Let me know if there are any questions on what needs to be done, looking forward to this addition!

@justinvyu thanks for catching up.
Yes, you are right.

Let's try to get it done until the end of next week!

tamohannes · 2023-01-07T17:27:41Z

@ju2ez please let me know if you need my help with any of these tasks.

ju2ez · 2023-01-07T17:39:28Z

@ju2ez please let me know if you need my help with any of these tasks.

Thanks! @tmynn
It would be great if you could maybe create the docs since you probably much more familiar with aim and how to describe it.

Also it would be great if you could help with the tests, since i don't really know your best practices.

tamohannes · 2023-01-08T16:56:37Z

@ju2ez nice, I will start with the docs first. :)

There is no ubuntu20.04 image for cuda 10.2 and 10.1. Signed-off-by: Kornél Csernai <749306+csko@users.noreply.github.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

…ay-project#31493) Signed-off-by: Kai Fricke <kai@anyscale.com> Release tests are currently failing with an error on file upload (botocore.exceptions.DataNotFoundError: Unable to load data for: ec2/2016-11-15/endpoint-rule-set-1). This is likely because some tests are using an anyscale push-based API to upload files. By switching to the job-based filemanager for all tests the upload issue should be mitigated. Please note that execution will still happen with SDK commands for those tests that haven't specified to use jobs for execution, so actual test execution should be unaffected. Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

…31495) Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

…oject#31468) Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

This PR is to enable lazy execution by default. See ray-project/enhancements#19 for motivation. The change includes: * Change `Dataset` constructor: `Dataset.__init__(lazy: bool = True)`. Also remove `defer_execution` field, as it's no longer needed. * `read_api.py:read_datasource()` returns a lazy `Dataset` with computing the first input block. * Add `ds.fully_executed()` calls to required unit tests, to make sure they are passing. TODO: - [x] Fix all unit tests - [x] ray-project#31459 - [x] ray-project#31460 - [ ] Remove the behavior to eagerly compute first block for read - [ ] ray-project#31417 - [ ] Update documentation Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

…r events (ray-project#31489) Signed-off-by: praveeng <praveeng@anyscale.com> # Why are these changes needed? Autoscaler event logs are prefixed with (scheduler) which is misleading. This PR changes the prefix to be (autoscaler) Tested building ray locally and running an application (see attached logs). Added unit tests. # Related issue number Closes ray-project#24807 Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

…oject#31195) This took over ray-project#27578 We add an option to warn user when Deprecated API is used. Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

moved rl_optimizer logic into rl_trainer Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Preliminary PR for adding Python 3.11 support, mapping out various dependencies, fixing issues. ### Main changes - Upgrade cython to 0.29.32 - Add CI/CD steps for 3.11 wheel - Change cython code to not use `recursion_depth` - Update cloudpickle to latest - Use newer manylinux2014 which has python3.11 - Condition certain python packages in requirements.txt on <3.11 that don't yet have a 3.11 version ### Checklist: - cython - [x] remove deprecated `recursion_depth` - [x] exc_type cython/cython#4500 - [ ] package dependencies https://pyreadiness.org/3.11/ - [ ] llvmlite numba/llvmlite#869 - [ ] numba numba/numba#8304 - [ ] pyarrow - [ ] scikit-learn - [ ] pydantic - [x] cloudpickle - [x] upgrade cython to 0.29.32 - [ ] tensorflow tensorflow/tensorflow#58032 - [ ] torch pytorch/pytorch#86566 - [ ] miniconda conda/conda#11170 - [ ] `docker/base-deps/Dockerfile` - [x] claim to support 3.11 in setup.py - [ ] cicd - [ ] .buildkite/ - [ ] .buildkite/pipeline.build.yml - [ ] ci/ - [ ] ci/build/test-wheels.sh - [ ] ci/build/build-docker-images.py - [ ] release tests - [ ] docker/retag-lambda/python_versions.txt - [ ] download_wheels.sh - [ ] wheels - [ ] `python/build-wheel-macos.sh` - [ ] `python/build-wheel-windows.sh` - [ ] Tests - [ ] pytest ray/serve/tests - [ ] python python/ray/serve/examples/echo_full.py - [ ] bazel test //:core_worker_test - [ ] bazel test --build_tests_only //:all - [ ] //python/ray/tests:test_pydantic_serialization fastapi/fastapi#5048 - [ ] //python/ray/train:test_torch_utils - [ ] Documentation - [x] installation.rst Current status: Linux and mac wheels build in CICD. Docker images will come in a separate PR. Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Signed-off-by: Julian <julianhatzky@googlemail.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Signed-off-by: Julian <JulianHatzky@googlemail.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

richardliaw · 2023-01-30T00:23:44Z

hey, this PR history seems pretty messed up -- can we reopen with a new PR?

ju2ez · 2023-01-30T01:51:16Z

hey, this PR history seems pretty messed up -- can we reopen with a new PR?

@richardliaw @tmynn
Wow - don't know what happened there. Good idea - I opened a new PR to wrap things up there.

Please see #32041

Co-authored-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: tmynn <hovhannes.tamoyan@gmail.com> Co-authored-by: Justin Yu <justinvyu@berkeley.edu> Closes #30537 Fixes #30674

Co-authored-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: tmynn <hovhannes.tamoyan@gmail.com> Co-authored-by: Justin Yu <justinvyu@berkeley.edu> Closes ray-project#30537 Fixes ray-project#30674 Signed-off-by: Jack He <jackhe2345@gmail.com>

Co-authored-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: tmynn <hovhannes.tamoyan@gmail.com> Co-authored-by: Justin Yu <justinvyu@berkeley.edu> Closes ray-project#30537 Fixes ray-project#30674 Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>

Co-authored-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: tmynn <hovhannes.tamoyan@gmail.com> Co-authored-by: Justin Yu <justinvyu@berkeley.edu> Closes ray-project#30537 Fixes ray-project#30674

Co-authored-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: tmynn <hovhannes.tamoyan@gmail.com> Co-authored-by: Justin Yu <justinvyu@berkeley.edu> Closes ray-project#30537 Fixes ray-project#30674 Signed-off-by: elliottower <elliot@elliottower.com>

Co-authored-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: tmynn <hovhannes.tamoyan@gmail.com> Co-authored-by: Justin Yu <justinvyu@berkeley.edu> Closes ray-project#30537 Fixes ray-project#30674 Signed-off-by: Jack He <jackhe2345@gmail.com>

ju2ez mentioned this pull request Nov 26, 2022

Integration with Ray Tune aimhubio/aim#2369

Closed

ju2ez mentioned this pull request Dec 10, 2022

Integration of aim metric tracker as an alternative to tensorboardx in ray tune #30537

Closed

richardliaw requested changes Dec 14, 2022

View reviewed changes

richardliaw changed the title ~~Aim logger~~ [air][tune] Aim logger Dec 14, 2022

richardliaw assigned justinvyu Dec 14, 2022

ju2ez commented Dec 15, 2022

View reviewed changes

alberttorosyan reviewed Dec 15, 2022

View reviewed changes

python/ray/tune/logger/aim.py Outdated Show resolved Hide resolved

python/ray/tune/logger/aim.py Outdated Show resolved Hide resolved

ju2ez commented Dec 15, 2022

View reviewed changes

ju2ez force-pushed the aim_logger branch from 984b504 to c342d4b Compare December 15, 2022 14:00

justinvyu requested changes Dec 15, 2022

View reviewed changes

alberttorosyan approved these changes Dec 21, 2022

View reviewed changes

tamohannes mentioned this pull request Dec 23, 2022

[feat] Add Ray Tune integration aimhubio/aim#2445

Closed

tamohannes force-pushed the aim_logger branch from c1d3e3a to 7430873 Compare December 29, 2022 10:40

ju2ez force-pushed the aim_logger branch from 22c845b to 6238acc Compare January 2, 2023 12:33

tamohannes requested review from krfricke, xwjiang2010, amogkam, matthewdeng, Yard1 and maxpumperla as code owners January 15, 2023 22:43

csko and others added 16 commits January 25, 2023 16:26

[ci] Update build-docker-images.py (ray-project#31385)

0c6c52a

There is no ubuntu20.04 image for cuda 10.2 and 10.1. Signed-off-by: Kornél Csernai <749306+csko@users.noreply.github.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Revert "[RLlib] Enable connectors. (ray-project#30388)" (ray-project#…

f572881

…31495) Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

[RLlib] Deflake replay buffer demo by making training shorter (ray-pr…

15c7538

…oject#31468) Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

[Datasets] Refactor Ray Data API documentation (ray-project#31204)

361fa0c

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

[Core][runtime-context] Warn user when deprecated API is used (ray-pr…

f1b6d00

…oject#31195) This took over ray-project#27578 We add an option to warn user when Deprecated API is used. Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

[RLlib] RLTrainer is all you need. (ray-project#31490)

d05014a

moved rl_optimizer logic into rl_trainer Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

[fix] Sign-off

9c9d923

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

[feat] Add tests

600644f

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

param check is now using pop

cee6c36

Signed-off-by: Julian <julianhatzky@googlemail.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

metrics are now checked before they are logged.

45780da

Signed-off-by: Julian <julianhatzky@googlemail.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Removed comment

a737959

Signed-off-by: Julian <JulianHatzky@googlemail.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

[fix] sign off previous commits

487727b

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Merge branch 'ray-project:master' into aim_logger

c83459a

ju2ez mentioned this pull request Jan 30, 2023

[air][tune] Aim logger #32041

Merged

7 tasks

ju2ez closed this Jan 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[air][tune] Aim logger #30674

[air][tune] Aim logger #30674

ju2ez commented Nov 26, 2022 •

edited

Loading

richardliaw left a comment

richardliaw commented Dec 14, 2022

SGevorg commented Dec 14, 2022

ju2ez left a comment

ju2ez left a comment

justinvyu left a comment •

edited

Loading

justinvyu Dec 15, 2022

alberttorosyan left a comment

tamohannes commented Dec 27, 2022

tamohannes commented Dec 28, 2022

ju2ez commented Dec 28, 2022

justinvyu commented Jan 7, 2023

ju2ez commented Jan 7, 2023

tamohannes commented Jan 7, 2023

ju2ez commented Jan 7, 2023 •

edited

Loading

tamohannes commented Jan 8, 2023

richardliaw commented Jan 30, 2023

ju2ez commented Jan 30, 2023 •

edited

Loading

[air][tune] Aim logger #30674

[air][tune] Aim logger #30674

Conversation

ju2ez commented Nov 26, 2022 • edited Loading

Why are these changes needed?

Related issue number

Checks

richardliaw left a comment

Choose a reason for hiding this comment

richardliaw commented Dec 14, 2022

SGevorg commented Dec 14, 2022

ju2ez left a comment

Choose a reason for hiding this comment

ju2ez left a comment

Choose a reason for hiding this comment

justinvyu left a comment • edited Loading

Choose a reason for hiding this comment

justinvyu Dec 15, 2022

Choose a reason for hiding this comment

alberttorosyan left a comment

Choose a reason for hiding this comment

tamohannes commented Dec 27, 2022

tamohannes commented Dec 28, 2022

ju2ez commented Dec 28, 2022

justinvyu commented Jan 7, 2023

ju2ez commented Jan 7, 2023

tamohannes commented Jan 7, 2023

ju2ez commented Jan 7, 2023 • edited Loading

tamohannes commented Jan 8, 2023

richardliaw commented Jan 30, 2023

ju2ez commented Jan 30, 2023 • edited Loading

ju2ez commented Nov 26, 2022 •

edited

Loading

justinvyu left a comment •

edited

Loading

ju2ez commented Jan 7, 2023 •

edited

Loading

ju2ez commented Jan 30, 2023 •

edited

Loading