[air][tune] Aim logger #32041

Merged Mar 3, 2023 (77 commits)

Changes from 64 commits
Commits
fd93d57
added a first working draft of the aim_logger alongside unit test
ju2ez Nov 26, 2022
7276a39
adjusted the logger/__init__.py. Now it imports the AimCallback
ju2ez Nov 26, 2022
61d1b6e
[feat] Add docs
tamohannes Jan 15, 2023
f71c41e
[fix] Sign-off
tamohannes Jan 15, 2023
1b0ba2a
param check is now using pop
ju2ez Jan 21, 2023
af613e5
metrics are now checked before they are logged.
ju2ez Jan 21, 2023
20b7f7d
[feat] Add tests
tamohannes Jan 21, 2023
63b646b
Removed comment
ju2ez Jan 22, 2023
99ffd41
[feat] Add as_multirun option. Update the docstring
tamohannes Dec 29, 2022
2533be5
[feat] Add docs
tamohannes Jan 15, 2023
2d80acd
Fixed the imports and added try except statement for third party libr…
ju2ez Jan 2, 2023
59e3538
Update python/ray/tune/logger/aim.py
ju2ez Jan 2, 2023
feb9f8a
Added assertion for aim import.
ju2ez Jan 2, 2023
b4de23f
Type hint for _create_run(..)
ju2ez Jan 2, 2023
62210e3
Renamed log_hparams function.
ju2ez Jan 2, 2023
080cf1f
Fixed function call.
ju2ez Jan 2, 2023
1c701c6
Adjusted docstring and added the possibility to give kwargs to aim.sd…
ju2ez Jan 2, 2023
d860cb8
[feat] Add docs
tamohannes Jan 15, 2023
7bacdb8
Small refinements
ju2ez Jan 30, 2023
a208e56
Update python/ray/tune/logger/aim.py
ju2ez Feb 3, 2023
de457eb
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
5b6a5f5
Update doc/source/tune/api_docs/logging.rst
ju2ez Feb 3, 2023
10ba378
Update doc/source/tune/api_docs/logging.rst
ju2ez Feb 3, 2023
3310e8a
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
7a36941
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
8baf62d
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
a6a2b0e
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
688c8bd
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
81ca4e0
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
1d91634
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
3164826
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 3, 2023
61e1c72
Update python/ray/tune/logger/aim.py
ju2ez Feb 3, 2023
d710082
Update python/ray/tune/logger/aim.py
ju2ez Feb 3, 2023
394370b
Update python/ray/tune/logger/aim.py
ju2ez Feb 3, 2023
78fa601
Update python/ray/tune/logger/aim.py
ju2ez Feb 3, 2023
0c2a80e
Merge branch 'ray-project:master' into aim
ju2ez Feb 3, 2023
b355e03
Minor adjustments to the docs
ju2ez Feb 3, 2023
0312a67
Update python/ray/tune/logger/aim.py
ju2ez Feb 6, 2023
a89ca3e
Update python/ray/tune/logger/aim.py
ju2ez Feb 6, 2023
7fe127e
Update python/ray/tune/logger/aim.py
ju2ez Feb 6, 2023
b855ff0
Update python/ray/tune/logger/aim.py
ju2ez Feb 6, 2023
143404f
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 6, 2023
6b08871
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 6, 2023
87ff75f
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 6, 2023
9632173
Update python/ray/tune/logger/aim.py
ju2ez Feb 6, 2023
5924080
Update python/ray/tune/logger/aim.py
ju2ez Feb 6, 2023
5374e29
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 6, 2023
7b749e2
Update doc/source/tune/examples/tune-aim.ipynb
ju2ez Feb 6, 2023
4006c93
Merge branch 'ray-project:master' into aim
ju2ez Feb 6, 2023
f390ef1
Fix code in aim.py
justinvyu Feb 6, 2023
d79a72d
Rename AimCallback -> AimLoggerCallback
justinvyu Feb 6, 2023
2cdca85
Various cleanups
justinvyu Feb 6, 2023
3b820d8
Support tuples, sets
justinvyu Feb 6, 2023
3fedfeb
Fix certain hparam types getting thrown away incorrectly
justinvyu Feb 7, 2023
b976e44
Improve testing to cover different configuration options
justinvyu Feb 7, 2023
9751b11
Fix notebook example
justinvyu Feb 7, 2023
210701a
Update python/ray/tune/logger/aim.py
ju2ez Feb 7, 2023
c1c7997
Update python/ray/tune/logger/aim.py
ju2ez Feb 7, 2023
a4d41e9
Remove as_multirun flag
justinvyu Feb 7, 2023
7f97cfa
Merge branch 'aim' of https://github.com/ju2ez/ray into aimlogger
justinvyu Feb 7, 2023
19ce007
Fix merge
justinvyu Feb 7, 2023
6b7295a
Add aim as a Tune dependency
justinvyu Feb 7, 2023
ae163bf
Check that aim callback can be imported from ray.tune.logger
justinvyu Feb 7, 2023
7175239
Merge branch 'master' of https://github.com/ray-project/ray into aiml…
justinvyu Feb 7, 2023
0c7fafe
Fix formatting for kwargs in docstring
justinvyu Feb 7, 2023
ebaf3bf
Merge branch 'master' of https://github.com/ray-project/ray into aiml…
justinvyu Feb 7, 2023
f59a78c
Merge branch 'master' of https://github.com/ray-project/ray into aiml…
justinvyu Feb 27, 2023
7a39431
Pin aim to 3.16.1 w/ versioning patch fix for CI to pass
justinvyu Feb 27, 2023
04d0b1e
Fix aim logger tests
justinvyu Mar 2, 2023
b17eb00
Merge branch 'master' of https://github.com/ray-project/ray into aiml…
justinvyu Mar 2, 2023
01a6541
Update docs to follow new format
justinvyu Mar 2, 2023
c2ecbd0
Fix text in user guide
justinvyu Mar 2, 2023
6c94f37
Don't import aim callback by default
justinvyu Mar 2, 2023
48b6341
Merge branch 'master' of https://github.com/ray-project/ray into aiml…
justinvyu Mar 2, 2023
29d1caa
Fix module path in API ref
justinvyu Mar 2, 2023
6b02036
Fix tests + example imports
justinvyu Mar 2, 2023
80be6f9
Merge branch 'master' of https://github.com/ray-project/ray into aiml…
justinvyu Mar 2, 2023
2 changes: 2 additions & 0 deletions doc/source/_toc.yml
@@ -245,6 +245,8 @@ parts:
title: "Huggingface Example"
- file: tune/examples/experiment-tracking
sections:
- file: tune/examples/tune-aim
title: "Aim Example"
- file: tune/examples/tune-comet
title: "Comet Example"
- file: tune/examples/tune-wandb
Binary file added doc/source/images/aim_logo.png
Binary file added doc/source/images/aim_logo_full.png
8 changes: 8 additions & 0 deletions doc/source/tune/api_docs/logging.rst
@@ -14,6 +14,14 @@ see :ref:`Trainable Logging <trainable-logging>`.
to use our new interface with the ``LoggerCallback`` class instead.


Aim
---------

.. autoclass:: ray.tune.logger.AimLoggerCallback

Install Aim via ``pip install aim``.
See the :doc:`tutorial here </tune/examples/tune-aim>`.

Viskit
------

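For orientation, here is a minimal usage sketch of the AimLoggerCallback documented above. The trainable `train_fn`, the `lr` search space, the metric name `loss`, and the experiment name `aim_example` are illustrative assumptions, not taken from this PR:

from ray import tune
from ray.air import RunConfig, session
from ray.tune.logger.aim import AimLoggerCallback

def train_fn(config):
    # Report a metric each iteration; the Aim callback logs whatever Tune reports.
    for step in range(10):
        session.report({"loss": config["lr"] * (0.9 ** step)})

tuner = tune.Tuner(
    train_fn,
    param_space={"lr": tune.grid_search([0.001, 0.01, 0.1])},
    run_config=RunConfig(
        name="aim_example",
        callbacks=[AimLoggerCallback(metrics=["loss"])],
    ),
)
tuner.fit()

Per the docstring in python/ray/tune/logger/aim.py below, the Aim repo is created in the Tune experiment directory by default, so the results can then be browsed with the Aim UI, e.g. `aim up --repo ~/ray_results/aim_example`.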
8 changes: 8 additions & 0 deletions doc/source/tune/examples/experiment-tracking.rst
@@ -11,6 +11,14 @@ to use Ray Tune with Tensorboard, you can find more information in our
:column: col-md-4 px-2 py-2
:img-top-cls: pt-5 w-75 d-block mx-auto

---
:img-top: /images/aim_logo.png

+++
.. link-button:: tune-aim-ref
:type: ref
:text: Using Aim with Ray Tune For Experiment Management
:classes: btn-link btn-block stretched-link
---
:img-top: /images/comet_logo_full.png

407 changes: 407 additions & 0 deletions doc/source/tune/examples/tune-aim.ipynb


2 changes: 2 additions & 0 deletions python/ray/tune/logger/__init__.py
@@ -8,6 +8,7 @@
from ray.tune.logger.json import JsonLogger, JsonLoggerCallback
from ray.tune.logger.noop import NoopLogger
from ray.tune.logger.tensorboardx import TBXLogger, TBXLoggerCallback
from ray.tune.logger.aim import AimLoggerCallback

DEFAULT_LOGGERS = (JsonLogger, CSVLogger, TBXLogger)

@@ -26,4 +27,5 @@
    "TBXLogger",
    "TBXLoggerCallback",
    "UnifiedLogger",
    "AimLoggerCallback",
]
192 changes: 192 additions & 0 deletions python/ray/tune/logger/aim.py
@@ -0,0 +1,192 @@
import logging

import numpy as np
from typing import TYPE_CHECKING, Dict, Optional, List, Union

from ray.tune.logger.logger import LoggerCallback
from ray.tune.result import (
    TRAINING_ITERATION,
    TIME_TOTAL_S,
    TIMESTEPS_TOTAL,
)
from ray.tune.utils import flatten_dict
from ray.util.annotations import PublicAPI

if TYPE_CHECKING:
    from ray.tune.experiment.trial import Trial

try:
    from aim.sdk import Repo, Run
except ImportError:
    Repo, Run = None, None

logger = logging.getLogger(__name__)

VALID_SUMMARY_TYPES = [int, float, np.float32, np.float64, np.int32, np.int64]


@PublicAPI
class AimLoggerCallback(LoggerCallback):
    """Aim Logger: logs metrics in Aim format.

    Aim is an open-source, self-hosted ML experiment tracking tool.
    It's good at tracking lots (thousands) of training runs, and it allows you to
    compare them with a performant and well-designed UI.

    Source: https://github.com/aimhubio/aim

    Args:
        repo: Aim repository directory or a `Repo` object that the Run object will
            log results to. If not provided, a default repo will be set up in the
            experiment directory (one level above trial directories).
        experiment_name: Sets the `experiment` property of each Run object, which
            is the experiment name associated with it. Can be used later to query
            runs/sequences.
            If not provided, the default will be the Tune experiment name set
            by `RunConfig(name=...)`.
        metrics: List of metric names (out of the metrics reported by Tune) to
            track in Aim. If no metrics are specified, log everything that
            is reported.
        **aim_run_kwargs: Additional arguments that will be passed when creating the
            individual `Run` objects for each trial. For the full list of arguments,
            please see the Aim documentation:
            https://aimstack.readthedocs.io/en/latest/refs/sdk.html
    """

    VALID_HPARAMS = (str, bool, int, float, list, type(None))
    VALID_NP_HPARAMS = (np.bool8, np.float32, np.float64, np.int32, np.int64)

    def __init__(
        self,
        repo: Optional[Union[str, "Repo"]] = None,
        experiment_name: Optional[str] = None,
        metrics: Optional[List[str]] = None,
        **aim_run_kwargs,
    ):
        """
        See help(AimLoggerCallback) for more information about parameters.
        """
        assert Run is not None, (
            "aim must be installed! You can install aim with"
            " the command: `pip install aim`."
        )
        self._repo_path = repo
        self._experiment_name = experiment_name
        if not (bool(metrics) or metrics is None):
            raise ValueError(
                "`metrics` must either contain at least one metric name, or be None, "
                "in which case all reported metrics will be logged to the aim repo."
            )
        self._metrics = metrics
        self._aim_run_kwargs = aim_run_kwargs
        self._trial_to_run: Dict["Trial", Run] = {}

    def _create_run(self, trial: "Trial") -> Run:
        """Initializes an Aim Run object for a given trial.

        Args:
            trial: The Tune trial that aim will track as a Run.

        Returns:
            Run: The created aim run for a specific trial.
        """
        experiment_dir = trial.local_dir
        run = Run(
            repo=self._repo_path or experiment_dir,
            experiment=self._experiment_name or trial.experiment_dir_name,
            **self._aim_run_kwargs,
        )
        # Attach a few useful trial properties
        run["trial_id"] = trial.trial_id
        run["trial_log_dir"] = trial.logdir
        if trial.remote_checkpoint_dir:
            run["trial_remote_log_dir"] = trial.remote_checkpoint_dir
        trial_ip = trial.get_runner_ip()
        if trial_ip:
            run["trial_ip"] = trial_ip
        return run

    def log_trial_start(self, trial: "Trial"):
        if trial in self._trial_to_run:
            # Cleanup an existing run if the trial has been restarted
            self._trial_to_run[trial].close()

        trial.init_logdir()
        self._trial_to_run[trial] = self._create_run(trial)

        if trial.evaluated_params:
            self._log_trial_hparams(trial)

    def log_trial_result(self, iteration: int, trial: "Trial", result: Dict):
        tmp_result = result.copy()

        step = result.get(TIMESTEPS_TOTAL, None) or result[TRAINING_ITERATION]

        for k in ["config", "pid", "timestamp", TIME_TOTAL_S, TRAINING_ITERATION]:
            tmp_result.pop(k, None)  # not useful to log these

        # `context` and `epoch` are special keys that users can report,
        # which are treated as special aim metrics/configurations.
        context = tmp_result.pop("context", None)
        epoch = tmp_result.pop("epoch", None)

        trial_run = self._trial_to_run[trial]
        path = ["ray", "tune"]

        flat_result = flatten_dict(tmp_result, delimiter="/")
        valid_result = {}

        for attr, value in flat_result.items():
            if self._metrics and attr not in self._metrics:
                continue

            full_attr = "/".join(path + [attr])
            if isinstance(value, tuple(VALID_SUMMARY_TYPES)) and not (
                np.isnan(value) or np.isinf(value)
            ):
                valid_result[attr] = value
                trial_run.track(
                    value=value,
                    name=full_attr,
                    epoch=epoch,
                    step=step,
                    context=context,
                )
            elif (isinstance(value, (list, tuple, set)) and len(value) > 0) or (
                isinstance(value, np.ndarray) and value.size > 0
            ):
                valid_result[attr] = value

    def log_trial_end(self, trial: "Trial", failed: bool = False):
        trial_run = self._trial_to_run.pop(trial)
        trial_run.close()

    def _log_trial_hparams(self, trial: "Trial"):
        params = flatten_dict(trial.evaluated_params, delimiter="/")
        flat_params = flatten_dict(params)

        scrubbed_params = {
            k: v for k, v in flat_params.items() if isinstance(v, self.VALID_HPARAMS)
        }

        np_params = {
            k: v.tolist()
            for k, v in flat_params.items()
            if isinstance(v, self.VALID_NP_HPARAMS)
        }

        scrubbed_params.update(np_params)
        removed = {
            k: v
            for k, v in flat_params.items()
            if not isinstance(v, self.VALID_HPARAMS + self.VALID_NP_HPARAMS)
        }
        if removed:
            logger.info(
                "Removed the following hyperparameter values when "
                "logging to aim: %s",
                str(removed),
            )

        run = self._trial_to_run[trial]
        run["hparams"] = scrubbed_params