merge with main
muellerzr committed Aug 2, 2022
1 parent eca63b2 commit f216820
Showing 5 changed files with 86 additions and 12 deletions.
53 changes: 45 additions & 8 deletions docs/source/usage_guides/tracking.mdx
@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
# Tracking

There are a large number of experiment tracking APIs available, but getting them all to work in a multi-processing environment can often be complex.
-🤗 Accelerate provides a general tracking API that can be used to log useful items during your script through [`~Accelerator.log`]
+🤗 Accelerate provides a general tracking API that can be used to log useful items during your script through [`Accelerator.log`]

## Integrated Trackers

@@ -33,19 +33,19 @@ accelerator = Accelerator(log_with="wandb")
accelerator = Accelerator(log_with=["wandb", LoggerType.TENSORBOARD])
```

-At the start of your experiment [`~Accelerator.init_trackers`] should be used to setup your project, and potentially add any experiment hyperparameters to be logged:
+At the start of your experiment [`Accelerator.init_trackers`] should be used to setup your project, and potentially add any experiment hyperparameters to be logged:
```python
hps = {"num_iterations": 5, "learning_rate": 1e-2}
accelerator.init_trackers("my_project", config=hps)
```

-When you are ready to log any data, [`~Accelerator.log`] should be used.
+When you are ready to log any data, [`Accelerator.log`] should be used.
A `step` can also be passed in to correlate the data with a particular step in the training loop.
```python
accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=1)
```

-Once you've finished training, make sure to run [`~Accelerator.end_training`] so that all the trackers can run their finish functionalities if they have any.
+Once you've finished training, make sure to run [`Accelerator.end_training`] so that all the trackers can run their finish functionalities if they have any.
```python
accelerator.end_training()
```
@@ -85,8 +85,8 @@ accelerator.end_training()

## Implementing Custom Trackers

-To implement a new tracker to be used in `Accelerator`, a new one can be made through implementing the [`~GeneralTracker`] class.
-Every tracker must implement three functions:
+To implement a new tracker to be used in `Accelerator`, a new one can be made through implementing the [`GeneralTracker`] class.
+Every tracker must implement three functions and have three properties:
- `__init__`:
  - Should store a `run_name` and initialize the tracker API of the integrated library.
  - If a tracker stores their data locally (such as TensorBoard), a `logging_dir` parameter can be added.
@@ -95,6 +95,15 @@ Every tracker must implement three functions:
- `log`:
  - Should take in a `values` dictionary and a `step`, and should log them to the run

- `name` (`str`):
  - A unique string name for the tracker, such as `"wandb"` for the wandb tracker.
  - This will be used for interacting with this tracker specifically.
- `requires_logging_directory` (`bool`):
  - Whether a `logging_dir` is needed for this particular tracker, and if it uses one.
- `tracker`:
  - This should be implemented as a `@property` function.
  - Should return the internal tracking mechanism the library uses, such as the `run` object for `wandb`.

A brief example can be seen below with an integration with Weights and Biases, containing only the relevant information:
```python
from accelerate.tracking import GeneralTracker
@@ -109,7 +118,11 @@ class MyCustomTracker(GeneralTracker):

    def __init__(self, run_name: str):
        self.run_name = run_name
-        wandb.init(self.run_name)
+        self.run = wandb.init(self.run_name)

    @property
    def tracker(self):
        return self.run

    def store_init_configuration(self, values: dict):
        wandb.config(values)
@@ -118,7 +131,7 @@ class MyCustomTracker(GeneralTracker):
wandb.log(values, step=step)
```

-When you are ready to build your `Accelerator` object, pass in an **instance** of your tracker to [`~Accelerator.log_with`] to have it automatically
+When you are ready to build your `Accelerator` object, pass in an **instance** of your tracker to [`Accelerator.log_with`] to have it automatically
be used with the API:

```python
@@ -133,6 +146,30 @@ tracker = MyCustomTracker("some_run_name")
accelerator = Accelerator(log_with=[tracker, "all"])
```

## Accessing the internal tracker

If you want to interact with a tracker directly, you can quickly access one using the
[`Accelerator.get_tracker`] method. Just pass in the string corresponding to a tracker's `.name` attribute,
and it will return that tracker on the main process.

This example shows doing so with wandb:

```python
wandb_tracker = accelerator.get_tracker("wandb")
```

From there you can interact with `wandb`'s `run` object like normal:

<Tip warning={true}>
Make sure to only interact with trackers on the main process!
</Tip>


```python
if accelerator.is_main_process:
    wandb_tracker.log_artifact(some_artifact_to_log)
```

## When a wrapper cannot work

If a library has an API that does not follow a strict `.log` with an overall dictionary, such as Neptune.AI, logging can be done manually under an `if accelerator.is_main_process` statement:
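As a hedged sketch of that pattern (every name below is an illustrative stand-in, not the real Neptune.AI client), the idea is to loop over the metrics yourself and guard every call so only the main process touches the tracker:

```python
# Illustrative sketch of manual logging for a per-metric API. StandInRun is a
# fake stand-in for a client like Neptune.AI's; in real code the guard would be
# `if accelerator.is_main_process:`.
class StandInRun:
    def __init__(self):
        self.series = {}

    def log_metric(self, key, value):
        # the library accepts one metric at a time, not a whole dictionary
        self.series.setdefault(key, []).append(value)


run = StandInRun()
is_main_process = True  # stand-in for accelerator.is_main_process

values = {"train_loss": 1.12, "valid_loss": 0.8}
if is_main_process:
    # log each metric individually instead of passing one dictionary
    for key, value in values.items():
        run.log_metric(key, value)
```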
2 changes: 1 addition & 1 deletion setup.py
@@ -19,7 +19,7 @@
extras["quality"] = ["black ~= 22.0", "isort >= 5.5.4", "flake8 >= 3.8.3", "hf-doc-builder >= 0.3.0"]
extras["docs"] = []
extras["test_prod"] = ["pytest", "pytest-xdist", "pytest-subtests", "parameterized"]
-extras["test_dev"] = ["datasets", "evaluate", "transformers", "scipy", "sklearn", "deepspeed", "tqdm"]
+extras["test_dev"] = ["datasets", "evaluate", "transformers", "scipy", "sklearn", "deepspeed<0.7.0", "tqdm"]
extras["testing"] = extras["test_prod"] + extras["test_dev"]

extras["test_trackers"] = ["wandb", "comet-ml", "tensorboard"]
20 changes: 17 additions & 3 deletions src/accelerate/accelerator.py
@@ -412,7 +412,7 @@ def wrapper(self, *args, **kwargs):

def on_local_process(local_process_idx):
    """
-    Run func on certain local process only
+    A decorator that will run the decorated function on a given local process index only.
    """

    def decorator(func):
@@ -1066,10 +1066,24 @@ def init_trackers(self, project_name: str, config: Optional[dict] = None, init_k
        for tracker in self.trackers:
            tracker.store_init_configuration(config)

    @on_main_process
    def get_tracker(self, name: str):
        """
        Returns a `tracker` from `self.trackers` based on `name` on the main process only.

        Args:
            name (`str`):
                The name of a tracker, corresponding to the `.name` property.
        """
        for tracker in self.trackers:
            if tracker.name == name:
                return tracker.tracker
        raise ValueError(f"{name} is not an available tracker stored inside the `Accelerator`.")

    @on_main_process
    def log(self, values: dict, step: Optional[int] = None, log_kwargs: Optional[dict] = {}):
        """
-        Logs `values` to all stored trackers in `self.trackers`.
+        Logs `values` to all stored trackers in `self.trackers` on the main process only.

        Args:
            values (`dict`):
@@ -1089,7 +1103,7 @@ def log(self, values: dict, step: Optional[int] = None, log_kwargs: Optional[dic
    @on_main_process
    def end_training(self):
        """
-        Runs any special end training behaviors, such as stopping trackers
+        Runs any special end training behaviors, such as stopping trackers on the main process only.
        """
        for tracker in self.trackers:
            tracker.finish()
19 changes: 19 additions & 0 deletions src/accelerate/tracking.py
@@ -103,6 +103,13 @@ def finish(self):
        """
        pass

    @abstractproperty
    def tracker(self):
        """
        Should return internal tracking mechanism used by a tracker class (such as the `run` for wandb)
        """
        pass


class TensorBoardTracker(GeneralTracker):
    """
@@ -129,6 +136,10 @@ def __init__(self, run_name: str, logging_dir: Optional[Union[str, os.PathLike]]
            "Make sure to log any initial configurations with `self.store_init_configuration` before training!"
        )

    @property
    def tracker(self):
        return self.writer

    def store_init_configuration(self, values: dict):
        """
        Logs `values` as hyperparameters for the run. Should be run at the beginning of your experiment.
@@ -196,6 +207,10 @@ def __init__(self, run_name: str, **kwargs):
            "Make sure to log any initial configurations with `self.store_init_configuration` before training!"
        )

    @property
    def tracker(self):
        return self.run.run

    def store_init_configuration(self, values: dict):
        """
        Logs `values` as hyperparameters for the run. Should be run at the beginning of your experiment.
@@ -256,6 +271,10 @@ def __init__(self, run_name: str, **kwargs):
            "Make sure to log any initial configurations with `self.store_init_configuration` before training!"
        )

    @property
    def tracker(self):
        return self.writer

    def store_init_configuration(self, values: dict):
        """
        Logs `values` as hyperparameters for the run. Should be run at the beginning of your experiment.
4 changes: 4 additions & 0 deletions tests/test_tracking.py
@@ -224,6 +224,10 @@ def __init__(self, dir: str):
        self.writer = csv.DictWriter(self.f, fieldnames=self._col_names)
        self.writer.writeheader()

    @property
    def tracker(self):
        return self.writer

    def store_init_configuration(self, values: dict):
        logger.info("Call init")
        self.writer.writerow(values)
