Enabling engine to run single epochs #1371

Open · alxlampe opened this issue Oct 6, 2020 · 7 comments
alxlampe commented Oct 6, 2020

🚀 Feature

Problem

I am using multiple engines in a nested way. That means, for example, that when the main engine fires Events.EPOCH_COMPLETED, a child engine attached to this event should run exactly one epoch. A possible solution is to run the child engine with engine.run(max_epochs=1), but then the engine fires setup and teardown events like Events.STARTED and Events.COMPLETED on every call to engine.run(max_epochs=1), even though those events are, as far as I understand, meant to be fired only once per run.
Since my child engine must set up and tear down things, I could attach event handlers to the main engine, but the handlers I want to attach do not know that a main engine exists. The handlers shouldn't have any access to the main engine.
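For illustration, here is a minimal sketch of the current behaviour (the handlers and data are just placeholders):

from ignite.engine import Engine, Events

parent = Engine(lambda engine, batch: None)
child = Engine(lambda engine, batch: None)
child_data = [0, 1, 2]

child.add_event_handler(Events.STARTED, lambda e: print("child started"))
child.add_event_handler(Events.COMPLETED, lambda e: print("child completed"))

@parent.on(Events.EPOCH_COMPLETED)
def run_child_epoch(parent_engine):
    # the child restarts from scratch on each call, so its STARTED and
    # COMPLETED handlers fire on every parent epoch instead of once per run
    child.run(child_data, max_epochs=1)

parent.run([0, 1], max_epochs=2)  # "child started"/"child completed" are printed twice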

Solution

I need some functionality so that the engine can do the following (this is just an example, with a clumsy but possible way of implementing it):

engine.run_epoch(max_epochs=3)  # runs setup and first epoch, fires events from `STARTED` to `EPOCH_COMPLETED`
engine.run_epoch(max_epochs=3)  # runs second epoch, fires events from `EPOCH_STARTED` to `EPOCH_COMPLETED`
engine.run_epoch(max_epochs=3)  # runs last epoch and teardown, fires events from `EPOCH_STARTED` to `COMPLETED`

Instead of calling a function, one could create an iterable object from engine.run and get the same behavior in a nicer way:

epoch_iterator = iterable_engine.run(max_epochs=3)
next(epoch_iterator)  # runs setup and first epoch, fires events from `STARTED` to `EPOCH_COMPLETED`
next(epoch_iterator)  # runs second epoch, fires events from `EPOCH_STARTED` to `EPOCH_COMPLETED`
next(epoch_iterator)  # runs last epoch and teardown, fires events from `EPOCH_STARTED` to `COMPLETED`

Or one can use loops:

iterable_engine = IterableEngine(lambda x, y: 0.)
iterable_engine.add_event_handler(Events.STARTED, lambda x: print("started"))
iterable_engine.add_event_handler(Events.EPOCH_STARTED, lambda x: print("epoch started"))
iterable_engine.add_event_handler(Events.EPOCH_COMPLETED, lambda x: print("epoch completed"))
iterable_engine.add_event_handler(Events.COMPLETED, lambda x: print("completed"))

epoch_iterator = iterable_engine.run([1], max_epochs=3)
for state in epoch_iterator:
    print("This is outside engine.run")

The output is:

started
epoch started
epoch completed
This is outside engine.run
epoch started
epoch completed
This is outside engine.run
epoch started
epoch completed
This is outside engine.run
completed

I added the code at the bottom, where I subclass Engine and override the _internal_run method with a copy of the original method plus one added line, the yield statement. You can execute it and it produces the output above.
To switch between the current behavior and this one, one could put the yield into an if statement and pass an additional argument to engine.run, e.g. engine.run(max_epochs=3, return_generator=True), or set a flag on the engine to enable this functionality.

What do you think?

Code:

import time

from ignite._utils import _to_hours_mins_secs
from ignite.engine import Engine
from ignite.engine import Events
from ignite.engine import State


class IterableEngine(Engine):
    def _internal_run(self) -> State:
        self.should_terminate = self.should_terminate_single_epoch = False
        self._init_timers(self.state)
        try:
            start_time = time.time()
            self._fire_event(Events.STARTED)
            while self.state.epoch < self.state.max_epochs and not self.should_terminate:
                self.state.epoch += 1
                self._fire_event(Events.EPOCH_STARTED)

                if self._dataloader_iter is None:
                    self._setup_engine()

                time_taken = self._run_once_on_dataset()
                # time is available for handlers but must be updated after fire
                self.state.times[Events.EPOCH_COMPLETED.name] = time_taken
                handlers_start_time = time.time()
                if self.should_terminate:
                    self._fire_event(Events.TERMINATE)
                else:
                    self._fire_event(Events.EPOCH_COMPLETED)
                time_taken += time.time() - handlers_start_time
                # update time wrt handlers
                self.state.times[Events.EPOCH_COMPLETED.name] = time_taken
                hours, mins, secs = _to_hours_mins_secs(time_taken)
                self.logger.info(
                    "Epoch[%s] Complete. Time taken: %02d:%02d:%02d" % (self.state.epoch, hours, mins, secs)
                )
                if self.should_terminate:
                    break
                # the one added line: yield control back to the caller after each epoch
                yield self.state

            time_taken = time.time() - start_time
            # time is available for handlers but must be updated after fire
            self.state.times[Events.COMPLETED.name] = time_taken
            handlers_start_time = time.time()
            self._fire_event(Events.COMPLETED)
            time_taken += time.time() - handlers_start_time
            # update time wrt handlers
            self.state.times[Events.COMPLETED.name] = time_taken
            hours, mins, secs = _to_hours_mins_secs(time_taken)
            self.logger.info("Engine run complete. Time taken: %02d:%02d:%02d" % (hours, mins, secs))

        except BaseException as e:
            self._dataloader_iter = None
            self.logger.error("Engine run is terminating due to exception: %s.", str(e))
            self._handle_exception(e)

        self._dataloader_iter = None
        return self.state


if __name__ == '__main__':
    iterable_engine = IterableEngine(lambda x, y: 0.)
    iterable_engine.add_event_handler(Events.STARTED, lambda x: print("started"))
    iterable_engine.add_event_handler(Events.EPOCH_STARTED, lambda x: print("epoch started"))
    iterable_engine.add_event_handler(Events.EPOCH_COMPLETED, lambda x: print("epoch completed"))
    iterable_engine.add_event_handler(Events.COMPLETED, lambda x: print("completed"))

    epoch_iterator = iterable_engine.run([1], max_epochs=3)
    for state in epoch_iterator:
        print("This is outside engine.run")
vfdev-5 commented Oct 6, 2020

@alxlampe interesting idea, thanks !

If I understand correctly, what you would like to achieve can be done with event filtering:

from ignite.engine import Engine, Events

engine = Engine(lambda e, b: None)

def once_at_start(engine, _):
    return engine.state.epoch == 0

def once_at_end(engine, _):
    return engine.state.epoch == 10

engine.add_event_handler(Events.STARTED(once_at_start), lambda x: print("started"))
engine.add_event_handler(Events.EPOCH_STARTED, lambda x: print("{} epoch started".format(x.state.epoch)))
engine.add_event_handler(Events.EPOCH_COMPLETED, lambda x: print("{} epoch completed".format(x.state.epoch)))
engine.add_event_handler(Events.COMPLETED(once_at_end), lambda x: print("completed"))

engine.run([0, 1, 2], max_epochs=3)
print("Do something else")
engine.run([0, 1, 2], max_epochs=6)
print("Do something else")
engine.run([0, 1, 2], max_epochs=10)

gives

started
1 epoch started
1 epoch completed
2 epoch started
2 epoch completed
3 epoch started
3 epoch completed
Do something else
4 epoch started
4 epoch completed
5 epoch started
5 epoch completed
6 epoch started
6 epoch completed
Do something else
7 epoch started
7 epoch completed
8 epoch started
8 epoch completed
9 epoch started
9 epoch completed
10 epoch started
10 epoch completed
completed

Can it be generalized to your use-case, or is it too ugly and specific? What do you think?

Since my child engine must set up and tear down things, I could attach event handlers to the main engine, but the handlers I want to attach do not know that a main engine exists. The handlers shouldn't have any access to the main engine.

Maybe this requirement is not satisfied by the above code.

alxlampe commented Oct 7, 2020

@vfdev-5 Thanks for your reply.
That's true. You can implement my example from above with an event filter. One issue is that you may not know in advance what the last epoch is in your once_at_end function, but one can replace this with

def once_at_end(engine, _):
    return engine.state.epoch == engine.state.max_epochs

What I mean is that the event handlers that (should) occur exactly once during a run (like STARTED and COMPLETED) now fire multiple times, since I am running the engine multiple times. For me, this breaks the intention of the one-time events.

I have a more detailed use case where this won't work (I think). In this example, I have nested engines:

  • parent_engine that has a child engine child_engine that runs on EPOCH_COMPLETED of parent_engine
  • child_engine that has a child engine child_child_engine that runs on EPOCH_COMPLETED of child_engine

So I have this nesting of engines:

parent_engine
    child_engine
        child_child_engine

While max_epochs will be defined for parent_engine (which is the main engine), both child engines should run an infinite number of epochs, but only one epoch per call.

Then, if parent_engine reaches max_epochs, all engines should run their teardown methods. Or in case of an exception, each engine should fire EXCEPTION_RAISED.
At the moment, this is not possible without attaching the teardown methods of the child engines to parent_engine, because COMPLETED is executed each time I run an epoch.

Below, you find the complete example of the use case and how I solved it. I added event handlers to all three engines to print messages; the indentation of each message corresponds to the nesting depth.
It's a lot of code, but I don't know a better way to share it than adding it to my comments. You can copy the code into a file and run it.

The result of the code example below is the following output:

EVENT: STARTED
	EVENT: STARTED
		 EVENT: STARTED
		 EVENT: EPOCH_COMPLETED
	EVENT: EPOCH_COMPLETED
EVENT: EPOCH_COMPLETED
		 EVENT: EPOCH_COMPLETED
	EVENT: EPOCH_COMPLETED
EVENT: EPOCH_COMPLETED
		 EVENT: COMPLETED
	EVENT: COMPLETED
EVENT: COMPLETED

Some notes:

  • IterableEngine replaces run and _internal_run to allow iterator generation with engine.run(return_generator=True).
  • ChildEngine adds methods to attach itself to a parent engine and to set up event handlers that run one epoch when parent_engine fires EPOCH_COMPLETED.
import time
import warnings
from typing import Callable
from typing import Iterable
from typing import Optional

from ignite._utils import _to_hours_mins_secs
from ignite.engine import Engine
from ignite.engine import Events
from ignite.engine import State


class IterableEngine(Engine):
    def _internal_run(self, return_generator) -> State:
        self.should_terminate = self.should_terminate_single_epoch = False
        self._init_timers(self.state)
        try:
            start_time = time.time()
            self._fire_event(Events.STARTED)
            while self.state.epoch < self.state.max_epochs and not self.should_terminate:
                self.state.epoch += 1
                self._fire_event(Events.EPOCH_STARTED)

                if self._dataloader_iter is None:
                    self._setup_engine()

                time_taken = self._run_once_on_dataset()
                # time is available for handlers but must be updated after fire
                self.state.times[Events.EPOCH_COMPLETED.name] = time_taken
                handlers_start_time = time.time()
                if self.should_terminate:
                    self._fire_event(Events.TERMINATE)
                else:
                    self._fire_event(Events.EPOCH_COMPLETED)
                time_taken += time.time() - handlers_start_time
                # update time wrt handlers
                self.state.times[Events.EPOCH_COMPLETED.name] = time_taken
                hours, mins, secs = _to_hours_mins_secs(time_taken)
                self.logger.info(
                    "Epoch[%s] Complete. Time taken: %02d:%02d:%02d" % (self.state.epoch, hours, mins, secs)
                )
                if self.should_terminate:
                    break
                if return_generator:
                    # the added lines: yield control back to the caller after each epoch
                    yield self.state

            time_taken = time.time() - start_time
            # time is available for handlers but must be updated after fire
            self.state.times[Events.COMPLETED.name] = time_taken
            handlers_start_time = time.time()
            self._fire_event(Events.COMPLETED)
            time_taken += time.time() - handlers_start_time
            # update time wrt handlers
            self.state.times[Events.COMPLETED.name] = time_taken
            hours, mins, secs = _to_hours_mins_secs(time_taken)
            self.logger.info("Engine run complete. Time taken: %02d:%02d:%02d" % (hours, mins, secs))

        except BaseException as e:
            self._dataloader_iter = None
            self.logger.error("Engine run is terminating due to exception: %s.", str(e))
            self._handle_exception(e)

        self._dataloader_iter = None
        return self.state

    def run(
        self,
        data: Iterable,
        max_epochs: Optional[int] = None,
        epoch_length: Optional[int] = None,
        seed: Optional[int] = None,
        return_generator: Optional[bool] = False
    ) -> State:
        """Runs the `process_function` over the passed data.

        Engine has a state and the following logic is applied in this function:

        - At the first call, new state is defined by `max_epochs`, `epoch_length`, `seed` if provided. A timer for
            total and per-epoch time is initialized when Events.STARTED is handled.
        - If state is already defined such that there are iterations to run until `max_epochs` and no input arguments
            provided, state is kept and used in the function.
        - If state is defined and engine is "done" (no iterations to run until `max_epochs`), a new state is defined.
        - If state is defined, engine is NOT "done", then input arguments if provided override defined state.

        Args:
            data (Iterable): Collection of batches allowing repeated iteration (e.g., list or `DataLoader`).
            max_epochs (int, optional): Max epochs to run for (default: None).
                If a new state should be created (first run or run again from ended engine), its default value is 1.
                If run is resuming from a state, provided `max_epochs` will be taken into account and should be larger
                than `engine.state.max_epochs`.
            epoch_length (int, optional): Number of iterations to count as one epoch. By default, it can be set as
                `len(data)`. If `data` is an iterator and `epoch_length` is not set, then it will be automatically
                determined as the iteration on which data iterator raises `StopIteration`.
                This argument should not change if run is resuming from a state.
            seed (int, optional): Deprecated argument. Please, use `torch.manual_seed` or
                :meth:`~ignite.utils.manual_seed`.

        Returns:
            State: output state.

        Note:
            User can dynamically preprocess input batch at :attr:`~ignite.engine.events.Events.ITERATION_STARTED` and
            store output batch in `engine.state.batch`. The latter is passed as usual to `process_function` as an argument:

            .. code-block:: python

                trainer = ...

                @trainer.on(Events.ITERATION_STARTED)
                def switch_batch(engine):
                    engine.state.batch = preprocess_batch(engine.state.batch)

        """
        if seed is not None:
            warnings.warn(
                "Argument seed is deprecated. It will be removed in 0.5.0. "
                "Please, use torch.manual_seed or ignite.utils.manual_seed"
            )

        if not isinstance(data, Iterable):
            raise TypeError("Argument data should be iterable")

        if self.state.max_epochs is not None:
            # Check and apply overridden parameters
            if max_epochs is not None:
                if max_epochs < self.state.epoch:
                    raise ValueError(
                        "Argument max_epochs should be larger than the start epoch "
                        "defined in the state: {} vs {}".format(max_epochs, self.state.epoch)
                    )
                self.state.max_epochs = max_epochs
            if epoch_length is not None:
                if epoch_length != self.state.epoch_length:
                    raise ValueError(
                        "Argument epoch_length should be same as in the state, given {} vs {}".format(
                            epoch_length, self.state.epoch_length
                        )
                    )

        if self.state.max_epochs is None or self._is_done(self.state):
            # Create new state
            if max_epochs is None:
                max_epochs = 1
            if epoch_length is None:
                epoch_length = self._get_data_length(data)
                if epoch_length is not None and epoch_length < 1:
                    raise ValueError("Input data has zero size. Please provide non-empty data")

            self.state.iteration = 0
            self.state.epoch = 0
            self.state.max_epochs = max_epochs
            self.state.epoch_length = epoch_length
            self.logger.info("Engine run starting with max_epochs={}.".format(max_epochs))
        else:
            self.logger.info(
                "Engine run resuming from iteration {}, epoch {} until {} epochs".format(
                    self.state.iteration, self.state.epoch, self.state.max_epochs
                )
            )

        self.state.dataloader = data
        return self._internal_run(return_generator=return_generator)


class ChildEngine(IterableEngine):
    """Engine that is attached to a parent engine, that runs infinite number epochs and until the parent engine
    terminates or completed"""

    def __init__(self, process_function: Callable):
        super().__init__(process_function)
        self.epoch_iterator = None

    def _iterate_epoch(self, parent_engine):
        next(self.epoch_iterator)

    def _update_max_epochs(self):
        """Makes engine never reach max_epochs"""
        self.state.max_epochs = self.state.epoch + 1

    def _setup_epoch_iterator(self, data):
        self.add_event_handler(Events.EPOCH_STARTED, self._update_max_epochs)  # ensures child engine never completes
        self.epoch_iterator = self.run(data, max_epochs=1, return_generator=True)

    def attach_to_parent_engine(self, parent_engine, data):
        # setup iterator object on parent engines event started
        parent_engine.add_event_handler(Events.STARTED, self._setup_epoch_iterator, data)
        # runs one epoch of child engine on epoch completed
        parent_engine.add_event_handler(Events.EPOCH_COMPLETED, self._iterate_epoch)

        # forward the parent engine's termination-related events to the child engine

        parent_engine.add_event_handler(Events.COMPLETED, lambda engine: self.fire_event(Events.COMPLETED))
        parent_engine.add_event_handler(Events.TERMINATE, lambda engine: self.fire_event(Events.TERMINATE))
        parent_engine.add_event_handler(Events.EXCEPTION_RAISED,
                                        lambda engine, e: self._fire_event(Events.EXCEPTION_RAISED, e))


if __name__ == '__main__':
    parent_engine = Engine(lambda e, b: 0.)
    child_engine = ChildEngine(lambda e, b: 0.)
    child_child_engine = ChildEngine(lambda e, b: 0.)

    dummy_data = [1, 2, 3]

    child_engine.attach_to_parent_engine(parent_engine, dummy_data)
    child_child_engine.attach_to_parent_engine(child_engine, dummy_data)

    # add event handlers for demonstration
    parent_engine.add_event_handler(Events.STARTED, lambda x: print("EVENT: STARTED"))
    parent_engine.add_event_handler(Events.EPOCH_COMPLETED, lambda x: print("EVENT: EPOCH_COMPLETED"))
    parent_engine.add_event_handler(Events.COMPLETED, lambda x: print("EVENT: COMPLETED"))

    child_engine.add_event_handler(Events.STARTED, lambda x: print("\tEVENT: STARTED"))
    child_engine.add_event_handler(Events.EPOCH_COMPLETED, lambda x: print("\tEVENT: EPOCH_COMPLETED"))
    child_engine.add_event_handler(Events.COMPLETED, lambda x: print("\tEVENT: COMPLETED"))

    child_child_engine.add_event_handler(Events.STARTED, lambda x: print("\t\t EVENT: STARTED"))
    child_child_engine.add_event_handler(Events.EPOCH_COMPLETED, lambda x: print("\t\t EVENT: EPOCH_COMPLETED"))
    child_child_engine.add_event_handler(Events.COMPLETED, lambda x: print("\t\t EVENT: COMPLETED"))

    parent_engine.run(dummy_data, max_epochs=2)

vfdev-5 commented Oct 7, 2020

@alxlampe thanks for the details, I like what you would like to achieve and I think this can be interesting and important for other applications! I'm not 100% sure about the way to implement it, and I think IterableEngine has a limitation in being bound to epochs.

Let me think about what can be done with the API

Btw, your above example does not work and says Engine run is terminating due to exception: 'NoneType' object has no attribute 'max_epochs'. My bad, I had an old ignite version.

alxlampe commented Oct 16, 2020

@vfdev-5 I did some experiments with the example from my last post and there are some issues.
First, yield cannot be used inside an if statement:

if return_generator:
    yield self.state

This does not work as expected. Even if return_generator == False, the function returns an iterator, because any function whose body contains a yield statement is compiled as a generator function.
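A minimal demonstration of this Python behaviour (the function name is just for illustration):

def maybe_yield(return_generator=False):
    if return_generator:
        yield 1
    return 2

print(maybe_yield(False))        # prints a generator object, not 2
print(list(maybe_yield(False)))  # [] -- the yield branch is never taken and return ends the iteration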

Second, if I have two doubly nested engines (two instances of child_child_engine in the example from above), the STARTED event is not in sync with the parent engines. This means that the first child_child_engine runs an epoch before the second child_child_engine fires STARTED.

EVENT: STARTED
	EVENT: STARTED
	EVENT: EPOCH_STARTED
		 EVENT: STARTED
		 EVENT: EPOCH_STARTED <- not intended*
		 EVENT: EPOCH_COMPLETED
		 EVENT: STARTED
		 EVENT: EPOCH_STARTED
		 EVENT: EPOCH_COMPLETED
	EVENT: EPOCH_COMPLETED
EVENT: EPOCH_COMPLETED
	EVENT: EPOCH_STARTED
		 EVENT: EPOCH_STARTED
		 EVENT: EPOCH_COMPLETED
		 EVENT: EPOCH_STARTED
		 EVENT: EPOCH_COMPLETED
	EVENT: EPOCH_COMPLETED
EVENT: EPOCH_COMPLETED
		 EVENT: COMPLETED
		 EVENT: COMPLETED
	EVENT: COMPLETED
EVENT: COMPLETED

*the STARTED event of the 2nd child_child_engine should be fired before the 1st child_child_engine runs an epoch

I've created a gist here with an engine that implements the following methods:

  • setup_run does the setup and fires Events.STARTED.
  • run_epoch runs exactly one epoch and fires everything between Events.EPOCH_STARTED and Events.EPOCH_COMPLETED. If max_epochs is reached, it also runs finish_run.
  • finish_run runs the teardown part and fires Events.COMPLETED.

This engine is attachable or nestable via attach_to_engine. It adds event handlers to the parent engine's events, so that one-time events (STARTED, COMPLETED, ...) are fired in sync with the parent engine.
In this example, the run_epoch method is attached to the parent's EPOCH_COMPLETED. This makes it possible to run the nested engines without any changes to the parent engine (a rough sketch of the attachment follows after the output below).

The output is the following:

EVENT: STARTED (parent_engine)
	EVENT: STARTED (nested_engine)
		EVENT: STARTED (doubly_nested_engine1)
		EVENT: STARTED (doubly_nested_engine2)
EVENT: EPOCH_STARTED (parent_engine)
EVENT: EPOCH_COMPLETED (parent_engine)
	EVENT: EPOCH_STARTED (nested_engine)
	EVENT: EPOCH_COMPLETED (nested_engine)
		EVENT: EPOCH_STARTED (doubly_nested_engine1)
		EVENT: EPOCH_COMPLETED (doubly_nested_engine1)
		EVENT: EPOCH_STARTED (doubly_nested_engine2)
		EVENT: EPOCH_COMPLETED (doubly_nested_engine2)
EVENT: EPOCH_STARTED (parent_engine)
EVENT: EPOCH_COMPLETED (parent_engine)
	EVENT: EPOCH_STARTED (nested_engine)
	EVENT: EPOCH_COMPLETED (nested_engine)
		EVENT: EPOCH_STARTED (doubly_nested_engine1)
		EVENT: EPOCH_COMPLETED (doubly_nested_engine1)
		EVENT: EPOCH_STARTED (doubly_nested_engine2)
		EVENT: EPOCH_COMPLETED (doubly_nested_engine2)
EVENT: COMPLETED (parent_engine)
	EVENT: COMPLETED (nested_engine)
		EVENT: COMPLETED (doubly_nested_engine1)
		EVENT: COMPLETED (doubly_nested_engine2)
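The attachment described above could look roughly like this (a hypothetical sketch with assumed names; the actual gist code may differ):

from ignite.engine import Events

def attach_to_engine(nested_engine, parent_engine, data):
    # assumes nested_engine exposes setup_run / run_epoch / finish_run as described above
    parent_engine.add_event_handler(Events.STARTED, lambda _: nested_engine.setup_run(data))
    parent_engine.add_event_handler(Events.EPOCH_COMPLETED, lambda _: nested_engine.run_epoch())
    parent_engine.add_event_handler(Events.COMPLETED, lambda _: nested_engine.finish_run())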

Another case would be if the parent engine only serves to drive the run of its child engines. Then one could create the parent engine with a process function that runs epochs of the child engines:

def _process_child_engines(engine, child_engines):
    for child_engine in child_engines:
        child_engine.run_epoch()

This could also solve the problem in #1384 (Step 2 in the discussion). The advantage would be that child_engine can also be used standalone, i.e. it can run training via child_engine.run(...) without any changes to metrics handlers, loggers, etc.
In the use case of #1384, the only thing that must be added then is some kind of MetricSummaryHandler that is attached to the parent engine and summarizes the metrics of all child engines.
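Such a handler might look roughly like this (a hypothetical sketch, not an existing ignite API; the class name, attachment point, and averaging are assumptions):

from statistics import mean

from ignite.engine import Events


class MetricSummaryHandler:
    def __init__(self, child_engines):
        self.child_engines = child_engines

    def attach(self, parent_engine):
        # summarize after every epoch of the parent engine
        parent_engine.add_event_handler(Events.EPOCH_COMPLETED, self)

    def __call__(self, parent_engine):
        # average each metric over all child engines that report it
        names = set().union(*(e.state.metrics.keys() for e in self.child_engines))
        parent_engine.state.metrics = {
            name: mean(e.state.metrics[name] for e in self.child_engines if name in e.state.metrics)
            for name in names
        }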

What could it look like:
I am not very familiar with k-fold cross-validation, but this seems to be a good example. The code below doesn't run as-is, but should give some intuition about the idea:

child_engines = []
for k in range(num_k):
    k_fold_data_loader = get_k_fold_data(k, num_k)
    # setup_engine is a function that sets up the training process for one engine with all its metrics, loggers etc.
    engine = setup_engine(data=k_fold_data_loader)  # function takes arguments like engine.run, stores k_fold_data_loader
    child_engines.append(engine)

serving_engine = Engine(_process_child_engines)  # process function from above
# attach children so that their one-time events are synchronized with the serving engine
for child_engine in child_engines:
    child_engine.attach_to_parent_engine(serving_engine)

# add metrics summarizer
metrics_summarizer = MetricsSummarizer(child_engines, ...)  # pass the children to the summarizer
metrics_summarizer.attach(serving_engine)  # adds an EPOCH_COMPLETED handler to summarize metrics after each epoch


serving_engine.run(data=child_engines, max_epochs=100)

Some more info about my use case:
I am using this in the field of RL, where I have multiple tasks (child engines). Each task engine runs the agent in a specific environment. E.g. in evaluation, I am running 20 different tasks (20 specific environment setups) and compute summarized metrics based on the results of 20 runs. Furthermore, I have added task groups because some tasks fall in the same category, e.g. 5 different tasks of group walking, 10 different tasks of running etc.

I hope that gives some insight and shows that it is a useful feature :-)
One can even think of extending this to enable the child engines to run in parallel, which is what I am planning in the future for my experiments.
If you're interested, we could have some more discussion on what I've already implemented (e.g. groups and group metrics handler).

Another use case that fits nicely into this framework is to run an experiment with multiple seeds at the same time and compute metrics summary (mean, min, max) on the fly.

vfdev-5 commented Oct 16, 2020

@alxlampe thanks for the update and the detailed info about your use-case! Yes, this could definitely be an awesome and useful feature!

This does not work as expected. Even if return_generator==False, the function returns an iterator.

Yes, I also noticed that while playing around. Currently, I am thinking of rewriting Engine as a generator and wrapping it so that there is no BC break. Having a generator would also be interesting in the case of Federated Learning (see #1382), where we need to stop at a number of iterations and not epochs...

If you're interested, we could have some more discussion on what I've already implemented (e.g. groups and group metrics handler).

Yes, this can be helpful too. Maybe we can open another feature request for that.

Let's first discuss the Engine and what can be done. I am thinking about two things now on how to recode Engine:

  • split Engine(Serializable) -> Engine(Serializable, EventsDriven), where EventsDriven is a class responsible for event registration, triggering, etc. Thus, Engine would only contain the logic to register the necessary events and to run the two loops (master...vfdev-5:eventdriven)
  • the second point is to change Engine to be a generator, such that we can pause the Engine on any iteration we would like and resume after that.

I wonder if the second change could cover, in an acceptable way, your need to split the loop on epochs.
However, I agree that we can probably rename the current private methods and expose them as public methods with a logic similar to what you proposed: setup, run epoch, finish.
Then a NestableEngine, rewritten without duplicating the code, could easily go into contrib.engines.
What do you think?

alxlampe commented:
EventsDriven is exactly what I need to create custom engines 👍 I find that the event system is very powerful and easy to use. I also searched for Python event frameworks in general, but I didn't find one.

I think #1382 could be solved if Engine provided engine.run_iteration, analogous to engine.run_epoch. They could use

for _ in range(10):
    engine.run_iteration()

Obviously, this is not a clean solution. But what do you think of something more general, like engine.run_iterations and engine.run_epochs, that accept input arguments similar to engine.run:

engine.run(data, max_epochs, epoch_length, seed)
engine.run_epochs(data=None, num_epochs=1, epoch_length=None, seed=None)
engine.run_iterations(data=None, num_iterations=1, epoch_length=None, seed=None)

This would give the option to assign a new dataset and to control how many epochs/iterations should be run.
Then, #1382 could be solved with

# setup from the main program
engine.setup_run()
engine.run_iterations(num_iterations=10)
# do something in the main program 
engine.run_iterations(num_iterations=10)
# ...

And a nice feature would be to create generators from that.
Or do you already have an idea of how you would implement generators in Engine?

vfdev-5 commented Oct 16, 2020

@alxlampe there is a sort of major issue with EventsDriven and the way we coded filtered events inside Engine.
Our current implementation is coupled with State and Events such that, for filtering, we fetch the event counter from the State and perform the filtering. We can decouple them by introducing allowed-event counters inside EventsDriven as EventsDriven._allowed_events_counts. Using this, we can reintroduce event filtering inside EventsDriven. But then we have the problem that State.iteration and State.epoch become unlinked from EventsDriven._allowed_events_counts. Some syncing between them is necessary, but I can't figure out a simple and non-breaking solution for that...
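To make the idea concrete, here is a rough sketch of such an EventsDriven base class (names and structure are assumptions; the actual branch code may differ):

from typing import Any, Callable


class EventsDriven:
    def __init__(self):
        self._event_handlers = {}
        self._allowed_events = []
        # per-event counters kept inside EventsDriven itself; these would
        # need syncing with State.iteration / State.epoch
        self._allowed_events_counts = {}

    def register_events(self, *event_names: Any) -> None:
        for e in event_names:
            self._allowed_events.append(e)
            self._allowed_events_counts[e] = 0
            self._event_handlers.setdefault(e, [])

    def add_event_handler(self, event_name: Any, handler: Callable, *args: Any, **kwargs: Any) -> None:
        self._event_handlers[event_name].append((handler, args, kwargs))

    def _fire_event(self, event_name: Any, *event_args: Any) -> None:
        self._allowed_events_counts[event_name] += 1
        for handler, args, kwargs in self._event_handlers[event_name]:
            handler(*(event_args + args), **kwargs)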

But what do you think of something more general like engine.run_iterations and engine.run_epochs that accept input arguments similar to engine.run

Yes, this API is very interesting, and such decoupling could provide another level of flexibility, so that Engine could be inserted into any kind of loop 👍. Still, the API should be discussed and all side effects understood.

Or do you already have an idea how you implement generators in Engine?

Seems like mixing both behaviours is not that trivial. I'm not yet sure, but doing something like the following could maybe work out:

as_generator = True

def bar(i):
    if as_generator:
        yield None
    return None

def foo(n):
    print("start")
    for i in range(n):
        print("-", i)
        if as_generator:
            yield from bar(i)
        else:
            bar(i)
        print("-- ")
    print("final return")
    return i

The main point is that we could yield from any attached handler. Here too, we would have to weigh all the positive and negative impacts of that...

vfdev-5 added the module: engine label on Oct 20, 2020