Fixing engine terminate behaviour when resumed #2678

Merged: 1 commit, Aug 29, 2022

@vfdev-5 vfdev-5 commented Aug 29, 2022

Related to #2669

Description:

  • Quick fix for the engine's terminate behaviour when the run is resumed. Reproduction script:

import logging

from ignite.engine import Engine, Events
from ignite.utils import setup_logger


def func(engine, batch):
    # Process function: print the current epoch, iteration, and batch.
    print(engine.state.epoch, engine.state.iteration, " | ", batch)


max_epochs = 4
data = range(10)
engine = Engine(func)
engine.logger = setup_logger("engine", level=logging.DEBUG)


@engine.on(Events.ITERATION_COMPLETED(once=14))
def terminate():
    # Fires once, at iteration 14 (partway through epoch 2).
    print("-> terminate")
    engine.terminate()


# The first run stops at iteration 14; the second call resumes the same engine.
engine.run(data, max_epochs=max_epochs)
engine.run(data, max_epochs=max_epochs)
print(engine.state.epoch, engine.state.iteration)

Output:

2022-08-29 12:21:41,270 engine DEBUG: Added handler for event Events.ITERATION_COMPLETED(filter=<function CallableEventWithFilter.once_event_filter.<locals>.wrapper at 0x1119e39d0>)
2022-08-29 12:21:41,271 engine INFO: Engine run starting with max_epochs=4.
2022-08-29 12:21:41,272 engine DEBUG: 0 | 0, Firing handlers for event Events.STARTED
2022-08-29 12:21:41,272 engine DEBUG: 1 | 0, Firing handlers for event Events.EPOCH_STARTED
2022-08-29 12:21:41,273 engine DEBUG: 1 | 0, Firing handlers for event Events.GET_BATCH_STARTED
...
2022-08-29 12:21:41,336 engine DEBUG: 2 | 14, Firing handlers for event Events.ITERATION_STARTED
2022-08-29 12:21:41,339 engine DEBUG: 2 | 14, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-29 12:21:41,341 engine INFO: Terminate signaled. Engine will stop after current iteration is finished.
2022-08-29 12:21:41,342 engine DEBUG: 2 | 14, Firing handlers for event Events.TERMINATE
2022-08-29 12:21:41,344 engine DEBUG: 2 | 14, Firing handlers for event Events.COMPLETED
2022-08-29 12:21:41,347 engine INFO: Engine run complete. Time taken: 00:00:00.075
2022-08-29 12:21:41,348 engine INFO: Engine run resuming from iteration 14, epoch 2 until 4 epochs
2022-08-29 12:21:41,350 engine DEBUG: 2 | 14, Firing handlers for event Events.STARTED
2022-08-29 12:21:41,351 engine DEBUG: 3 | 14, Firing handlers for event Events.EPOCH_STARTED
2022-08-29 12:21:41,353 engine DEBUG: 3 | 14, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-29 12:21:41,354 engine DEBUG: 3 | 14, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-29 12:21:41,354 engine DEBUG: 3 | 15, Firing handlers for event Events.ITERATION_STARTED
2022-08-29 12:21:41,356 engine DEBUG: 3 | 15, Firing handlers for event Events.ITERATION_COMPLETED
...
2022-08-29 12:21:41,380 engine DEBUG: 3 | 22, Firing handlers for event Events.ITERATION_STARTED
2022-08-29 12:21:41,381 engine DEBUG: 3 | 22, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-29 12:21:41,382 engine DEBUG: 3 | 22, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-29 12:21:41,384 engine DEBUG: 3 | 22, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-29 12:21:41,384 engine DEBUG: 3 | 23, Firing handlers for event Events.ITERATION_STARTED
2022-08-29 12:21:41,386 engine DEBUG: 3 | 23, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-29 12:21:41,388 engine DEBUG: 3 | 23, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-29 12:21:41,389 engine DEBUG: 3 | 23, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-29 12:21:41,391 engine DEBUG: 3 | 24, Firing handlers for event Events.ITERATION_STARTED
2022-08-29 12:21:41,392 engine DEBUG: 3 | 24, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-29 12:21:41,394 engine DEBUG: 3 | 24, Firing handlers for event Events.EPOCH_COMPLETED
2022-08-29 12:21:41,395 engine INFO: Epoch[3] Complete. Time taken: 00:00:00.044
2022-08-29 12:21:41,396 engine DEBUG: 4 | 24, Firing handlers for event Events.EPOCH_STARTED
2022-08-29 12:21:41,397 engine DEBUG: 4 | 24, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-29 12:21:41,397 engine DEBUG: 4 | 24, Firing handlers for event Events.DATALOADER_STOP_ITERATION
2022-08-29 12:21:41,398 engine DEBUG: 4 | 24, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-29 12:21:41,400 engine DEBUG: 4 | 25, Firing handlers for event Events.ITERATION_STARTED
2022-08-29 12:21:41,401 engine DEBUG: 4 | 25, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-29 12:21:41,401 engine DEBUG: 4 | 25, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-29 12:21:41,402 engine DEBUG: 4 | 25, Firing handlers for event Events.GET_BATCH_COMPLETED
...
2022-08-29 12:21:41,421 engine DEBUG: 4 | 32, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-29 12:21:41,422 engine DEBUG: 4 | 32, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-29 12:21:41,423 engine DEBUG: 4 | 33, Firing handlers for event Events.ITERATION_STARTED
2022-08-29 12:21:41,424 engine DEBUG: 4 | 33, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-29 12:21:41,425 engine DEBUG: 4 | 33, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-29 12:21:41,426 engine DEBUG: 4 | 33, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-29 12:21:41,427 engine DEBUG: 4 | 34, Firing handlers for event Events.ITERATION_STARTED
2022-08-29 12:21:41,427 engine DEBUG: 4 | 34, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-29 12:21:41,429 engine DEBUG: 4 | 34, Firing handlers for event Events.EPOCH_COMPLETED
2022-08-29 12:21:41,430 engine INFO: Epoch[4] Complete. Time taken: 00:00:00.034
2022-08-29 12:21:41,430 engine DEBUG: 4 | 34, Firing handlers for event Events.COMPLETED
2022-08-29 12:21:41,431 engine INFO: Engine run complete. Time taken: 00:00:00.081

1 1  |  0
1 2  |  1
1 3  |  2
1 4  |  3
1 5  |  4
1 6  |  5
1 7  |  6
1 8  |  7
1 9  |  8
1 10  |  9
2 11  |  0
2 12  |  1
2 13  |  2
2 14  |  3
-> terminate
3 15  |  0
3 16  |  1
3 17  |  2
3 18  |  3
3 19  |  4
3 20  |  5
3 21  |  6
3 22  |  7
3 23  |  8
3 24  |  9
4 25  |  0
4 26  |  1
4 27  |  2
4 28  |  3
4 29  |  4
4 30  |  5
4 31  |  6
4 32  |  7
4 33  |  8
4 34  |  9
4 34
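
The counters in the output above can be checked with a little arithmetic. The sketch below is only a model of what the log shows (the resumed run starts the next epoch from its first batch and runs the remaining epochs in full); it is not ignite's code, and the function name `final_iteration_after_resume` is made up for illustration.

```python
def final_iteration_after_resume(terminated_at, epoch_length, max_epochs):
    """Iteration counter once the resumed run finishes, as seen in the log."""
    # 1-based epoch that was in progress when terminate() was called,
    # e.g. iteration 14 with 10 batches per epoch falls in epoch 2.
    epoch_at_termination = (terminated_at - 1) // epoch_length + 1
    # The resumed run starts the next epoch at batch 0 and completes
    # every remaining epoch, without replaying the interrupted one.
    remaining_epochs = max_epochs - epoch_at_termination
    return terminated_at + remaining_epochs * epoch_length


print(final_iteration_after_resume(terminated_at=14, epoch_length=10, max_epochs=4))
```

With the values from the script (terminate at iteration 14, 10 batches per epoch, 4 epochs) this gives 14 + 2 * 10 = 34, which matches the final "4 34" line. An uninterrupted run would end at iteration 40.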

@github-actions github-actions bot added the module: engine Engine module label Aug 29, 2022
@vfdev-5 vfdev-5 merged commit 26f7cec into pytorch:master Aug 29, 2022
@vfdev-5 vfdev-5 deleted the fix-engine-terminate branch August 29, 2022 12:08
Collaborator Author

vfdev-5 commented Aug 29, 2022

If the user terminates the run, saves the engine, reloads it, and resumes: since self.should_terminate is not persisted, the engine resumes with state.iteration % epoch_length not equal to zero.

Sadra's comment on Discord about this PR.
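
The save-and-reload case from the comment above can be illustrated with a toy model. Everything here is hypothetical: `ToyEngine` is a stand-in written for this sketch, not ignite's `Engine`, and it simply assumes that only the counters are saved while the terminate flag lives in memory.

```python
from dataclasses import dataclass


@dataclass
class ToyEngine:
    """Hypothetical stand-in for an engine; not ignite's Engine class."""

    epoch_length: int
    max_epochs: int
    iteration: int = 0
    should_terminate: bool = False  # runtime-only flag, never saved

    def state_dict(self):
        # Assumption for this sketch: only the counters are persisted.
        return {
            "epoch_length": self.epoch_length,
            "max_epochs": self.max_epochs,
            "iteration": self.iteration,
        }

    @classmethod
    def from_state_dict(cls, state):
        # A reloaded engine gets the default flag value (False).
        return cls(**state)

    def stopped_mid_epoch(self):
        # True when the counter does not sit on an epoch boundary.
        return self.iteration % self.epoch_length != 0


# Terminate at iteration 14, save, then reload into a fresh object.
original = ToyEngine(epoch_length=10, max_epochs=4, iteration=14, should_terminate=True)
reloaded = ToyEngine.from_state_dict(original.state_dict())

print(reloaded.should_terminate, reloaded.stopped_mid_epoch())
```

In this model the reloaded object has no record that a terminate happened, yet its iteration counter (14) is not a multiple of the epoch length (10), which is the situation the comment describes.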
