Saving and then loading the DQN agent did not preserve four attributes needed to resume training:
This caused the agent's performance to differ between evaluating it in a single uninterrupted run and evaluating it after saving the agent, killing the program, restarting it, and loading the agent back from disk.
Fig 1 - Training without checkpoints (i.e., the same program run from start to finish)
Fig 2 - Training with checkpoints (i.e., the program killed every t steps and the agent reloaded from disk)
My proposed solution (working, but so far applied only to the DQN agent) is to add new save_snapshot and load_snapshot methods to the agent's class, without overwriting the original save and load methods, so the replay buffer does not have to be written on every regular save:
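For reference, below is a minimal, self-contained sketch of the pattern (not the actual patch): the toy agent class, the attribute names, and the file layout are placeholders chosen only to illustrate how save_snapshot/load_snapshot can persist the training state that the untouched save/load methods omit, with the replay buffer written only when a snapshot is taken.

```python
# Hypothetical sketch of the save_snapshot/load_snapshot idea, NOT the real patch.
# ToyDQNAgent and its attributes stand in for the actual agent class and the
# four attributes that the regular save()/load() do not cover.
import pickle
from pathlib import Path


class ToyDQNAgent:
    def __init__(self):
        self.q_weights = [0.0] * 4       # stands in for the network parameters
        self.replay_buffer = []          # stands in for the replay buffer
        self.num_timesteps = 0           # example of state lost by save()/load()
        self.exploration_rate = 1.0      # another example of lost state

    # --- original behaviour, left untouched: only the model is persisted ---
    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump({"q_weights": self.q_weights}, f)

    def load(self, path):
        with open(path, "rb") as f:
            self.q_weights = pickle.load(f)["q_weights"]

    # --- new methods: persist everything needed to resume training ---
    def save_snapshot(self, directory):
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        self.save(directory / "model.pkl")  # reuse the original save()
        extra = {
            "num_timesteps": self.num_timesteps,
            "exploration_rate": self.exploration_rate,
        }
        with open(directory / "extra_state.pkl", "wb") as f:
            pickle.dump(extra, f)
        # The replay buffer is only written here, never by the regular save().
        with open(directory / "replay_buffer.pkl", "wb") as f:
            pickle.dump(self.replay_buffer, f)

    def load_snapshot(self, directory):
        directory = Path(directory)
        self.load(directory / "model.pkl")
        with open(directory / "extra_state.pkl", "rb") as f:
            extra = pickle.load(f)
        self.num_timesteps = extra["num_timesteps"]
        self.exploration_rate = extra["exploration_rate"]
        with open(directory / "replay_buffer.pkl", "rb") as f:
            self.replay_buffer = pickle.load(f)


# Usage: snapshot before the process is killed, reload to resume training.
agent = ToyDQNAgent()
agent.save_snapshot("checkpoint")
restored = ToyDQNAgent()
restored.load_snapshot("checkpoint")
```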
This change works as intended; training resumes properly after the agent is reloaded from disk:
Fig 3 - Training with checkpoints, new patch (i.e., the program killed every t steps and the agent reloaded from disk)