Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate feature extractor networks for DQN networks #132

Merged
merged 2 commits into from
Jul 30, 2020

Conversation

Miffyli
Copy link
Collaborator

@Miffyli Miffyli commented Jul 30, 2020

Closes #131

Description

DQN main and target network accidentally shared feature extractor network. This fix creates new feature extractor each time new QNetwork is made.

Motivation and Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

@Miffyli Miffyli requested a review from araffin July 30, 2020 18:28
@araffin
Copy link
Member

araffin commented Jul 30, 2020

Nice catch, but it was working before?

I think we may have the issue with SAC/TD3 too then.

@Miffyli
Copy link
Collaborator Author

Miffyli commented Jul 30, 2020

Nice catch, but it was working before?

I do not see how it could have worked before, unless somewhere along the line the models got cloned or the references to data they use were separated (not sure how that could happen, either...).

As for SAC/TD3, I heard they share the feature extractor part but this is not updated as part of target-update, right? In DQN you update the whole network, and thus you have full copies of both.

Edit: No wait, we might have the same issue, as it seems like whole networks are being updated. Those will require more careful testing to see what is right for SAC/TD3.

Copy link
Member

@araffin araffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice catch =)

Do you have an easy way of testing that? Otherwise, I will merge as soon CI passes.

For SAC/TD3, we will see later, as I would like to share the feature extractor between actor and critic (otherwise it is super slow, and even when sharing, it is slow)

@m-rph
Copy link
Contributor

m-rph commented Jul 30, 2020 via email

@araffin
Copy link
Member

araffin commented Jul 30, 2020

If the extractor isn’t flatten, then using the “is” operator with the .parameters() function should catch it.

Ok, I think I will delay the test to SAC/TD3 fix, as this is a hotfix for DQN.

@araffin araffin merged commit 77cb3dd into master Jul 30, 2020
@araffin araffin deleted the fix/dqn-feature-extractor branch July 30, 2020 18:48
@araffin araffin mentioned this pull request Jul 30, 2020
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Optimized Polyak update not equivalent for CnnPolicy
3 participants