Prevent tf.Session() from eating up all the GPU memory #3219

Merged
ervteng merged 3 commits into hotfix-0.13.1 from hotfix-tensorflowgpu on Jan 15, 2020

Conversation

@ervteng ervteng (Contributor) commented Jan 14, 2020

By default, a tf.Session will take up all of the GPU memory. In the Policy, we combat this by enabling allow_growth = True and soft placement (for multi-GPU).

However, we also create another session to write hyperparameters. In the past this session was created after the Policy and respected those configs, but it is now created before the Policy and needs its own config options.
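A minimal sketch of the kind of session config involved (TF 1.x API; the exact lines in this PR may differ slightly):

```python
import tensorflow as tf

# Sketch of a session config that stops a small, short-lived session from
# reserving all GPU memory (TF 1.x API).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
config.allow_soft_placement = True      # let TF pick another device when needed (multi-GPU)

with tf.Session(config=config) as sess:
    pass  # small write-only work, e.g. emitting a Tensorboard summary
```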

```diff
@@ -70,7 +70,11 @@ def write_tensorboard_text(self, key: str, input_dict: Dict[str, Any]) -> None:
         :param input_dict: A dictionary that will be displayed in a table on Tensorboard.
         """
         try:
-            with tf.Session() as sess:
+            # Prevent GPU memory from being eaten up by this small writing session
+            config = tf.ConfigProto()
```
Contributor
Should we move the ConfigProto setup to a utility function? I'm ok with not doing it now, but if we need it in a 3rd place, I think we should.

Contributor Author

We can put it in tf_utils; that seems like the natural place for it.

Contributor Author

I moved it to tf in tf_utils - lmk how it looks.
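For reference, a shared helper of that shape could look roughly like the following; the name generate_session_config and its exact location are assumptions here, not necessarily what landed in tf_utils:

```python
import tensorflow as tf

def generate_session_config() -> tf.ConfigProto:
    """Hypothetical tf_utils-style helper: a ConfigProto with memory growth
    and soft placement enabled."""
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.allow_soft_placement = True
    return config

# Both the Policy and the small Tensorboard-writing session could then share it:
with tf.Session(config=generate_session_config()) as sess:
    pass
```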

@chriselion chriselion (Contributor) left a comment

Looks good, do we need to consider this for a hotfix release?

@ervteng ervteng (Contributor, Author) commented Jan 14, 2020

> Looks good, do we need to consider this for a hotfix release?

I think so - this and the Tensorboard fix should probably be a hotfix for 0.13. What do you think?

@andrewcoh andrewcoh (Contributor) left a comment

Also looks good to me. Should I cherry pick this into the release?

@ervteng ervteng merged commit cd29e95 into hotfix-0.13.1 Jan 15, 2020
@delete-merged-branch delete-merged-branch bot deleted the hotfix-tensorflowgpu branch January 15, 2020 01:35
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 17, 2021