-
Notifications
You must be signed in to change notification settings - Fork 3.6k
update multistep_optimizer for tensorflow gpu #1773
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
@googlebot I signed it! |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
Hi @AgoloCuongHoang -- Thanks for the nice investigation, but Travis seems to report that the |
To make Travis trigger - I will close and reopen the pull request. @lukaszkaiser - Do you have thoughts on this? I think changing |
It seems like Travis did run and fail - https://travis-ci.org/tensorflow/tensor2tensor/jobs/627551248?utm_medium=notification&utm_source=github_status So there is no need to do the close and reopen dance. |
@afrozenator: Sorry for my late response. I have some important work to do recently. I just fixed the file and I think it passed the test (multistep_optimizer_test.py). It however does not pass some certain task that I indeed have no ideas why it is (I never touched on them I believe). |
Please see my latest commit which fixes the issue and passes the multistep_optimizer_test.py |
@afrozenator: Any update on this? To be clear I am OK if the pull request reject was rejected. I am just curious what was going on. Thx. |
@AgoloCuongHoang : could you make this new optimizer in a separate file and in a separate class? Just so we have the old one too for compatibility with old code? |
@lukaszkaiser: Done. Let me know if you need me to do any thing further |
Thanks a lot @AgoloCuongHoang merging this now! |
PiperOrigin-RevId: 316746422
I found by using the original class
MultistepAdamOptimizer
, I got a problem ofinvalidArgumentError: Cannot assign a device for operation
(see details below). This problem is only for tensorflow-gpu. On tensorflow it works fine. I did a lot of investigation on this (e.g. try adding normal soft placementtf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True
)) - the trick people recommended without success).After investigation I found two actions needed to fix this issue. 1. Convert int to float, and 2. The class inherits directly from
optimizer.Optimizer
, notAdamOptimizer
. With this I found it solved the issue. Note that without any of these twos it will not work.Let me know if you have any question regarding to this pull request. Meanwhile let me know if you need the code to replicate the issue of invalidArgumentError. Feel free if you think you could find a better solution.
Finally, I tagged @fstahlberg as well as he wrote the original
MultistepAdamOptimizer
class so that he will aware of this issue.Thank you!
A subpiece of the error: