Upgrade from r11 to r12 produces "Variables not defined" when using any optimizer but GradientDescentOptimizer #6220
Comment and uncomment the different optimizers to see the behavior. As explained, the only one that works is GradientDescentOptimizer. This behavior does not occur in version r11, and it also does not happen if the averaging is not performed. Any clue?
In particular, this commit causes the problem: 0fc86dd.
Please tell me if I can help.
You can revert the changes to slot_creator.py or fix the changes and send a PR. Thanks.
Sorry sherry -- the current behaviour is correct. Your code is leaking reuse -- it just wasn't checked before. It could cause all sorts of other trouble, and I think we should correct the leaky-reuse cases, not revert the slot change. I'll write more on the test cases; closing this.
Then, just to be clear: how do we get the desired results? Does the reuse need to be done in a different way? This code was taken directly from the cifar10 multi-GPU example. Thanks
To clarify, we just need to put a scope around the model-construction part.
Hope that helps!
Thanks lukaszkaiser. It works perfectly fine now!
@lukaszkaiser Hello, I found your workaround of putting a variable_scope around the outermost num_gpus loop, but I am still confused about why it eliminates the error:

```python
with tf.variable_scope(tf.get_variable_scope()) as vscope:
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                loss = tower_loss(scope)
                tf.get_variable_scope().reuse_variables()  # HERE
```

What do you mean by "leaky reuse"? Could you please clarify?
Sure, let me try to clarify. When you call reuse_variables(), the reuse flag is set on the current variable scope and stays set for everything created in that scope afterwards. If the tower loop runs directly in the top-level scope, the flag leaks out of the loop: when the optimizer later tries to create its slot variables (e.g. Adam's moments) in that scope, creation happens in reuse mode and fails because those variables don't exist yet. Wrapping the model-construction part in its own variable_scope confines the reuse flag to that scope, so the optimizer can still create new variables afterwards. Hope that helps, let me know if I should clarify more.
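To make the mechanism concrete, here is a minimal pure-Python analogy (not TensorFlow code; the `VarScope` class and all names are invented for illustration): a reuse flag flipped inside a shared scope leaks to later variable creation, while flipping it inside a reopened child scope does not.

```python
class VarScope:
    """Toy stand-in for tf.variable_scope: a variable store plus a reuse flag.

    Invented for illustration -- this is NOT the TensorFlow implementation.
    """
    def __init__(self, store=None):
        self.vars = {} if store is None else store  # shared when a scope is reopened
        self.reuse = False

    def reopen(self):
        # Analogy for `with tf.variable_scope(tf.get_variable_scope())`:
        # same underlying variables, but a fresh reuse flag.
        return VarScope(self.vars)

    def get_variable(self, name):
        if self.reuse:
            if name not in self.vars:
                raise ValueError("Variable %s does not exist" % name)
            return self.vars[name]
        if name in self.vars:
            raise ValueError("Variable %s already exists" % name)
        self.vars[name] = "var:" + name
        return self.vars[name]


def build_towers(scope, num_gpus=2):
    # Each tower builds the model, then flips reuse so later towers share weights.
    for _ in range(num_gpus):
        scope.get_variable("conv1/weights")
        scope.reuse = True  # like tf.get_variable_scope().reuse_variables()


# Leaky version: the towers flip the reuse flag on the root scope itself,
# so the optimizer's later slot creation is wrongly forced into reuse mode.
leaky_root = VarScope()
build_towers(leaky_root)
try:
    leaky_root.get_variable("conv1/weights/Adam")  # optimizer slot variable
    leaked = False
except ValueError:
    leaked = True

# Fixed version: the tower loop runs in a reopened scope, so the reuse flag
# leaks only into that scope and the root can still create new variables.
root = VarScope()
inner = root.reopen()
build_towers(inner)
slot = root.get_variable("conv1/weights/Adam")  # succeeds: root is not in reuse mode
```

The leaky version raises the same kind of "Variable ... does not exist" error the thread describes, while the reopened-scope version creates the slot variable cleanly.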
Ah, great. Your explanation is clear and helpful. Thanks! To sum up, the thing to remember is that the point where the (Adam-like) optimizer creates its slot variables must not be under a scope whose reuse flag has already been set.
Without the new variable_scope, creating apply_gradient_op raises an error that additional moving average or slot variables could not be created. This is because of the 'leaky reuse' of variable scope, so we correct the problem by explicitly introducing a new variable scope. Related issues: tensorflow#901, tensorflow/tensorflow#6220
@wagonhelm Hello, when you use Adam, how do you solve it? Please tell me the details, thank you
@lukaszkaiser Hi, I've run into this problem when using AdamOptimizer. I've tried your suggestion but it still doesn't work. Could you please help me fix the code?
@Huayra007 should be able to run it. You are calling your generator twice, so either you remove the snippet, or you add reuse to your generator code, as such:

and when you call it for TensorBoard, as such:

I hope this helps. Good luck with your GANs ;-)
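The pattern this comment describes can be sketched without TensorFlow (the registry, `get_variable`, and `generator` below are invented stand-ins for illustration, mimicking `tf.get_variable` semantics): building the same variables twice fails unless the second call opts into reuse.

```python
# Toy variable registry mimicking tf.get_variable: create once, reuse after.
_registry = {}

def get_variable(name, reuse=False):
    if reuse:
        if name not in _registry:
            raise ValueError("Variable %s does not exist" % name)
        return _registry[name]
    if name in _registry:
        raise ValueError(
            "Variable %s already exists, did you mean to set reuse=True?" % name)
    _registry[name] = "var:" + name
    return _registry[name]

def generator(z, reuse=False):
    # The first call creates the generator's weights;
    # every later call must pass reuse=True to share them.
    w = get_variable("generator/fc/w", reuse=reuse)
    return (z, w)

sample = generator("z_train")              # first call: creates the weights
try:
    generator("z_tensorboard")             # second call without reuse: fails
    second_call_failed = False
except ValueError:
    second_call_failed = True
vis = generator("z_tensorboard", reuse=True)  # second call with reuse: shares weights
```

This is the same "Variable ... already exists" / "Variable ... does not exist" pair of failure modes the thread keeps running into, just stripped down to the flag that controls them.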
I am a student and I am not very familiar with TensorFlow; I just followed @lukaszkaiser
I'm new to TF. I tried to use:
@lukaszkaiser Hello, I need your help :( In my code I get the same error: `ValueError: Variable G_fc/w does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?` This is the function:

And this is the G_fc function: `def fc(input_vector, num_output_length, name='fc'):`
I got the same error in a multi-GPU training script, even when I include all the model definition inside a
Just do as @wookayin suggested
@Traeyee Thanks, I fixed the problem after many attempts at changing tf.name_scope and tf.variable_scope.
Without the new variable_scope, creating apply_gradient_op raises an error that additional moving average or slot variables could not be created. This is because of the 'leaky reuse' of variable scope, so we correct the problem by explicitly introducing a new variable scope. Related issues: tensorflow/models#901, tensorflow/tensorflow#6220
Without the new variable_scope, creating apply_gradient_op raises an error that additional moving average or slot variables could not be created. This is because of the 'leaky reuse' of variable scope, so we correct the problem by explicitly introducing a new variable scope. Related issues: tensorflow/models#901, tensorflow/tensorflow#6220
After a recent upgrade to the latest version of TensorFlow on GitHub, several things stopped working. I found that all the optimizers, such as Adam or Adagrad, now produce an error related to variable scope that I have not managed to solve yet. However, GradientDescentOptimizer works fine.
It may be related to issue #5652.
The error looks like this:
It works fine with tensorflow r11
Operating System: Ubuntu 16 and Ubuntu 14
Installed version of CUDA and cuDNN: CUDA 8.0, cuDNN 5.1
cuda.txt
The commit hash 6dc8dea
Build time: Wed Nov 2 17:54:14 2016 (1478109254)
Build timestamp: 1478109254
Build timestamp as int: 1478109254
Find below a minimal version that causes the error: