Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to use pretrained model #7

Open
justaduck opened this issue Oct 27, 2020 · 1 comment
Open

Unable to use pretrained model #7

justaduck opened this issue Oct 27, 2020 · 1 comment

Comments

@justaduck
Copy link

When attempting to do a dry run of the network using the pretrained model checkpoint from Google Drive I am met with the traceback below. I am using Tensorflow 1.4 and Python 3.4 on Ubuntu 18.04.

Command: python3.4 dry_run.py --model_weights='./ckpt/model.ckpt-pretrained' --vis=True --dataset='single_img' --save-dir='./output'

WARNING:tensorflow:From dry_run.py:300: calling argmax (from tensorflow.python.ops.math_ops) with dimension is deprecated and will be removed in a future version.
Instructions for updating:
Use the `axis` argument instead
2020-10-27 22:17:21.911705: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "discriminator/conv1_batchnorm1/pop_var" not found in checkpoint files ./ckpt/model.ckpt-pretrained
	 [[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dry_run.py", line 443, in <module>
    main()
  File "dry_run.py", line 325, in main
    load(loader, sess, args.model_weights)
  File "dry_run.py", line 89, in load
    saver.restore(sess, ckpt_path)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1666, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "discriminator/conv1_batchnorm1/pop_var" not found in checkpoint files ./ckpt/model.ckpt-pretrained
	 [[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

Caused by op 'save/RestoreV2_6', defined at:
  File "dry_run.py", line 443, in <module>
    main()
  File "dry_run.py", line 324, in main
    loader = tf.train.Saver(var_list=restore_var_2)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1218, in __init__
    self.build()
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1227, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1263, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 751, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 427, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 267, in restore_op
    [spec.tensor.dtype])[0])
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Tensor name "discriminator/conv1_batchnorm1/pop_var" not found in checkpoint files ./ckpt/model.ckpt-pretrained
	 [[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

It is worth noting that the tensor not being found is not always the same. I have also gotten this error for "discriminator/conv1_1/b", "discriminator/conv5_batchnorm1/scale", "discriminator/fc9_voc12/filter", and "discriminator/conv5_batchnorm2/pop_mean" to name a few.

@pengzhou1108
Copy link
Owner

When attempting to do a dry run of the network using the pretrained model checkpoint from Google Drive I am met with the traceback below. I am using Tensorflow 1.4 and Python 3.4 on Ubuntu 18.04.

Command: python3.4 dry_run.py --model_weights='./ckpt/model.ckpt-pretrained' --vis=True --dataset='single_img' --save-dir='./output'

WARNING:tensorflow:From dry_run.py:300: calling argmax (from tensorflow.python.ops.math_ops) with dimension is deprecated and will be removed in a future version.
Instructions for updating:
Use the `axis` argument instead
2020-10-27 22:17:21.911705: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "discriminator/conv1_batchnorm1/pop_var" not found in checkpoint files ./ckpt/model.ckpt-pretrained
	 [[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dry_run.py", line 443, in <module>
    main()
  File "dry_run.py", line 325, in main
    load(loader, sess, args.model_weights)
  File "dry_run.py", line 89, in load
    saver.restore(sess, ckpt_path)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1666, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "discriminator/conv1_batchnorm1/pop_var" not found in checkpoint files ./ckpt/model.ckpt-pretrained
	 [[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

Caused by op 'save/RestoreV2_6', defined at:
  File "dry_run.py", line 443, in <module>
    main()
  File "dry_run.py", line 324, in main
    loader = tf.train.Saver(var_list=restore_var_2)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1218, in __init__
    self.build()
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1227, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1263, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 751, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 427, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 267, in restore_op
    [spec.tensor.dtype])[0])
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Tensor name "discriminator/conv1_batchnorm1/pop_var" not found in checkpoint files ./ckpt/model.ckpt-pretrained
	 [[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

It is worth noting that the tensor not being found is not always the same. I have also gotten this error for "discriminator/conv1_1/b", "discriminator/conv5_batchnorm1/scale", "discriminator/fc9_voc12/filter", and "discriminator/conv5_batchnorm2/pop_mean" to name a few.

The model.ckpt-pretrained you are referring is only for deeplab checkpoint. To dry run the model, you might need to train GSR-Net and use it's checkpoint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants