error First step cannot be zero when running train.py #51

Open
blockhunts opened this issue May 28, 2018 · 20 comments

Comments

@blockhunts

I tried to use the same (card) images provided. I just deleted all the processed files (csv, dll) and followed all the steps.
When I ran python train.py,
I got this error:

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\MRCPP-Fablab\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "E:\tensor\models\research\object_detection\trainer.py", line 288, in train
    train_config.optimizer)
  File "E:\tensor\models\research\object_detection\builders\optimizer_builder.py", line 50, in build
    learning_rate = _create_learning_rate(config.learning_rate)
  File "E:\tensor\models\research\object_detection\builders\optimizer_builder.py", line 109, in _create_learning_rate
    learning_rate_sequence, config.warmup)
  File "E:\tensor\models\research\object_detection\utils\learning_schedules.py", line 156, in manual_stepping
    raise ValueError('First step cannot be zero.')
ValueError: First step cannot be zero.

Any clue why this happens?

@Surasi-Jui

I have the same error. Did you find out how to solve it?

@blockhunts
Author

Yes, edit this in your config file in ...\models\research\object_detection\training:

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }

@Surasi-Jui

Surasi-Jui commented Jun 4, 2018 via email

@leccyril

If you download the model from the GitHub repository, the files are up to date.

@jim-meyer

I ran into this same error while using the AWS DL AMI (Deep Learning AMI (Ubuntu) Version 10.0 (ami-23c4fb46)) and following, as far as I can tell, the same steps I used on Windows with obvious substitutions since this AMI is Ubuntu. Both Ubuntu and Windows are using TF 1.8. But when I use the train_config that blockhunts mentioned I get:
Traceback (most recent call last):
  File "/ml/models/research/object_detection/train.py", line 184, in <module>
    tf.app.run()
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/ml/models/research/object_detection/train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/ml/models/research/object_detection/trainer.py", line 298, in train
    train_config.optimizer)
  File "/ml/models/research/object_detection/builders/optimizer_builder.py", line 50, in build
    learning_rate = _create_learning_rate(config.learning_rate)
  File "/ml/models/research/object_detection/builders/optimizer_builder.py", line 109, in _create_learning_rate
    learning_rate_sequence, config.warmup)
  File "/ml/models/research/object_detection/utils/learning_schedules.py", line 169, in manual_stepping
    [0] * num_boundaries))
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2681, in where
    return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6699, in select
    "Select", condition=condition, t=x, e=y, name=name)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 528, in _apply_op_helper
    (input_name, err))
ValueError: Tried to convert 't' to a tensor and failed. Error: Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted [].

Any ideas?

@jim-meyer

I see that epratheeban has the solution to my problem mentioned here #11:

It's easy. Go to the utils folder and find the learning_schedules.py file. Go to line 167 and replace it with the code below:

rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),
                                    list(range(num_boundaries)),
                                    [0] * num_boundaries))
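
Why this helps, as a minimal sketch: under Python 3, `range(num_boundaries)` is a lazy range object rather than a list, which matches the `range(0, 3)` in the error above, and wrapping it in `list(...)` yields a plain list. The pure-Python function below only imitates the selection that the replaced line performs. `rate_index` and the sample boundaries (including the leading 0) are illustrative assumptions, not the library's code.

```python
def rate_index(global_step, boundaries):
    """Imitate reduce_max(where(global_step >= boundaries, indices, zeros))."""
    num_boundaries = len(boundaries)
    indices = list(range(num_boundaries))  # a real list, as in the fix
    zeros = [0] * num_boundaries
    # Keep index i where the step has reached boundary i, otherwise 0.
    picked = [i if global_step >= b else z
              for b, i, z in zip(boundaries, indices, zeros)]
    return max(picked)


# Hypothetical boundaries with a leading 0 (assumption for illustration).
boundaries = [0, 900000, 1200000]
print(rate_index(10, boundaries))
print(rate_index(900000, boundaries))
print(rate_index(2000000, boundaries))
```

The returned index grows by one each time training passes a boundary, so it can be used to pick the matching learning rate from a list.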

@aghapesar1374

Hi @jim-meyer
I made this change and that problem was solved, but now I get this error:

WARNING:tensorflow:From C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\core\losses.py:317: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 288, in train
    train_config.optimizer)
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\builders\optimizer_builder.py", line 50, in build
    learning_rate = _create_learning_rate(config.learning_rate)
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\builders\optimizer_builder.py", line 109, in _create_learning_rate
    learning_rate_sequence, config.warmup)
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\learning_schedules.py", line 168, in manual_stepping
    list(num_boundaries),
TypeError: 'int' object is not iterable
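
The last frame of this traceback shows `list(num_boundaries)`, while the suggested replacement reads `list(range(num_boundaries))`. A quick check in plain Python, with `num_boundaries = 3` as a made-up value, reproduces the same message when `range` is left out:

```python
num_boundaries = 3  # illustrative value only

try:
    list(num_boundaries)  # what the traceback shows
    message = None
except TypeError as err:
    message = str(err)

print(message)
print(list(range(num_boundaries)))  # what the suggested fix uses
```

If the edited line in learning_schedules.py is missing `range(...)`, restoring it should make this particular TypeError go away.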

@tamizharasank

TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)

@leccyril

@tamizharasank Which file? For this kind of error, paste it into Google and you will find the fix easily.

@Adibhatt95

@tamizharasank Did you solve this error? I got the same error. Any suggestions?

@Kkaranmore

After making changes to the config file in the training folder, I got this error:

(tensorflow1) C:\tensorflow1\models\research\object_detection>python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py:125: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\predictors\heads\box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\core\losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\ops\gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2236: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\util\deprecation.py", line 272, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\trainer.py", line 397, in train
    include_global_step=False))
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\variables_helper.py", line 126, in get_variables_available_in_checkpoint
    ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 306, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern), status)
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on C:/tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt: Not found: FindFirstFile failed for: C:/tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28 : The system cannot find the path specified.
; No such process

@jim-meyer

Looks like you probably did not follow all of the steps in 2a, "Download TensorFlow Object Detection API repository from GitHub" and/or 2b, "Download the Faster-RCNN-Inception-V2-COCO model from TensorFlow's model zoo". Try following those steps again exactly and that should fix your problem.
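
A small sketch for confirming the checkpoint is where the config expects it before launching training. It assumes the usual TensorFlow 1.x layout in which `model.ckpt` is a prefix for files such as `model.ckpt.index`. `find_checkpoint_files` is a hypothetical helper, not part of the Object Detection API, and the relative folder name is taken from the error message above.

```python
import glob
import os


def find_checkpoint_files(checkpoint_prefix):
    """Return the files on disk that share the given checkpoint prefix."""
    # glob.escape guards against special characters in the folder name.
    return sorted(glob.glob(glob.escape(checkpoint_prefix) + ".*"))


prefix = os.path.join("faster_rcnn_inception_v2_coco_2018_01_28", "model.ckpt")
files = find_checkpoint_files(prefix)
if not files:
    print("No checkpoint files found for prefix:", prefix)
else:
    print("Found:", files)
```

Run it from the object_detection folder. An empty result suggests the model folder was not extracted there.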

@mohamedelsiesyibra

  File "C:\tensorflow1\models\research\object_detection\utils\learning_schedules.py", line 160, in manual_stepping
    raise ValueError('First step cannot be zero.')
ValueError: First step cannot be zero.

I edit the file and save it, but when I train again it returns to its original value.

@bebop-boop

bebop-boop commented Jun 1, 2019

I'm getting the error below when trying to run:
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

WARNING:tensorflow:From C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\platform\app.py:125: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Tensorflow\models\research\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:From C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Tensorflow\models\research\object_detection\legacy\trainer.py", line 280, in train
    train_config.prefetch_queue_capacity, data_augmentation_options)
  File "C:\Tensorflow\models\research\object_detection\legacy\trainer.py", line 59, in create_input_queue
    tensor_dict = create_tensor_dict_fn()
  File "train.py", line 121, in get_next
    dataset_builder.build(config)).get_next()
  File "C:\Tensorflow\models\research\object_detection\builders\dataset_builder.py", line 124, in build
    num_additional_channels=input_reader_config.num_additional_channels)
  File "C:\Tensorflow\models\research\object_detection\data_decoders\tf_example_decoder.py", line 307, in __init__
    default_value=''),
  File "C:\Tensorflow\models\research\object_detection\data_decoders\tf_example_decoder.py", line 59, in __init__
    label_map_proto_file, use_display_name=False)
  File "C:\Tensorflow\models\research\object_detection\utils\label_map_util.py", line 164, in get_label_map_dict
    label_map = load_labelmap(label_map_path)
  File "C:\Tensorflow\models\research\object_detection\utils\label_map_util.py", line 133, in load_labelmap
    label_map_string = fid.read()
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 125, in read
    self._preread_check()
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 85, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512, status)
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: NewRandomAccessFile failed to Create/Open: C:\Tensorflow\workspace raining_demonnotations/label_map.pbtxt : The filename, directory name, or volume label syntax is incorrect.
; Unknown error

@jim-meyer

@ShubhranshuMaurya that error seems to indicate that there is something wrong with C:\Tensorflow\workspace raining_demonnotations/label_map.pbtxt. Have you opened that file in a text editor to see if it looks right? That file should look something like this:
item {
  name: 'Class1'
  id: 1
  display_name: 'Class1 Label Name'
}

item {
  name: 'Class2'
  id: 2
  display_name: 'Class2 Label Name'
}

IIRC this file could also be a binary protobuf file in which case viewing it in a text editor won't tell you much. But if it appears to be binary perhaps you could try creating a text version with your training labels and see if that works.
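
One more thing worth checking, offered as a guess rather than a diagnosis: the path in the error reads `workspace raining_demonnotations`, which is what `workspace\training_demo\annotations` would look like if `\t` and `\a` had been interpreted as tab and bell escape characters. The folder names are inferred from the garbled path, not confirmed in the thread. The sketch below simulates that mangling in plain Python.

```python
# Assumed intended path, reconstructed from the error message (not confirmed).
intended = r"C:\Tensorflow\workspace\training_demo\annotations/label_map.pbtxt"

# Simulate an escape-aware parser turning \t into a tab and \a into a bell.
mangled = intended.replace(r"\t", "\t").replace(r"\a", "\a")

# In a console the tab shows as whitespace and the bell is invisible.
displayed = mangled.replace("\t", " ").replace("\a", "")
print(displayed)

# Forward slashes avoid the escape problem altogether.
safe = intended.replace("\\", "/")
print(safe)
```

If that is what happened, writing the label map path in the config with forward slashes (as in the `C:/tensorflow1/...` path earlier in this thread) or with doubled backslashes may resolve it.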

@bharath5673

# TensorFlow custom training

ERROR: raise ValueError('First step cannot be zero.') ValueError: First step cannot be zero.

SOLUTION: edit the .config file in object_detection\training\

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }

@Arri

Arri commented Aug 30, 2019

For me it worked with 'step: 1'
for some reason there was 'step: 0'...
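
That fits the error message: the schedule code refuses a first step of 0. A rough pure-Python sketch of that kind of check follows. `check_schedule_steps` is a hypothetical stand-in, and only the error text is taken from the tracebacks above.

```python
def check_schedule_steps(steps):
    """Reject a schedule whose first step boundary is zero."""
    if steps and steps[0] == 0:
        raise ValueError('First step cannot be zero.')
    return steps


check_schedule_steps([1, 1200000])  # accepted, as with 'step: 1'

try:
    check_schedule_steps([0, 1200000])  # rejected, as with 'step: 0'
except ValueError as err:
    print(err)
```

So any positive first `step` in the config, whether 1 or 900000, should get past this check.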

@siddas27

TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)

Did you find a solution?

@dpbnasika

Yes, edit this in your config file in ...\models\research\object_detection\training:

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }

Can you explain what is happening with the learning rate here? What do the two step values signify in the manual step learning rate, and what is the initial learning rate?
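
As a rough answer in code: a manual step schedule is commonly read as piecewise constant. Training starts at `initial_learning_rate` and switches to each `schedule` entry's `learning_rate` once the global step reaches that entry's `step`. The sketch below is plain Python under that assumption, with no warmup. `learning_rate_at` is an illustrative helper, not the library function.

```python
import bisect

INITIAL_LEARNING_RATE = 0.0002
# (step, learning_rate) pairs copied from the config above.
SCHEDULE = [(900000, 0.00002), (1200000, 0.000002)]


def learning_rate_at(global_step):
    """Return the learning rate in effect at the given training step."""
    steps = [s for s, _ in SCHEDULE]
    rates = [INITIAL_LEARNING_RATE] + [r for _, r in SCHEDULE]
    # Count how many step boundaries have been reached so far.
    return rates[bisect.bisect_right(steps, global_step)]


for step in (0, 899999, 900000, 1200000):
    print(step, learning_rate_at(step))
```

With these values the rate stays at 0.0002 for the first 900000 steps, then drops by a factor of ten at each of the two boundaries.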

@EMRYLMZ1

python train.py --logtostderr -train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v2_quantized_300x300_coco.config

Current thread 0x00005734 (most recent call first):
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 84 in _preread_check
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 122 in read
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\label_map_util.py", line 168 in load_labelmap
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\label_map_util.py", line 201 in get_label_map_dict
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\data_decoders\tf_example_decoder.py", line 93 in __init__
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\data_decoders\tf_example_decoder.py", line 460 in __init__
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\decoder_builder.py", line 63 in build
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py", line 209 in build
File "train.py", line 123 in get_next
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\legacy\trainer.py", line 58 in create_input_queue
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\legacy\trainer.py", line 279 in train
File "train.py", line 182 in main
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324 in new_func
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\absl\app.py", line 258 in _run_main
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\absl\app.py", line 312 in run
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\platform\app.py", line 40 in run
File "train.py", line 186 in <module>

help
