error First step cannot be zero when running train.py #51

blockhunts · 2018-05-28T02:32:01Z

I tried to use the same images (cards) provided. I just deleted all the processed files (csv, dll) and followed all the steps.
When I ran python train.py,
I got this error:

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\MRCPP-Fablab\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "E:\tensor\models\research\object_detection\trainer.py", line 288, in train
    train_config.optimizer)
  File "E:\tensor\models\research\object_detection\builders\optimizer_builder.py", line 50, in build
    learning_rate = _create_learning_rate(config.learning_rate)
  File "E:\tensor\models\research\object_detection\builders\optimizer_builder.py", line 109, in _create_learning_rate
    learning_rate_sequence, config.warmup)
  File "E:\tensor\models\research\object_detection\utils\learning_schedules.py", line 156, in manual_stepping
    raise ValueError('First step cannot be zero.')
ValueError: First step cannot be zero.

Any clues why this happens?


Surasi-Jui · 2018-06-03T11:21:41Z

I have the same error. Did you find out how to solve it?

blockhunts · 2018-06-04T09:52:07Z

Yes, edit this in your config file in ...\models\research\object_detection\training:

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
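
For readers wondering what these values do: as far as I understand it, manual_step_learning_rate describes a piecewise-constant schedule. Training starts at initial_learning_rate and switches to each schedule entry's learning_rate once the global step reaches that entry's step. The sketch below is plain Python, not the Object Detection API code; manual_step_rate is a hypothetical helper, and its zero-step check only imitates the error message from the traceback.

```python
from typing import List


def manual_step_rate(global_step: int,
                     boundaries: List[int],
                     rates: List[float]) -> float:
    """Return the piecewise-constant rate in effect at global_step.

    Hypothetical helper: boundaries are the `step` values from the config,
    rates are initial_learning_rate followed by each schedule learning_rate.
    """
    if len(rates) != len(boundaries) + 1:
        raise ValueError("need exactly one more rate than boundaries")
    if boundaries and boundaries[0] == 0:
        # Imitates the message seen in the traceback; a first boundary of 0
        # would make initial_learning_rate unreachable.
        raise ValueError("First step cannot be zero.")
    if any(b2 <= b1 for b1, b2 in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    # The index is the number of boundaries already reached.
    index = sum(1 for b in boundaries if global_step >= b)
    return rates[index]


# Values taken from the config fragment above.
boundaries = [900000, 1200000]
rates = [0.0002, 0.00002, 0.000002]
for step in (0, 899999, 900000, 1200000):
    print(step, manual_step_rate(step, boundaries, rates))
```

With these values the rate stays at 0.0002 until step 900000, drops to 0.00002 until step 1200000, and is 0.000002 after that.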

Surasi-Jui · 2018-06-04T10:15:10Z

Thank you. It works here :)


leccyril · 2018-06-28T07:15:04Z

If you download the model from the GitHub repository, the files are up to date.

jim-meyer · 2018-06-30T20:15:48Z

I ran into this same error while using the AWS DL AMI (Deep Learning AMI (Ubuntu) Version 10.0 (ami-23c4fb46)) and following, as far as I can tell, the same steps I used on Windows with obvious substitutions since this AMI is Ubuntu. Both Ubuntu and Windows are using TF 1.8. But when I use the train_config that blockhunts mentioned I get:
Traceback (most recent call last):
File "/ml/models/research/object_detection/train.py", line 184, in <module>
tf.app.run()
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/ml/models/research/object_detection/train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/ml/models/research/object_detection/trainer.py", line 298, in train
train_config.optimizer)
File "/ml/models/research/object_detection/builders/optimizer_builder.py", line 50, in build
learning_rate = _create_learning_rate(config.learning_rate)
File "/ml/models/research/object_detection/builders/optimizer_builder.py", line 109, in _create_learning_rate
learning_rate_sequence, config.warmup)
File "/ml/models/research/object_detection/utils/learning_schedules.py", line 169, in manual_stepping
[0] * num_boundaries))
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2681, in where
return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6699, in select
"Select", condition=condition, t=x, e=y, name=name)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 528, in _apply_op_helper
(input_name, err))
ValueError: Tried to convert 't' to a tensor and failed. Error: Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted [].

Any ideas?

jim-meyer · 2018-06-30T20:29:24Z

I see that epratheeban has the solution to my problem mentioned here #11:

It's easy. Go to the utils folder and find the learning_schedules.py file. Go to line 167 and replace it with the lines below:

rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),
list(range(num_boundaries)),
[0] * num_boundaries))
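
The traceback above complains that range(0, 3) is not a dense tensor. In Python 3, range(...) returns a lazy range object rather than a list, which is presumably why wrapping it in list(...) helps. Below is a rough plain-Python imitation of what the replaced line computes. No TensorFlow is involved; rate_index here is a hypothetical stand-in, and the leading 0 in boundaries is an assumption made so that step 0 maps to index 0.

```python
def rate_index(global_step, boundaries):
    """Plain-Python imitation of reduce_max(where(step >= boundaries, idx, 0))."""
    num_boundaries = len(boundaries)
    indices = list(range(num_boundaries))  # materialised list, as in the fix
    zeros = [0] * num_boundaries
    # Element-wise "where": keep index i if the boundary was reached, else 0.
    selected = [i if global_step >= b else z
                for b, i, z in zip(boundaries, indices, zeros)]
    return max(selected)


# In Python 3 a range is its own type, not a list.
print(isinstance(range(3), list))  # False
print(list(range(3)))              # [0, 1, 2]

boundaries = [0, 900000, 1200000]  # assumed: 0 prepended to the config steps
print(rate_index(0, boundaries))        # 0
print(rate_index(900000, boundaries))   # 1
print(rate_index(2000000, boundaries))  # 2
```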

aghapesar1374 · 2018-07-10T20:41:18Z

Hi @jim-meyer
I made this change and that problem was solved, but now it returns this error:

WARNING:tensorflow:From C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\core\losses.py:317: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 288, in train
train_config.optimizer)
File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\builders\optimizer_builder.py", line 50, in build
learning_rate = _create_learning_rate(config.learning_rate)
File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\builders\optimizer_builder.py", line 109, in _create_learning_rate
learning_rate_sequence, config.warmup)
File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\learning_schedules.py", line 168, in manual_stepping
list(num_boundaries),
TypeError: 'int' object is not iterable
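
The failing line in this traceback reads list(num_boundaries), so it looks like the range(...) call was dropped when the line was edited. That is only a guess from the traceback, but the behaviour is easy to reproduce in plain Python:

```python
num_boundaries = 3

# list() needs an iterable; an int is not one.
try:
    list(num_boundaries)
    message = None
except TypeError as err:
    message = str(err)
print(message)  # 'int' object is not iterable

# Wrapping range(...) instead yields the intended list of indices.
indices = list(range(num_boundaries))
print(indices)  # [0, 1, 2]
```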

tamizharasank · 2018-07-20T11:12:01Z

TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)

leccyril · 2018-07-20T11:25:03Z

@tamizharasank Which file? For this kind of error, copy it into Google and you will find the fix easily.

Adibhatt95 · 2018-07-23T16:43:10Z

@tamizharasank did you solve this error? I got the same error, any suggestions?

Kkaranmore · 2019-02-09T05:27:54Z

After making changes to the config file in the training folder, I got this error:

(tensorflow1) C:\tensorflow1\models\research\object_detection>python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py:125: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\predictors\heads\box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\core\losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\ops\gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2236: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\util\deprecation.py", line 272, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\trainer.py", line 397, in train
include_global_step=False))
File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\variables_helper.py", line 126, in get_variables_available_in_checkpoint
ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 306, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern), status)
File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on C:/tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt: Not found: FindFirstFile failed for: C:/tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28 : The system cannot find the path specified.
; No such process

jim-meyer · 2019-02-10T14:04:05Z

Looks like you probably did not follow all of the steps in 2a, "Download TensorFlow Object Detection API repository from GitHub" and/or 2b, "Download the Faster-RCNN-Inception-V2-COCO model from TensorFlow's model zoo". Try following those steps again exactly and that should fix your problem.
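
The "Failed to get matching files" message above means nothing on disk matched the checkpoint prefix from the config. Here is a small sketch for checking that before launching training. describe_checkpoint is a hypothetical helper, and the path in the usage line is the one from the error message.

```python
import glob
import os


def describe_checkpoint(prefix):
    """Report whether any file matches a checkpoint prefix like .../model.ckpt."""
    directory = os.path.dirname(prefix)
    if not os.path.isdir(directory):
        return "missing directory: " + directory
    # Checkpoints are stored as several files sharing the prefix,
    # so match everything that starts with it.
    files = sorted(glob.glob(glob.escape(prefix) + "*"))
    if not files:
        return "no files match: " + prefix + "*"
    return "found: " + ", ".join(os.path.basename(f) for f in files)


# Path copied from the error message above.
print(describe_checkpoint(
    "C:/tensorflow1/models/research/object_detection/"
    "faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"))
```

If this reports a missing directory, the model archive was probably not extracted into the object_detection folder.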

mohamedelsiesyibra · 2019-03-15T09:34:59Z

File "C:\tensorflow1\models\research\object_detection\utils\learning_schedules.py", line 160, in manual_stepping
raise ValueError('First step cannot be zero.')
ValueError: First step cannot be zero.

I edit the file and save it, but when I train again it returns to its original value.

bebop-boop · 2019-06-01T07:00:11Z

I'm getting the error below when I try to run:
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\platform\app.py:125: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Tensorflow\models\research\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:From C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.

Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Tensorflow\models\research\object_detection\legacy\trainer.py", line 280, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "C:\Tensorflow\models\research\object_detection\legacy\trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "train.py", line 121, in get_next
dataset_builder.build(config)).get_next()
File "C:\Tensorflow\models\research\object_detection\builders\dataset_builder.py", line 124, in build
num_additional_channels=input_reader_config.num_additional_channels)
File "C:\Tensorflow\models\research\object_detection\data_decoders\tf_example_decoder.py", line 307, in __init__
default_value=''),
File "C:\Tensorflow\models\research\object_detection\data_decoders\tf_example_decoder.py", line 59, in __init__
label_map_proto_file, use_display_name=False)
File "C:\Tensorflow\models\research\object_detection\utils\label_map_util.py", line 164, in get_label_map_dict
label_map = load_labelmap(label_map_path)
File "C:\Tensorflow\models\research\object_detection\utils\label_map_util.py", line 133, in load_labelmap
label_map_string = fid.read()
File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 125, in read
self._preread_check()
File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 85, in _preread_check
compat.as_bytes(self.__name), 1024 * 512, status)
File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: NewRandomAccessFile failed to Create/Open: C:\Tensorflow\workspace raining_demonnotations/label_map.pbtxt : The filename, directory name, or volume label syntax is incorrect.
; Unknown error

jim-meyer · 2019-06-01T20:10:03Z

@ShubhranshuMaurya that error seems to indicate that there is something wrong with C:\Tensorflow\workspace raining_demonnotations/label_map.pbtxt. Have you opened that file in a text editor to see if it looks right? That file should look something like this:
item {
name: 'Class1'
id: 1
display_name: 'Class1 Label Name'
}

item {
name: 'Class2'
id: 2
display_name: 'Class2 Label Name'
}

IIRC this file could also be a binary protobuf file in which case viewing it in a text editor won't tell you much. But if it appears to be binary perhaps you could try creating a text version with your training labels and see if that works.
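
Another possible reading of the mangled path in that error: "workspace raining_demonnotations" is what you would get if a path like workspace\training_demo\annotations had its \t and \a treated as escape sequences (tab and bell) when the config was parsed. This is an inference from the error text, not something confirmed in the thread, and the typed value below is hypothetical. A plain-Python illustration:

```python
# Hypothetical value as it might have been typed into the .config file.
typed = r"C:\Tensorflow\workspace\training_demo\annotations/label_map.pbtxt"

# Imitate a parser that treats \t and \a as escape sequences.
mangled = typed.replace(r"\t", "\t").replace(r"\a", "\a")

# How that looks once the tab shows as a space and the bell is invisible.
displayed = mangled.replace("\t", " ").replace("\a", "")
print(displayed)

# Forward slashes avoid the problem entirely.
safe = typed.replace("\\", "/")
print(safe)
```

If that is what happened, writing the paths in the config with forward slashes (or doubled backslashes) should be worth a try.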

bharath5673 · 2019-07-06T14:59:36Z

#tensorflow custom training

ERROR:raise ValueError('First step cannot be zero.') ValueError: First step cannot be zero.

SOLUTION: object_detection\training\ .config

train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 900000
learning_rate: .00002
}
schedule {
step: 1200000
learning_rate: .000002
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}

Arri · 2019-08-30T04:14:33Z

For me it worked with 'step: 1'
for some reason there was 'step: 0'...
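
To check a pipeline config for this before training, a quick scan for schedule entries with step: 0 can help. This is a minimal sketch that assumes each step value sits on its own line, as in the configs posted in this thread; schedule_steps and has_zero_step are hypothetical helper names.

```python
import re

# Matches lines such as "    step: 900000" but not "num_steps: 200000".
STEP_PATTERN = re.compile(r"^\s*step:\s*(\d+)\s*$", re.MULTILINE)


def schedule_steps(config_text):
    """Return all schedule step values found in the config text."""
    return [int(value) for value in STEP_PATTERN.findall(config_text)]


def has_zero_step(config_text):
    """True if any schedule entry uses step: 0."""
    return 0 in schedule_steps(config_text)


sample = """
manual_step_learning_rate {
  initial_learning_rate: 0.0002
  schedule {
    step: 0
    learning_rate: .0002
  }
  schedule {
    step: 900000
    learning_rate: .00002
  }
}
num_steps: 200000
"""
print(schedule_steps(sample))  # [0, 900000]
print(has_zero_step(sample))   # True
```

To run it on a real file, read the config with open(path).read() and pass the text in.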

siddas27 · 2019-12-24T08:51:16Z

TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)

Did you find a solution?

dpbnasika · 2020-06-23T23:01:33Z

yes, edit this in your config file in ...\models\research\object_detection\training

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }

Can you explain what is happening with the learning rate? What do the two step values signify in the manual learning rate schedule, and what is the initial learning rate?

EMRYLMZ1 · 2022-04-11T01:04:30Z

python train.py --logtostderr -train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v2_quantized_300x300_coco.config

Current thread 0x00005734 (most recent call first):
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 84 in _preread_check
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 122 in read
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\label_map_util.py", line 168 in load_labelmap
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\label_map_util.py", line 201 in get_label_map_dict
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\data_decoders\tf_example_decoder.py", line 93 in __init__
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\data_decoders\tf_example_decoder.py", line 460 in __init__
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\decoder_builder.py", line 63 in build
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py", line 209 in build
File "train.py", line 123 in get_next
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\legacy\trainer.py", line 58 in create_input_queue
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\legacy\trainer.py", line 279 in train
File "train.py", line 182 in main
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324 in new_func
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\absl\app.py", line 258 in _run_main
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\absl\app.py", line 312 in run
File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\platform\app.py", line 40 in run
File "train.py", line 186 in <module>

help
