tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero #281

Open
ximik666 opened this issue Aug 9, 2019 · 8 comments

Comments

@ximik666

ximik666 commented Aug 9, 2019

Hello. I am trying to train the model from the hololens example, but the error below appears during training. I downloaded the hololens dataset and used this code:

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
trainer.setTrainConfig(object_names_array=["hololens"], batch_size=1, num_experiments=20, train_from_pretrained_model="pretrained-yolov3.h5") #download pre-trained model via https://github.com/OlafenwaMoses/ImageAI/releases/download/essential-v4/pretrained-yolov3.h5
trainer.trainModel()

I downloaded pretrained-yolov3.h5 and put it in the example directory. What could be the problem?

Using TensorFlow backend.
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.88
Anchor Boxes generated.
Detection configuration saved in hololens/json/detection_config.json
Training on: ['hololens']
Training with Batch Size: 1
Number of Experiments: 20
WARNING:tensorflow:From /usr/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/lib/python3.7/site-packages/imageai-2.1.3-py3.7.egg/imageai/Detection/Custom/yolo.py:24: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Training with transfer learning from pretrained Model
/usr/lib/python3.7/site-packages/keras/callbacks.py:1065: UserWarning: epsilon argument is deprecated and will be removed, use min_delta instead.
warnings.warn('epsilon argument is deprecated and '
WARNING:tensorflow:From /usr/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/20
Traceback (most recent call last):
File "custom_detection_train.py", line 7, in <module>
trainer.trainModel()
File "/usr/lib/python3.7/site-packages/imageai-2.1.3-py3.7.egg/imageai/Detection/Custom/__init__.py", line 286, in trainModel
max_queue_size=4
File "/usr/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/usr/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
class_weight=class_weight)
File "/usr/lib/python3.7/site-packages/keras/engine/training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "/usr/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/usr/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/usr/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/usr/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[{{node replica_0/model_1/yolo_layer_1/Reshape}}]]
[[{{node training/Adam/gradients/replica_0/model_1/bnorm_25/FusedBatchNorm_grad/FusedBatchNormGrad}}]]
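The thread never pins down the root cause, but the message itself describes a simple rule: a reshape with one unknown dimension has to divide the element count by the known sizes, and that fails when the tensor is empty and a known size is zero. Below is a minimal pure-Python sketch of that rule. `infer_missing_dim` is a hypothetical helper, not a TensorFlow function, and the 13x13x18 feature-map shape is an assumption chosen only for illustration.

```python
from math import prod


def infer_missing_dim(total_elements, target_shape):
    """Return target_shape with its single -1 entry filled in.

    Hypothetical helper that mimics the rule named in the error message;
    it is not part of TensorFlow.
    """
    if target_shape.count(-1) != 1:
        raise ValueError("exactly one dimension must be -1")
    known = prod(d for d in target_shape if d != -1)
    if known == 0:
        # 0 elements divided by 0 is undefined: any size would fit,
        # so the missing dimension cannot be inferred.
        raise ValueError(
            "cannot infer the missing size for an empty tensor "
            "unless all specified sizes are non-zero"
        )
    if total_elements % known != 0:
        raise ValueError("element count is not divisible by the known sizes")
    return [total_elements // known if d == -1 else d for d in target_shape]


# A batch of 2 feature maps (assumed 13x13 grid, 18 channels) reshapes fine.
print(infer_missing_dim(2 * 13 * 13 * 18, [2, 13, 13, 3, -1]))

# An empty batch (0 samples) cannot be reshaped the same way.
try:
    infer_missing_dim(0, [0, 13, 13, 3, -1])
except ValueError as err:
    print("error:", err)
```

Read this way, the error points at a Reshape op that received a tensor with zero samples in it, which is worth checking before looking at the model weights or the dataset annotations.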

@OlafenwaMoses
Owner

I will review this. In the meantime:

  • Why are you using a batch size of 1 and not 2, 4, etc.?
  • What version of TensorFlow do you have installed?

@ximik666
Author

I use tensorflow-gpu 1.13.2.
I have only 2 GB of GPU memory, and when I run the script with a batch size of 2 or more I get tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,44,44,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
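To put the OOM message in perspective, the size of the tensor that failed to allocate can be computed from the shape it reports. The sketch below parses the message and multiplies the dimensions out. `oom_tensor_bytes` is a hypothetical helper, and it assumes that "type float" means 4-byte float32, which is TensorFlow's usual meaning.

```python
import re
from math import prod

# Assumption: "float" in the OOM message means 32-bit float (4 bytes).
BYTES_PER_ELEMENT = {"float": 4}


def oom_tensor_bytes(message):
    """Return the size in bytes of the tensor named in an OOM message."""
    match = re.search(r"shape\[([\d,]+)\] and type (\w+)", message)
    if match is None:
        raise ValueError("no tensor shape found in message")
    shape = [int(d) for d in match.group(1).split(",")]
    return prod(shape) * BYTES_PER_ELEMENT[match.group(2)]


msg = "OOM when allocating tensor with shape[1,44,44,256] and type float on ..."
size = oom_tensor_bytes(msg)
print(f"{size} bytes = {size / 2**20:.2f} MiB")  # 1982464 bytes = 1.89 MiB
```

The tensor in the message is under 2 MiB, so it is only the allocation that happened to fail, not the bulk of the usage. The GPU was already almost full by then, which fits a 2 GB card running YOLOv3 training.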

@ximik666
Author

If I try this example, everything works:
from imageai.Prediction.Custom import ModelTraining

model_trainer = ModelTraining()
model_trainer.setModelTypeAsResNet()
model_trainer.setDataDirectory("idenprof")
model_trainer.trainModel(num_objects=10, num_experiments=200, enhance_data=True, batch_size=2, show_network_summary=True)

@OlafenwaMoses
Owner

I would advise using Google Colab for this training, as it offers 15 GB of GPU memory. Training an object detection model is very compute-intensive, and a batch size of 1 is not viable.
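One possible reason a batch size of 1 fails, offered as an assumption rather than something confirmed in this thread: the node names in the logs (`replica_0`, `replica_1`) suggest that each batch is divided among model replicas. If the split gives every replica `batch_size // n_replicas` samples and hands the remainder to the last one, a batch of 1 leaves the first replica with nothing. The sketch below shows only that arithmetic; `replica_batch_sizes` is a hypothetical helper, not ImageAI or Keras code.

```python
def replica_batch_sizes(batch_size, n_replicas):
    """Sizes of the per-replica slices under an even split.

    Assumed rule: every replica gets batch_size // n_replicas samples,
    and the last replica also takes whatever is left over.
    """
    step = batch_size // n_replicas
    sizes = [step] * (n_replicas - 1)
    sizes.append(batch_size - step * (n_replicas - 1))
    return sizes


for batch_size in (1, 2, 4):
    print(batch_size, replica_batch_sizes(batch_size, 2))
# 1 [0, 1]
# 2 [1, 1]
# 4 [2, 2]
```

Under that assumption, a batch of 1 sends an empty slice to the first replica, which would be consistent with the Reshape failure reported at `replica_0/model_1/yolo_layer_1/Reshape` in the first traceback.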

@ximik666
Author

ximik666 commented Aug 11, 2019

OK, now I am using Google Colab with the hololens dataset and tensorflow-gpu 1.13, and I get this error:

from imageai.Detection.Custom import DetectionModelTrainer
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
trainer.setTrainConfig(object_names_array=["hololens"], batch_size=2, num_experiments=200, train_from_pretrained_model="pretrained-yolov3.h5")
trainer.trainModel()

Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.78
Anchor Boxes generated.
Detection configuration saved in hololens/json/detection_config.json
Training on: ['hololens']
Training with Batch Size: 2
Number of Experiments: 200
Training with transfer learning from pretrained Model

/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:1065: UserWarning: epsilon argument is deprecated and will be removed, use min_delta instead.
warnings.warn('epsilon argument is deprecated and '

Epoch 1/200


ResourceExhaustedError Traceback (most recent call last)

in ()
5 trainer.setDataDirectory(data_directory="hololens")
6 trainer.setTrainConfig(object_names_array=["hololens"], batch_size=2, num_experiments=200, train_from_pretrained_model="pretrained-yolov3.h5")
----> 7 trainer.trainModel()

8 frames

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
526 None, None,
527 compat.as_text(c_api.TF_Message(self.status.status)),
--> 528 c_api.TF_GetCode(self.status.status))
529 # Delete the underlying status object from memory otherwise it stays alive
530 # as there is a reference to status from this from the traceback due to

ResourceExhaustedError: OOM when allocating tensor with shape[1,416,416,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training_2/Adam/gradients/zeros_573-0-1-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[{{node training_2/Adam/gradients/replica_1_2/model_7/bnorm_38/cond/FusedBatchNorm_grad/FusedBatchNormGrad}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[Screenshot: Screenshot-20190811-122208]

What's wrong?

@Meulen92

Seems like you ran out of video memory to support a batch_size of 2 on this specific dataset. How much GPU memory do you have available?

Unfortunately, a batch_size of 1 will never work.
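To answer the question about available GPU memory, one option is to ask `nvidia-smi` directly, which also works in a Colab cell. The sketch below is a minimal example under a few assumptions: `nvidia-smi` is on the PATH, the helper names `parse_gpu_memory` and `gpu_memory` are made up for this example, and the sample numbers at the bottom are invented, not taken from anyone's machine in this thread.

```python
import subprocess

# Query total and used memory per GPU, in MiB, as bare CSV rows.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn 'total, used' rows (MiB) into (total, used, free) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, used = (int(value) for value in line.split(","))
        rows.append((total, used, total - used))
    return rows


def gpu_memory():
    """Run nvidia-smi and return one (total, used, free) tuple per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Invented sample output for a single 2 GB card with 310 MiB in use.
print(parse_gpu_memory("2048, 310\n"))  # [(2048, 310, 1738)]
```

Calling `gpu_memory()` just before `trainer.trainModel()` shows how much memory is free at the start, which helps tell apart a card that is simply too small from one that is already occupied by another process or a previous notebook run.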

@Yejing-Lai

Hello, I have encountered a similar error.
[image]
I changed the batch size to 2 and to 4, and I get the same error. Have you solved this problem?

@rashminair1986

I don't understand how to resolve this error, even after changing the batch size to 2.
Training runs with a batch size of 1, but the loss computation is wrong.

Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.87
Anchor Boxes generated.
Detection configuration saved in fishes1_2\json\detection_config.json
Training on: ['Dascyllus', 'Myripristis', 'Plectroglyphidodon']
Training with Batch Size: 2
Number of Experiments: 100
WARNING:tensorflow:From D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\imageai\Detection\Custom\yolo.py:24: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\imageai\Detection\Custom\yolo.py:149: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

Training with transfer learning from pretrained Model
D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\callbacks\callbacks.py:998: UserWarning: epsilon argument is deprecated and will be removed, use min_delta instead.
warnings.warn('epsilon argument is deprecated and '
WARNING:tensorflow:From D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py:431: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py:438: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\callbacks\tensorboard_v1.py:200: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING:tensorflow:From D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\callbacks\tensorboard_v1.py:203: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Epoch 1/100
1/2592 [..............................] - ETA: 25:52:34 - loss: 128.3459 - yolo_layer_1_loss: 19.1331 - yolo_layer_2_
2/2592 [..............................] - ETA: 13:25:01 - loss: 129.0509 - yolo_layer_1_loss: 19.6597 - yolo_layer_2_
3/2592 [..............................] - ETA: 9:12:23 - loss: 128.5705 - yolo_layer_1_loss: 19.6130 - yolo_layer_2_l
4/2592 [..............................] - ETA: 7:07:19 - loss: 127.7652 - yolo_layer_1_loss: 19.5261 - yolo_layer_2_loss: 37.1629 - yolo_layer_3_loss: 71.0762
Traceback (most recent call last):
File "fish_train.py", line 7, in <module>
trainer.trainModel()
File "D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\imageai\Detection\Custom\__init__.py", line 291, in trainModel
max_queue_size=8
File "D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1732, in fit_generator
initial_epoch=initial_epoch)
File "D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\engine\training_generator.py", line 220, in fit_generator
reset_metrics=False)
File "D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1514, in train_on_batch
outputs = self.train_function(ins)
File "D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\keras\backend.py", line 3292, in __call__
run_metadata=self.run_metadata)
File "D:\deeplearningvideos\anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training/Adam/gradients/replica_1/model_1/conv_80/convolution_grad/Conv2DBackpropInput}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[training/Adam/gradients/replica_0/model_1/bnorm_32/cond/FusedBatchNorm_grad/FusedBatchNormGrad/_5957]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1024,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training/Adam/gradients/replica_1/model_1/conv_80/convolution_grad/Conv2DBackpropInput}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

5 participants