This repository has been archived by the owner on Jan 7, 2025. It is now read-only.

I could not train tensorflow googlenet in DIGITS. #2223

Open
edwardcho opened this issue Apr 17, 2020 · 0 comments


Hello,

I tested a Caffe network and a TensorFlow network in DIGITS.
First, I created a dataset from CIFAR-10.
I confirmed that the dataset was generated.
Then I started training the Caffe GoogLeNet, and training started normally.
After the Caffe training finished, I started training the TensorFlow GoogLeNet.
This time it failed with the following error:

2020-04-17 11:29:13.487565: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-17 11:29:13.832079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:84:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2020-04-17 11:29:13.832131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-04-17 11:29:14.386651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 11:29:14.386740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2020-04-17 11:29:14.386757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2020-04-17 11:29:14.387158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 10166 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:84:00.0, compute capability: 7.5)
2020-04-17 11:29:17.181117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-04-17 11:29:17.181159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 11:29:17.181170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2020-04-17 11:29:17.181177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2020-04-17 11:29:17.181312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 10166 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:84:00.0, compute capability: 7.5)
2020-04-17 11:29:17.460085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-04-17 11:29:17.460158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 11:29:17.460175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2020-04-17 11:29:17.460188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2020-04-17 11:29:17.460385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10166 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:84:00.0, compute capability: 7.5)
Traceback (most recent call last):
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 743, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 566, in main
Validation(sess, val_model, 0)
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 378, in Validation
summary_str = sess.run(model.summary)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [64,10] and labels shape [16]
[[Node: val/model/loss/cross_entropy_single/cross_entropy_single = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](val/model/Relu_57, val/data/batcher/_7)]]
[[Node: val/model/loss/cross_entropy_batch/_9 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_484_val/model/loss/cross_entropy_batch", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op u'val/model/loss/cross_entropy_single/cross_entropy_single', defined at:
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 743, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/itsme/digits/digits/tools/tensorflow/main.py", line 507, in main
val_model.create_model(UserModel, stage_scope)  # noqa
File "/home/itsme/digits/digits/tools/tensorflow/model.py", line 167, in create_model
for loss in self.get_tower_losses(tower_model):
File "/home/itsme/digits/digits/tools/tensorflow/model.py", line 297, in get_tower_losses
if isinstance(tower.loss, list):
File "/home/itsme/digits/digits/tools/tensorflow/utils.py", line 37, in decorator
setattr(self, attribute, function(self))
File "<string>", line 105, in loss
File "/home/itsme/digits/digits/tools/tensorflow/utils.py", line 46, in classification_loss
ssoftmax = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred, labels=y, name='cross_entropy_single')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 2063, in sparse_softmax_cross_entropy_with_logits
precise_logits, labels, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 7519, in sparse_softmax_cross_entropy_with_logits
labels=labels, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [64,10] and labels shape [16]
[[Node: val/model/loss/cross_entropy_single/cross_entropy_single = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](val/model/Relu_57, val/data/batcher/_7)]]
[[Node: val/model/loss/cross_entropy_batch/_9 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_484_val/model/loss/cross_entropy_batch", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
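
The error above comes from the shape check in sparse softmax cross-entropy: logits of shape [N, C] need exactly N integer labels. Here the validation graph produced 64 rows of logits for only 16 labels. The following is a minimal pure-Python sketch of that constraint, not TensorFlow's or DIGITS's actual implementation; the function name and the all-zero test inputs are assumptions made for illustration. Note that 64 is exactly 4 x 16, which might mean that something in the network folded extra elements into the batch dimension, but the log alone does not confirm this.

```python
import math


def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy for integer class labels.

    logits: list of N rows, each a list of C floats.
    labels: list of N integer class indices.
    (Hypothetical helper for illustration only.)
    """
    # The first dimension (batch size) of logits and labels must agree.
    if len(logits) != len(labels):
        num_classes = len(logits[0]) if logits else 0
        raise ValueError(
            "logits and labels must have the same first dimension, "
            "got logits shape [%d,%d] and labels shape [%d]"
            % (len(logits), num_classes, len(labels))
        )
    losses = []
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp of the row.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        # Loss is -log(softmax(row)[label]).
        losses.append(log_sum_exp - row[label])
    return losses


# Matching batch sizes: 16 examples, 10 classes.
ok = sparse_softmax_cross_entropy([[0.0] * 10 for _ in range(16)], [0] * 16)

# Shapes from the traceback: 64 rows of logits but only 16 labels.
try:
    sparse_softmax_cross_entropy([[0.0] * 10 for _ in range(64)], [0] * 16)
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
```

When the two batch sizes match, the call returns one loss per example; with the shapes from the traceback it raises before computing anything. That is consistent with the failure appearing in the first `Validation(sess, val_model, 0)` call, the first point in the traceback where the loss op is run.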