Does steps_per_epoch need to be changed for different training sets? #90

AnMoran · 2020-09-07T07:15:27Z

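I trained on ArT, LSVT, and ReCTS for 180,000 steps, and the loss has stopped dropping at around 1.2. Test accuracy is quite a bit worse than with the two pb models you provide: yours reach about 85%, mine only about 72%. Could you share the checkpoint that corresponds to your pb so I can fine-tune from it? Or are there any other training tricks?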

alexchungio · 2020-09-17T01:36:27Z

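The three datasets used in the source code contain 396,733 samples in total. The config sets steps_per_epoch=500, gpus=4, batch_size=10, so the number of samples trained per epoch is 500 * 4 * 10 = 20,000. At that rate a single epoch cannot cover the whole dataset. I am confused about this as well.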
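The arithmetic above can be checked with a short sketch. It uses only the figures quoted in this comment and assumes that batch_size is the per-GPU batch size, which the thread does not confirm. The variable names are illustrative and are not taken from the repository's config.

```python
import math

# Figures quoted in the comment above.
total_samples = 396_733
steps_per_epoch = 500
num_gpus = 4
batch_size = 10  # assumed to be per GPU

# Samples consumed by one training step across all GPUs.
samples_per_step = num_gpus * batch_size

# Samples seen in one configured "epoch".
samples_per_epoch = steps_per_epoch * samples_per_step

# Steps needed for one full pass over the data (last batch may be partial).
steps_for_full_pass = math.ceil(total_samples / samples_per_step)

# Number of configured epochs that make up one full pass.
epochs_per_full_pass = total_samples / samples_per_epoch

print(samples_per_epoch)               # 20000
print(steps_for_full_pass)             # 9919
print(round(epochs_per_full_pass, 2))  # 19.84
```

Under these assumptions, one pass over the data takes roughly 20 configured epochs, so steps_per_epoch here behaves more like a reporting interval than a full pass over the dataset.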

whereitogo · 2020-12-01T14:55:37Z

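Hello, may I ask how your training turned out in the end? I would like to try the pb model the author provides, but I don't know how to get the files out of the docker image. Could you send me a copy? Training is very slow on my side: one epoch takes 30 minutes, and I don't know why!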

xianzhe-741 · 2020-12-15T09:28:36Z

Hello, I ran into two problems while using this and would like to ask for help:

1. When running test.py with the model text_recognition_5435.pb from the author's docker image, the line _ = tf.import_graph_def(graph_def, name='') raises: InvalidArgumentError (see above for traceback): The second input must be a scalar, but it has shape [1,33]
2. train.py fails with:
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/train/config.py", line 119, in __init__
    assert_type(model, ModelDescBase, 'model')
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/train/config.py", line 107, in assert_type
    name, tp.__name__, v.__class__.__name__)
AssertionError: model has to be type 'ModelDescBase', but an object of type 'AttentionOCR' found.

AnMoran · 2020-12-15T09:39:43Z

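1. I have not used the author's docker image. I set up a local virtual environment directly from the listed requirements, and I have not used the author's model either.
2. It is probably a version issue?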

xianzhe-741 · 2020-12-15T09:44:02Z

OK, thank you. A lot of people seem to be asking about the problem below, and I have run into it too. Have you encountered it? If possible, could I add you on WeChat to ask for your advice (xianzhe741)?

Traceback (most recent call last):
  File "test.py", line 121, in <module>
    test(args)
  File "test.py", line 91, in test
    model = TextRecognition(args.pb_path, cfg.seq_len+1)
  File "test.py", line 23, in __init__
    self.init_model()
  File "test.py", line 37, in init_model
    self.label_ph = self.sess.graph.get_tensor_by_name('label:0')
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3972, in get_tensor_by_name
    return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3838, in _as_graph_element_locked
    "graph." % (repr(name), repr(op_name)))
KeyError: "The name 'label:0' refers to a Tensor which does not exist. The operation, 'label', does not exist in the graph."
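The KeyError says the loaded graph has no operation named 'label', so one way to narrow it down is to compare the tensor names test.py asks for against the operation names the pb file actually contains. Below is a minimal sketch of that comparison in plain Python, so it runs without TensorFlow. The helper names and the example operation names are hypothetical. In a real session the list of names could be obtained with [op.name for op in sess.graph.get_operations()].

```python
def split_tensor_name(tensor_name):
    """Split a TensorFlow tensor name 'op_name:index' into (op_name, index)."""
    op_name, _, index = tensor_name.rpartition(":")
    if not op_name or not index.isdigit():
        raise ValueError("expected 'op_name:index', got %r" % tensor_name)
    return op_name, int(index)


def missing_tensors(op_names, tensor_names):
    """Return the requested tensor names whose operation is not in op_names."""
    available = set(op_names)
    return [
        name for name in tensor_names
        if split_tensor_name(name)[0] not in available
    ]


# Hypothetical operation names standing in for the contents of a pb file.
op_names = ["image", "length", "sequence_preds"]

# Hypothetical list of tensors a test script might look up.
required = ["image:0", "label:0"]

print(missing_tensors(op_names, required))  # ['label:0']
```

If 'label' turns out to be missing from the real list, the pb file was probably exported with different input names than the ones this version of test.py expects.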
