
With a learning rate of 0.001 the gradients explode and the model cannot learn, but with the default 0.0002 they don't. Why? #86

Open
swjtulinxi opened this issue Nov 26, 2021 · 1 comment

Comments

@swjtulinxi

No description provided.

@weizhiliang0520

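Hi. I didn't modify anything. After downloading the code, I only put the image files where they belong and ran
python train.py
and it failed. Judging from the output, the error happens during validation, after the first epoch has finished. The error is: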

`[Network F] Total number of parameters : 16.892 M

Could not connect to Visdom server.
Trying to start a server....
Command: /usr/bin/python3 -m visdom.server -p 8097 &>/dev/null &
create web directory ./checkpoints/LEVIR-CDF0/web...
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
(epoch: 1, iters: 100, time: 0.085, data: 1.904) f: 45.299
(epoch: 1, iters: 200, time: 0.065, data: 0.283) f: 34.767
(epoch: 1, iters: 300, time: 0.086, data: 0.018) f: 148.914
(epoch: 1, iters: 400, time: 0.095, data: 0.680) f: 132.547
(epoch: 1) current_t_epoch: 76.626
saving the model at the end of epoch 1, iters 444

changedetection
['./LEVIR-CD/val/A/val_1.png', './LEVIR-CD/val/A/val_10.png', './LEVIR-CD/val/A/val_11.png', './LEVIR-CD/val/A/val_12.png', './LEVIR-CD/val/A/val_13.png', './LEVIR-CD/val/A/val_14.png', './LEVIR-CD/val/A/val_15.png', './LEVIR-CD/val/A/val_16.png', './LEVIR-CD/val/A/val_17.png', './LEVIR-CD/val/A/val_18.png', './LEVIR-CD/val/A/val_19.png', './LEVIR-CD/val/A/val_2.png', './LEVIR-CD/val/A/val_20.png', './LEVIR-CD/val/A/val_21.png', './LEVIR-CD/val/A/val_22.png', './LEVIR-CD/val/A/val_23.png', './LEVIR-CD/val/A/val_24.png', './LEVIR-CD/val/A/val_25.png', './LEVIR-CD/val/A/val_26.png', './LEVIR-CD/val/A/val_27.png', './LEVIR-CD/val/A/val_28.png', './LEVIR-CD/val/A/val_29.png', './LEVIR-CD/val/A/val_3.png', './LEVIR-CD/val/A/val_30.png', './LEVIR-CD/val/A/val_31.png', './LEVIR-CD/val/A/val_32.png', './LEVIR-CD/val/A/val_33.png', './LEVIR-CD/val/A/val_34.png', './LEVIR-CD/val/A/val_35.png', './LEVIR-CD/val/A/val_36.png', './LEVIR-CD/val/A/val_37.png', './LEVIR-CD/val/A/val_38.png', './LEVIR-CD/val/A/val_39.png', './LEVIR-CD/val/A/val_4.png', './LEVIR-CD/val/A/val_40.png', './LEVIR-CD/val/A/val_41.png', './LEVIR-CD/val/A/val_42.png', './LEVIR-CD/val/A/val_43.png', './LEVIR-CD/val/A/val_44.png', './LEVIR-CD/val/A/val_45.png', './LEVIR-CD/val/A/val_46.png', './LEVIR-CD/val/A/val_47.png', './LEVIR-CD/val/A/val_48.png', './LEVIR-CD/val/A/val_49.png', './LEVIR-CD/val/A/val_5.png', './LEVIR-CD/val/A/val_50.png', './LEVIR-CD/val/A/val_51.png', './LEVIR-CD/val/A/val_52.png', './LEVIR-CD/val/A/val_53.png', './LEVIR-CD/val/A/val_54.png', './LEVIR-CD/val/A/val_55.png', './LEVIR-CD/val/A/val_56.png', './LEVIR-CD/val/A/val_57.png', './LEVIR-CD/val/A/val_58.png', './LEVIR-CD/val/A/val_59.png', './LEVIR-CD/val/A/val_6.png', './LEVIR-CD/val/A/val_60.png', './LEVIR-CD/val/A/val_61.png', './LEVIR-CD/val/A/val_62.png', './LEVIR-CD/val/A/val_63.png', './LEVIR-CD/val/A/val_64.png', './LEVIR-CD/val/A/val_7.png', './LEVIR-CD/val/A/val_8.png', './LEVIR-CD/val/A/val_9.png']

dataset [ChangeDetectionDataset] was created

tcmalloc: large alloc 2147483648 bytes == 0x55839a3a0000 @ 0x7f010397db6b 0x7f010399d379 0x7f0003182cde 0x7f0003184452 0x7f00551abcf3 0x7f0055233158 0x7f005590a49f 0x7f00558ea870 0x7f00556d548a 0x7f005522429d 0x7f0055a03aca 0x7f00558e670e 0x7f00554abd17 0x7f0056e0e7c4 0x7f0056e0ec8d 0x7f00555216e8 0x7f00552223d5 0x7f0055a8a970 0x7f00557d7069 0x7f00fe5e32c3 0x7f00fe73d04e 0x55832f87f4b0 0x55832f970e1d 0x55832f8f2e99 0x55832f8ed9ee 0x55832f880bda 0x55832f8ef737 0x55832f880afa 0x55832f8ee915 0x55832f8ed9ee 0x55832f8ed6f3

Traceback (most recent call last):
  File "train.py", line 169, in <module>
    miou_current = val(opt, model)
  File "train.py", line 86, in val
    score = model.test(val=True) # run inference
  File "/content/drive/MyDrive/Experiment/codes/33_STANet/STANet/models/CDF0_model.py", line 79, in test
    metrics.update(self.L.detach().cpu().numpy(), pred.detach().cpu().numpy())
  File "/content/drive/MyDrive/Experiment/codes/33_STANet/STANet/util/metrics.py", line 121, in update
    self.confusion_matrix += self.__fast_hist(lt.flatten(), lp.flatten())
  File "/content/drive/MyDrive/Experiment/codes/33_STANet/STANet/util/metrics.py", line 108, in __fast_hist
    hist = np.bincount(self.num_classes * label_gt[mask].astype(int) + label_pred[mask],

IndexError: boolean index did not match indexed array along dimension 0; dimension is 67108864 but corresponding boolean dimension is 1048576`

Have you run into this? Or is there anything that needs to be changed before running python train.py? Thanks!
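The two sizes in the `IndexError` differ by a factor of 64: 67108864 / 1048576 = 64, and 1048576 = 1024 × 1024. The array being indexed therefore has 64 times as many elements as the boolean mask. Assuming the mask is built from the ground-truth labels, as in the common bincount-style `_fast_hist` pattern, this suggests the flattened prediction `lp` is larger than the flattened label `lt`. The sketch below is a hypothetical re-implementation of that pattern, not the code in `util/metrics.py`; the name `fast_hist` and its signature are assumptions. It adds a size check so a mismatch is reported explicitly, and it reproduces NumPy's boolean-index error on small arrays.

```python
import numpy as np


def fast_hist(label_gt: np.ndarray, label_pred: np.ndarray, num_classes: int) -> np.ndarray:
    """Hypothetical bincount-style confusion-matrix update (not the STANet source)."""
    label_gt = label_gt.ravel()
    label_pred = label_pred.ravel()
    # A mask built from label_gt can only index an array with the same number of elements.
    if label_gt.size != label_pred.size:
        raise ValueError(
            f"label has {label_gt.size} elements but prediction has {label_pred.size}"
        )
    mask = (label_gt >= 0) & (label_gt < num_classes)  # ignore out-of-range label values
    # Encode each (true, predicted) pair as one index, then count occurrences of each pair.
    counts = np.bincount(
        num_classes * label_gt[mask].astype(int) + label_pred[mask].astype(int),
        minlength=num_classes ** 2,
    )
    return counts.reshape(num_classes, num_classes)  # rows: ground truth, columns: prediction


if __name__ == "__main__":
    gt = np.array([[0, 1], [1, 0]])
    pred = np.array([[0, 1], [0, 0]])
    print(fast_hist(gt, pred, 2))

    # Same failure mode as the traceback, with small arrays: 4-element mask, 16-element array.
    mask = np.ones(4, dtype=bool)
    try:
        np.zeros(16)[mask]
    except IndexError as err:
        print("IndexError:", err)
```

Printing `self.L.shape` and `pred.shape` just before the `metrics.update(...)` call in `CDF0_model.py` would show the same mismatch without changing the metric code.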
