sweeplog
wandb: Agent Starting Run: rxx2sygy with config:
wandb: accumulate_grad_batches: 7
wandb: epochs: 1
wandb: gradient_clip_val: 0.92372930002197
wandb: init_lora_weights: pissa_niter_32
wandb: lora_alpha: 32
wandb: lora_dropout: 0.0794528890442868
wandb: lora_rank: 16
wandb: lr: 0.0004051653419029888
wandb: model_name: tiiuae/falcon-7b-instruct
wandb: Currently logged in as: j0ntendo (j0ntendo-yonsei-university). Use `wandb login --relogin` to force relogin
wandb: WARNING Ignored wandb.init() arg project when running a sweep.
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /user/jonathan/wandb/run-20240806_134906-rxx2sygy
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run usual-sweep-1
wandb: ⭐️ View project at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: 🧹 View sweep at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/sweeps/4ik6ex9z
wandb: 🚀 View run at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/rxx2sygy
Unused kwargs: ['bnb_8bit_quant_type', 'bnb_8bit_compute_dtype']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Repo card metadata block was not found. Setting CardData to empty.
####
###########
####################
############################
#####################################
##############################################
######################### ###################
####################### ###################
#################### ####################
################## #####################
################ ######################
##################### #################
###################### ###################
##################### #####################
#################### #######################
################### #########################
##############################################
#####################################
############################
####################
##########
####
Create sweep with ID: 4ik6ex9z
Sweep URL: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/sweeps/4ik6ex9z
[2024-08-06 13:49:20,008] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/user/jonathan/jonathan/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, weight, bias=None):
/user/jonathan/jonathan/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
/user/jonathan/jonathan/lib/python3.10/site-packages/lightning/pytorch/loggers/wandb.py:396: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
2024-08-06 13:49:24.254273: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-06 13:49:27.865117: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/user/jonathan/jonathan/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:652: Checkpoint directory /user/jonathan/checkpoint exists and is not empty.
Enabling DeepSpeed BF16. Model parameters and inputs will be cast to `bfloat16`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Warning: The default cache directory for DeepSpeed Triton autotune, /user/jonathan/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
[2024-08-06 13:49:37,199] [WARNING] [engine.py:1179:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
  | Name  | Type                 | Params | Mode
-------------------------------------------------------
0 | model | PeftModelForCausalLM | 1.5 B  | train
-------------------------------------------------------
2.2 M     Trainable params
1.5 B     Non-trainable params
1.5 B     Total params
6,192.290 Total estimated model params size (MB)
Sanity Checking: | | 0/? [00:00<?, ?it/s]
/user/jonathan/jonathan/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=127` in the `DataLoader` to improve performance.
Sanity Checking: 0%| | 0/2 [00:00<?, ?it/s]
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 2.01it/s]
/user/jonathan/jonathan/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=127` in the `DataLoader` to improve performance.
Training: | | 0/? [00:00<?, ?it/s]
Training: 0%| | 0/192 [00:00<?, ?it/s]
Epoch 0: 0%| | 0/192 [00:00<?, ?it/s]
Epoch 0: 3%|▎ | 5/192 [00:01<01:01, 3.02it/s]
Epoch 0: 3%|▎ | 5/192 [00:01<01:01, 3.02it/s, v_num=gy_1, train_loss_step=5.450, train_rouge1_fmeasure_step=0.00117, train_rouge1_precision_step=0.000589, train_rouge1_recall_step=0.100, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00117, train_rougeL_precision_step=0.000589, train_rougeL_recall_step=0.100, train_rougeLsum_fmeasure_step=0.00117, train_rougeLsum_precision_step=0.000589, train_rougeLsum_recall_step=0.100]
Epoch 0: 5%|▌ | 10/192 [00:03<01:01, 2.98it/s, v_num=gy_1, train_loss_step=5.450, train_rouge1_fmeasure_step=0.00117, train_rouge1_precision_step=0.000589, train_rouge1_recall_step=0.100, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00117, train_rougeL_precision_step=0.000589, train_rougeL_recall_step=0.100, train_rougeLsum_fmeasure_step=0.00117, train_rougeLsum_precision_step=0.000589, train_rougeLsum_recall_step=0.100]
Epoch 0: 5%|▌ | 10/192 [00:03<01:01, 2.98it/s, v_num=gy_1, train_loss_step=5.730, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 8%|▊ | 15/192 [00:04<00:58, 3.02it/s, v_num=gy_1, train_loss_step=5.730, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 8%|▊ | 15/192 [00:04<00:58, 3.02it/s, v_num=gy_1, train_loss_step=4.770, train_rouge1_fmeasure_step=0.00123, train_rouge1_precision_step=0.000627, train_rouge1_recall_step=0.0385, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00123, train_rougeL_precision_step=0.000627, train_rougeL_recall_step=0.0385, train_rougeLsum_fmeasure_step=0.00123, train_rougeLsum_precision_step=0.000627, train_rougeLsum_recall_step=0.0385]
Epoch 0: 10%|█ | 20/192 [00:06<00:57, 3.02it/s, v_num=gy_1, train_loss_step=4.770, train_rouge1_fmeasure_step=0.00123, train_rouge1_precision_step=0.000627, train_rouge1_recall_step=0.0385, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00123, train_rougeL_precision_step=0.000627, train_rougeL_recall_step=0.0385, train_rougeLsum_fmeasure_step=0.00123, train_rougeLsum_precision_step=0.000627, train_rougeLsum_recall_step=0.0385]
Epoch 0: 10%|█ | 20/192 [00:06<00:57, 3.02it/s, v_num=gy_1, train_loss_step=4.780, train_rouge1_fmeasure_step=0.00145, train_rouge1_precision_step=0.000733, train_rouge1_recall_step=0.0714, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00145, train_rougeL_precision_step=0.000733, train_rougeL_recall_step=0.0714, train_rougeLsum_fmeasure_step=0.00145, train_rougeLsum_precision_step=0.000733, train_rougeLsum_recall_step=0.0714]
Epoch 0: 13%|█▎ | 25/192 [00:08<00:55, 3.03it/s, v_num=gy_1, train_loss_step=4.780, train_rouge1_fmeasure_step=0.00145, train_rouge1_precision_step=0.000733, train_rouge1_recall_step=0.0714, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00145, train_rougeL_precision_step=0.000733, train_rougeL_recall_step=0.0714, train_rougeLsum_fmeasure_step=0.00145, train_rougeLsum_precision_step=0.000733, train_rougeLsum_recall_step=0.0714]
Epoch 0: 13%|█▎ | 25/192 [00:08<00:55, 3.03it/s, v_num=gy_1, train_loss_step=3.640, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 16%|█▌ | 30/192 [00:09<00:52, 3.06it/s, v_num=gy_1, train_loss_step=3.640, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 16%|█▌ | 30/192 [00:09<00:52, 3.06it/s, v_num=gy_1, train_loss_step=2.730, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 18%|█▊ | 35/192 [00:11<00:51, 3.08it/s, v_num=gy_1, train_loss_step=2.730, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 18%|█▊ | 35/192 [00:11<00:51, 3.08it/s, v_num=gy_1, train_loss_step=3.600, train_rouge1_fmeasure_step=0.0455, train_rouge1_precision_step=0.0446, train_rouge1_recall_step=0.0464, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0455, train_rougeL_precision_step=0.0446, train_rougeL_recall_step=0.0464, train_rougeLsum_fmeasure_step=0.0455, train_rougeLsum_precision_step=0.0446, train_rougeLsum_recall_step=0.0464]
Epoch 0: 21%|██ | 40/192 [00:12<00:49, 3.09it/s, v_num=gy_1, train_loss_step=3.600, train_rouge1_fmeasure_step=0.0455, train_rouge1_precision_step=0.0446, train_rouge1_recall_step=0.0464, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0455, train_rougeL_precision_step=0.0446, train_rougeL_recall_step=0.0464, train_rougeLsum_fmeasure_step=0.0455, train_rougeLsum_precision_step=0.0446, train_rougeLsum_recall_step=0.0464]
Epoch 0: 21%|██ | 40/192 [00:12<00:49, 3.09it/s, v_num=gy_1, train_loss_step=3.160, train_rouge1_fmeasure_step=0.0432, train_rouge1_precision_step=0.0323, train_rouge1_recall_step=0.0652, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0288, train_rougeL_precision_step=0.0215, train_rougeL_recall_step=0.0435, train_rougeLsum_fmeasure_step=0.0432, train_rougeLsum_precision_step=0.0323, train_rougeLsum_recall_step=0.0652]
Epoch 0: 23%|██▎ | 45/192 [00:14<00:47, 3.10it/s, v_num=gy_1, train_loss_step=3.160, train_rouge1_fmeasure_step=0.0432, train_rouge1_precision_step=0.0323, train_rouge1_recall_step=0.0652, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0288, train_rougeL_precision_step=0.0215, train_rougeL_recall_step=0.0435, train_rougeLsum_fmeasure_step=0.0432, train_rougeLsum_precision_step=0.0323, train_rougeLsum_recall_step=0.0652]
Epoch 0: 23%|██▎ | 45/192 [00:14<00:47, 3.10it/s, v_num=gy_1, train_loss_step=1.630, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 26%|██▌ | 50/192 [00:16<00:45, 3.11it/s, v_num=gy_1, train_loss_step=1.630, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 26%|██▌ | 50/192 [00:16<00:45, 3.11it/s, v_num=gy_1, train_loss_step=2.280, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 29%|██▊ | 55/192 [00:17<00:44, 3.11it/s, v_num=gy_1, train_loss_step=2.280, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 29%|██▊ | 55/192 [00:17<00:44, 3.11it/s, v_num=gy_1, train_loss_step=2.930, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 31%|███▏ | 60/192 [00:19<00:42, 3.12it/s, v_num=gy_1, train_loss_step=2.930, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 31%|███▏ | 60/192 [00:19<00:42, 3.12it/s, v_num=gy_1, train_loss_step=2.600, train_rouge1_fmeasure_step=0.0259, train_rouge1_precision_step=0.014, train_rouge1_recall_step=0.174, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0194, train_rougeL_precision_step=0.0105, train_rougeL_recall_step=0.130, train_rougeLsum_fmeasure_step=0.0259, train_rougeLsum_precision_step=0.014, train_rougeLsum_recall_step=0.174]
Epoch 0: 34%|███▍ | 65/192 [00:20<00:40, 3.12it/s, v_num=gy_1, train_loss_step=2.600, train_rouge1_fmeasure_step=0.0259, train_rouge1_precision_step=0.014, train_rouge1_recall_step=0.174, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0194, train_rougeL_precision_step=0.0105, train_rougeL_recall_step=0.130, train_rougeLsum_fmeasure_step=0.0259, train_rougeLsum_precision_step=0.014, train_rougeLsum_recall_step=0.174]
Epoch 0: 34%|███▍ | 65/192 [00:20<00:40, 3.12it/s, v_num=gy_1, train_loss_step=2.680, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 36%|███▋ | 70/192 [00:22<00:39, 3.13it/s, v_num=gy_1, train_loss_step=2.680, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 36%|███▋ | 70/192 [00:22<00:39, 3.13it/s, v_num=gy_1, train_loss_step=2.780, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 39%|███▉ | 75/192 [00:23<00:37, 3.13it/s, v_num=gy_1, train_loss_step=2.780, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 39%|███▉ | 75/192 [00:23<00:37, 3.13it/s, v_num=gy_1, train_loss_step=5.300, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 42%|████▏ | 80/192 [00:25<00:35, 3.13it/s, v_num=gy_1, train_loss_step=5.300, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 42%|████▏ | 80/192 [00:25<00:35, 3.13it/s, v_num=gy_1, train_loss_step=1.990, train_rouge1_fmeasure_step=0.0674, train_rouge1_precision_step=0.0938, train_rouge1_recall_step=0.0526, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0562, train_rougeL_precision_step=0.0781, train_rougeL_recall_step=0.0439, train_rougeLsum_fmeasure_step=0.0562, train_rougeLsum_precision_step=0.0781, train_rougeLsum_recall_step=0.0439]
Epoch 0: 44%|████▍ | 85/192 [00:27<00:34, 3.14it/s, v_num=gy_1, train_loss_step=1.990, train_rouge1_fmeasure_step=0.0674, train_rouge1_precision_step=0.0938, train_rouge1_recall_step=0.0526, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0562, train_rougeL_precision_step=0.0781, train_rougeL_recall_step=0.0439, train_rougeLsum_fmeasure_step=0.0562, train_rougeLsum_precision_step=0.0781, train_rougeLsum_recall_step=0.0439]
Epoch 0: 44%|████▍ | 85/192 [00:27<00:34, 3.14it/s, v_num=gy_1, train_loss_step=0.776, train_rouge1_fmeasure_step=0.0154, train_rouge1_precision_step=0.0106, train_rouge1_recall_step=0.0278, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0154, train_rougeL_precision_step=0.0106, train_rougeL_recall_step=0.0278, train_rougeLsum_fmeasure_step=0.0154, train_rougeLsum_precision_step=0.0106, train_rougeLsum_recall_step=0.0278]
Epoch 0: 47%|████▋ | 90/192 [00:28<00:32, 3.14it/s, v_num=gy_1, train_loss_step=0.776, train_rouge1_fmeasure_step=0.0154, train_rouge1_precision_step=0.0106, train_rouge1_recall_step=0.0278, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0154, train_rougeL_precision_step=0.0106, train_rougeL_recall_step=0.0278, train_rougeLsum_fmeasure_step=0.0154, train_rougeLsum_precision_step=0.0106, train_rougeLsum_recall_step=0.0278]
Epoch 0: 47%|████▋ | 90/192 [00:28<00:32, 3.14it/s, v_num=gy_1, train_loss_step=5.700, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 49%|████▉ | 95/192 [00:30<00:30, 3.15it/s, v_num=gy_1, train_loss_step=5.700, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 49%|████▉ | 95/192 [00:30<00:30, 3.15it/s, v_num=gy_1, train_loss_step=0.772, train_rouge1_fmeasure_step=0.0755, train_rouge1_precision_step=0.0444, train_rouge1_recall_step=0.250, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0755, train_rougeL_precision_step=0.0444, train_rougeL_recall_step=0.250, train_rougeLsum_fmeasure_step=0.0755, train_rougeLsum_precision_step=0.0444, train_rougeLsum_recall_step=0.250]
Epoch 0: 52%|█████▏ | 100/192 [00:31<00:29, 3.15it/s, v_num=gy_1, train_loss_step=0.772, train_rouge1_fmeasure_step=0.0755, train_rouge1_precision_step=0.0444, train_rouge1_recall_step=0.250, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0755, train_rougeL_precision_step=0.0444, train_rougeL_recall_step=0.250, train_rougeLsum_fmeasure_step=0.0755, train_rougeLsum_precision_step=0.0444, train_rougeLsum_recall_step=0.250]
Epoch 0: 52%|█████▏ | 100/192 [00:31<00:29, 3.15it/s, v_num=gy_1, train_loss_step=1.070, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 55%|█████▍ | 105/192 [00:33<00:27, 3.15it/s, v_num=gy_1, train_loss_step=1.070, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 55%|█████▍ | 105/192 [00:33<00:27, 3.15it/s, v_num=gy_1, train_loss_step=0.380, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 57%|█████▋ | 110/192 [00:34<00:26, 3.15it/s, v_num=gy_1, train_loss_step=0.380, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 57%|█████▋ | 110/192 [00:34<00:26, 3.15it/s, v_num=gy_1, train_loss_step=0.360, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 60%|█████▉ | 115/192 [00:36<00:24, 3.15it/s, v_num=gy_1, train_loss_step=0.360, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 60%|█████▉ | 115/192 [00:36<00:24, 3.15it/s, v_num=gy_1, train_loss_step=5.230, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 62%|██████▎ | 120/192 [00:38<00:22, 3.16it/s, v_num=gy_1, train_loss_step=5.230, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 62%|██████▎ | 120/192 [00:38<00:22, 3.16it/s, v_num=gy_1, train_loss_step=0.338, train_rouge1_fmeasure_step=0.0571, train_rouge1_precision_step=0.0625, train_rouge1_recall_step=0.0526, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0571, train_rougeL_precision_step=0.0625, train_rougeL_recall_step=0.0526, train_rougeLsum_fmeasure_step=0.0571, train_rougeLsum_precision_step=0.0625, train_rougeLsum_recall_step=0.0526]
Epoch 0: 65%|██████▌ | 125/192 [00:39<00:21, 3.16it/s, v_num=gy_1, train_loss_step=0.338, train_rouge1_fmeasure_step=0.0571, train_rouge1_precision_step=0.0625, train_rouge1_recall_step=0.0526, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0571, train_rougeL_precision_step=0.0625, train_rougeL_recall_step=0.0526, train_rougeLsum_fmeasure_step=0.0571, train_rougeLsum_precision_step=0.0625, train_rougeLsum_recall_step=0.0526]
Epoch 0: 65%|██████▌ | 125/192 [00:39<00:21, 3.16it/s, v_num=gy_1, train_loss_step=0.180, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 68%|██████▊ | 130/192 [00:41<00:19, 3.16it/s, v_num=gy_1, train_loss_step=0.180, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 68%|██████▊ | 130/192 [00:41<00:19, 3.16it/s, v_num=gy_1, train_loss_step=1.770, train_rouge1_fmeasure_step=0.0707, train_rouge1_precision_step=0.152, train_rouge1_recall_step=0.0461, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0505, train_rougeL_precision_step=0.109, train_rougeL_recall_step=0.0329, train_rougeLsum_fmeasure_step=0.0505, train_rougeLsum_precision_step=0.109, train_rougeLsum_recall_step=0.0329]
Epoch 0: 70%|███████ | 135/192 
[00:42<00:18, 3.16it/s, v_num=gy_1, train_loss_step=1.770, train_rouge1_fmeasure_step=0.0707, train_rouge1_precision_step=0.152, train_rouge1_recall_step=0.0461, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0505, train_rougeL_precision_step=0.109, train_rougeL_recall_step=0.0329, train_rougeLsum_fmeasure_step=0.0505, train_rougeLsum_precision_step=0.109, train_rougeLsum_recall_step=0.0329]Epoch 0: 70%|███████ | 135/192 [00:42<00:18, 3.16it/s, v_num=gy_1, train_loss_step=1.910, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 73%|███████▎ | 140/192 [00:44<00:16, 3.16it/s, v_num=gy_1, train_loss_step=1.910, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 73%|███████▎ | 140/192 [00:44<00:16, 3.16it/s, v_num=gy_1, train_loss_step=3.720, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 
76%|███████▌ | 145/192 [00:45<00:14, 3.16it/s, v_num=gy_1, train_loss_step=3.720, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 76%|███████▌ | 145/192 [00:45<00:14, 3.16it/s, v_num=gy_1, train_loss_step=0.154, train_rouge1_fmeasure_step=0.0833, train_rouge1_precision_step=0.0667, train_rouge1_recall_step=0.111, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0833, train_rougeL_precision_step=0.0667, train_rougeL_recall_step=0.111, train_rougeLsum_fmeasure_step=0.0833, train_rougeLsum_precision_step=0.0667, train_rougeLsum_recall_step=0.111]Epoch 0: 78%|███████▊ | 150/192 [00:47<00:13, 3.16it/s, v_num=gy_1, train_loss_step=0.154, train_rouge1_fmeasure_step=0.0833, train_rouge1_precision_step=0.0667, train_rouge1_recall_step=0.111, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0833, train_rougeL_precision_step=0.0667, train_rougeL_recall_step=0.111, train_rougeLsum_fmeasure_step=0.0833, train_rougeLsum_precision_step=0.0667, train_rougeLsum_recall_step=0.111]Epoch 0: 78%|███████▊ | 150/192 [00:47<00:13, 3.16it/s, v_num=gy_1, train_loss_step=2.160, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, 
train_rougeLsum_recall_step=0.000] Epoch 0: 81%|████████ | 155/192 [00:48<00:11, 3.17it/s, v_num=gy_1, train_loss_step=2.160, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 81%|████████ | 155/192 [00:48<00:11, 3.17it/s, v_num=gy_1, train_loss_step=1.150, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 83%|████████▎ | 160/192 [00:50<00:10, 3.17it/s, v_num=gy_1, train_loss_step=1.150, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 83%|████████▎ | 160/192 [00:50<00:10, 3.17it/s, v_num=gy_1, train_loss_step=0.596, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, 
train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 86%|████████▌ | 165/192 [00:52<00:08, 3.17it/s, v_num=gy_1, train_loss_step=0.596, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 86%|████████▌ | 165/192 [00:52<00:08, 3.17it/s, v_num=gy_1, train_loss_step=2.700, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 89%|████████▊ | 170/192 [00:53<00:06, 3.17it/s, v_num=gy_1, train_loss_step=2.700, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 89%|████████▊ | 170/192 [00:53<00:06, 3.17it/s, v_num=gy_1, train_loss_step=0.307, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, 
train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 91%|█████████ | 175/192 [00:55<00:05, 3.17it/s, v_num=gy_1, train_loss_step=0.307, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 91%|█████████ | 175/192 [00:55<00:05, 3.17it/s, v_num=gy_1, train_loss_step=0.288, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 94%|█████████▍| 180/192 [00:56<00:03, 3.17it/s, v_num=gy_1, train_loss_step=0.288, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 94%|█████████▍| 180/192 [00:56<00:03, 3.17it/s, v_num=gy_1, train_loss_step=0.577, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, 
train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 96%|█████████▋| 185/192 [00:58<00:02, 3.17it/s, v_num=gy_1, train_loss_step=0.577, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 96%|█████████▋| 185/192 [00:58<00:02, 3.17it/s, v_num=gy_1, train_loss_step=6.880, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 99%|█████████▉| 190/192 [00:59<00:00, 3.18it/s, v_num=gy_1, train_loss_step=6.880, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 99%|█████████▉| 190/192 [00:59<00:00, 3.17it/s, v_num=gy_1, train_loss_step=0.681, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, 
train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 100%|██████████| 192/192 [01:00<00:00, 3.18it/s, v_num=gy_1, train_loss_step=0.681, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 100%|██████████| 192/192 [01:00<00:00, 3.18it/s, v_num=gy_1, train_loss_step=0.763, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: 0%| | 0/12 [00:00<?, ?it/s]
Validation DataLoader 0: 0%| | 0/12 [00:00<?, ?it/s]
Validation DataLoader 0: 42%|████▏ | 5/12 [00:00<00:01, 5.39it/s]
Validation DataLoader 0: 83%|████████▎ | 10/12 [00:01<00:00, 5.37it/s]
Validation DataLoader 0: 100%|██████████| 12/12 [00:02<00:00, 5.36it/s]
Epoch 0: 100%|██████████| 192/192 [01:02<00:00, 3.06it/s, v_num=gy_1, train_loss_step=0.763, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000, val_loss_step=6.070, val_rouge1_fmeasure_step=0.000, val_rouge1_precision_step=0.000, val_rouge1_recall_step=0.000, val_rouge2_fmeasure_step=0.000, val_rouge2_precision_step=0.000, val_rouge2_recall_step=0.000, val_rougeL_fmeasure_step=0.000, val_rougeL_precision_step=0.000, val_rougeL_recall_step=0.000, val_rougeLsum_fmeasure_step=0.000, val_rougeLsum_precision_step=0.000, val_rougeLsum_recall_step=0.000, val_loss_epoch=1.350, val_rouge1_fmeasure_epoch=0.0144, val_rouge1_precision_epoch=0.0261, val_rouge1_recall_epoch=0.0113, val_rouge2_fmeasure_epoch=0.000, val_rouge2_precision_epoch=0.000, val_rouge2_recall_epoch=0.000, val_rougeL_fmeasure_epoch=0.0129, val_rougeL_precision_epoch=0.0233, val_rougeL_recall_epoch=0.010, val_rougeLsum_fmeasure_epoch=0.0144, val_rougeLsum_precision_epoch=0.0261, val_rougeLsum_recall_epoch=0.0113]Epoch 0: 100%|██████████| 192/192 [01:02<00:00, 3.06it/s, v_num=gy_1, train_loss_step=0.763, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000, val_loss_step=6.070, val_rouge1_fmeasure_step=0.000, val_rouge1_precision_step=0.000, val_rouge1_recall_step=0.000, 
val_rouge2_fmeasure_step=0.000, val_rouge2_precision_step=0.000, val_rouge2_recall_step=0.000, val_rougeL_fmeasure_step=0.000, val_rougeL_precision_step=0.000, val_rougeL_recall_step=0.000, val_rougeLsum_fmeasure_step=0.000, val_rougeLsum_precision_step=0.000, val_rougeLsum_recall_step=0.000, val_loss_epoch=1.350, val_rouge1_fmeasure_epoch=0.0144, val_rouge1_precision_epoch=0.0261, val_rouge1_recall_epoch=0.0113, val_rouge2_fmeasure_epoch=0.000, val_rouge2_precision_epoch=0.000, val_rouge2_recall_epoch=0.000, val_rougeL_fmeasure_epoch=0.0129, val_rougeL_precision_epoch=0.0233, val_rougeL_recall_epoch=0.010, val_rougeLsum_fmeasure_epoch=0.0144, val_rougeLsum_precision_epoch=0.0261, val_rougeLsum_recall_epoch=0.0113, train_loss_epoch=2.300, train_rouge1_fmeasure_epoch=0.0101, train_rouge1_precision_epoch=0.013, train_rouge1_recall_epoch=0.026, train_rouge2_fmeasure_epoch=0.000, train_rouge2_precision_epoch=0.000, train_rouge2_recall_epoch=0.000, train_rougeL_fmeasure_epoch=0.00923, train_rougeL_precision_epoch=0.0117, train_rougeL_recall_epoch=0.0238, train_rougeLsum_fmeasure_epoch=0.00992, train_rougeLsum_precision_epoch=0.0126, train_rougeLsum_recall_epoch=0.0258]`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|██████████| 192/192 [01:27<00:00, 2.20it/s, v_num=gy_1, train_loss_step=0.763, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000, val_loss_step=6.070, val_rouge1_fmeasure_step=0.000, val_rouge1_precision_step=0.000, val_rouge1_recall_step=0.000, val_rouge2_fmeasure_step=0.000, val_rouge2_precision_step=0.000, val_rouge2_recall_step=0.000, val_rougeL_fmeasure_step=0.000, val_rougeL_precision_step=0.000, val_rougeL_recall_step=0.000, val_rougeLsum_fmeasure_step=0.000, val_rougeLsum_precision_step=0.000, val_rougeLsum_recall_step=0.000, val_loss_epoch=1.350, val_rouge1_fmeasure_epoch=0.0144, val_rouge1_precision_epoch=0.0261, val_rouge1_recall_epoch=0.0113, val_rouge2_fmeasure_epoch=0.000, val_rouge2_precision_epoch=0.000, val_rouge2_recall_epoch=0.000, val_rougeL_fmeasure_epoch=0.0129, val_rougeL_precision_epoch=0.0233, val_rougeL_recall_epoch=0.010, val_rougeLsum_fmeasure_epoch=0.0144, val_rougeLsum_precision_epoch=0.0261, val_rougeLsum_recall_epoch=0.0113, train_loss_epoch=2.300, train_rouge1_fmeasure_epoch=0.0101, train_rouge1_precision_epoch=0.013, train_rouge1_recall_epoch=0.026, train_rouge2_fmeasure_epoch=0.000, train_rouge2_precision_epoch=0.000, train_rouge2_recall_epoch=0.000, train_rougeL_fmeasure_epoch=0.00923, train_rougeL_precision_epoch=0.0117, train_rougeL_recall_epoch=0.0238, train_rougeLsum_fmeasure_epoch=0.00992, train_rougeLsum_precision_epoch=0.0126, train_rougeLsum_recall_epoch=0.0258]wandb: - 0.014 MB of 0.014 MB uploadedwandb: \ 0.014 MB of 0.014 MB uploadedwandb: | 0.014 MB of 0.014 MB uploadedwandb: / 0.014 MB of 0.014 MB uploadedwandb: - 0.014 MB of 0.020 MB 
uploaded
wandb: \ 0.014 MB of 0.020 MB uploaded
wandb: | 0.027 MB of 0.027 MB uploaded
wandb:
wandb:
wandb: Run history:
wandb: epoch ▁▁
wandb: train_loss_epoch ▁
wandb: train_rouge1_fmeasure_epoch ▁
wandb: train_rouge1_precision_epoch ▁
wandb: train_rouge1_recall_epoch ▁
wandb: train_rouge2_fmeasure_epoch ▁
wandb: train_rouge2_precision_epoch ▁
wandb: train_rouge2_recall_epoch ▁
wandb: train_rougeL_fmeasure_epoch ▁
wandb: train_rougeL_precision_epoch ▁
wandb: train_rougeL_recall_epoch ▁
wandb: train_rougeLsum_fmeasure_epoch ▁
wandb: train_rougeLsum_precision_epoch ▁
wandb: train_rougeLsum_recall_epoch ▁
wandb: trainer/global_step ▁▁▂▂▂▂▃▃▃▃▄▄██
wandb: val_loss_epoch ▁
wandb: val_loss_step ▂▁▁▁▁▃▄▁▂▂▃█
wandb: val_rouge1_fmeasure_epoch ▁
wandb: val_rouge1_fmeasure_step ▂▁▅█▃▄▁▁▁▁▁▁
wandb: val_rouge1_precision_epoch ▁
wandb: val_rouge1_precision_step ▃▁▆▆▃█▁▁▁▁▁▁
wandb: val_rouge1_recall_epoch ▁
wandb: val_rouge1_recall_step ▂▁▄█▂▃▁▁▁▁▁▁
wandb: val_rouge2_fmeasure_epoch ▁
wandb: val_rouge2_fmeasure_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rouge2_precision_epoch ▁
wandb: val_rouge2_precision_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rouge2_recall_epoch ▁
wandb: val_rouge2_recall_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rougeL_fmeasure_epoch ▁
wandb: val_rougeL_fmeasure_step ▃▁▇█▃▅▁▁▁▁▁▁
wandb: val_rougeL_precision_epoch ▁
wandb: val_rougeL_precision_step ▄▁▇▆▃█▁▁▁▁▁▁
wandb: val_rougeL_recall_epoch ▁
wandb: val_rougeL_recall_step ▂▁▅█▂▃▁▁▁▁▁▁
wandb: val_rougeLsum_fmeasure_epoch ▁
wandb: val_rougeLsum_fmeasure_step ▂▁▅█▃▄▁▁▁▁▁▁
wandb: val_rougeLsum_precision_epoch ▁
wandb: val_rougeLsum_precision_step ▃▁▆▆▃█▁▁▁▁▁▁
wandb: val_rougeLsum_recall_epoch ▁
wandb: val_rougeLsum_recall_step ▂▁▄█▂▃▁▁▁▁▁▁
wandb:
wandb: Run summary:
wandb: epoch 0
wandb: train_loss_epoch 2.29987
wandb: train_rouge1_fmeasure_epoch 0.01015
wandb: train_rouge1_precision_epoch 0.01305
wandb: train_rouge1_recall_epoch 0.02602
wandb: train_rouge2_fmeasure_epoch 0.0
wandb: train_rouge2_precision_epoch 0.0
wandb: train_rouge2_recall_epoch 0.0
wandb: train_rougeL_fmeasure_epoch 0.00923
wandb: train_rougeL_precision_epoch 0.0117
wandb: train_rougeL_recall_epoch 0.02379
wandb: train_rougeLsum_fmeasure_epoch 0.00992
wandb: train_rougeLsum_precision_epoch 0.01263
wandb: train_rougeLsum_recall_epoch 0.02581
wandb: trainer/global_step 26
wandb: val_loss_epoch 1.35165
wandb: val_loss_step 6.06906
wandb: val_rouge1_fmeasure_epoch 0.01442
wandb: val_rouge1_fmeasure_step 0.0
wandb: val_rouge1_precision_epoch 0.02607
wandb: val_rouge1_precision_step 0.0
wandb: val_rouge1_recall_epoch 0.01127
wandb: val_rouge1_recall_step 0.0
wandb: val_rouge2_fmeasure_epoch 0.0
wandb: val_rouge2_fmeasure_step 0.0
wandb: val_rouge2_precision_epoch 0.0
wandb: val_rouge2_precision_step 0.0
wandb: val_rouge2_recall_epoch 0.0
wandb: val_rouge2_recall_step 0.0
wandb: val_rougeL_fmeasure_epoch 0.01289
wandb: val_rougeL_fmeasure_step 0.0
wandb: val_rougeL_precision_epoch 0.02329
wandb: val_rougeL_precision_step 0.0
wandb: val_rougeL_recall_epoch 0.01003
wandb: val_rougeL_recall_step 0.0
wandb: val_rougeLsum_fmeasure_epoch 0.01442
wandb: val_rougeLsum_fmeasure_step 0.0
wandb: val_rougeLsum_precision_epoch 0.02607
wandb: val_rougeLsum_precision_step 0.0
wandb: val_rougeLsum_recall_epoch 0.01127
wandb: val_rougeLsum_recall_step 0.0
wandb:
wandb: 🚀 View run usual-sweep-1 at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/rxx2sygy
wandb: ⭐️ View project at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240806_134906-rxx2sygy/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
wandb: Agent Starting Run: rwwzlqym with config:
wandb: accumulate_grad_batches: 1
wandb: epochs: 1
wandb: gradient_clip_val: 0.167332531520484
wandb: init_lora_weights: pissa_niter_16
wandb: lora_alpha: 32
wandb: lora_dropout: 0.09360935034869836
wandb: lora_rank: 8
wandb: lr: 0.0004243323255909761
wandb: model_name: MLP-KTLim/llama-3-Korean-Bllossom-8B
wandb: WARNING Ignored wandb.init() arg project when running a sweep.
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /user/jonathan/wandb/run-20240806_135122-rwwzlqym
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run peach-sweep-2
wandb: ⭐️ View project at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: 🧹 View sweep at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/sweeps/4ik6ex9z
wandb: 🚀 View run at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/rwwzlqym
Unused kwargs: ['bnb_8bit_quant_type', 'bnb_8bit_compute_dtype']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Repo card metadata block was not found. Setting CardData to empty.
Map: 0%| | 0/960 [00:00<?, ? examples/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Map: 100%|██████████| 960/960 [00:00<00:00, 7297.63 examples/s]Map: 100%|██████████| 960/960 [00:00<00:00, 6496.33 examples/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Enabling DeepSpeed BF16. Model parameters and inputs will be cast to `bfloat16`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[2024-08-06 13:54:29,819] [WARNING] [engine.py:1179:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
| Name | Type | Params | Mode
-------------------------------------------------------
0 | model | PeftModelForCausalLM | 1.5 B | train
-------------------------------------------------------
1.1 M Trainable params
1.5 B Non-trainable params
1.5 B Total params
6,183.574 Total estimated model params size (MB)
Sanity Checking: | | 0/? [00:00<?, ?it/s]Sanity Checking: 0%| | 0/2 [00:00<?, ?it/s]Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 3.57it/s] Training: | | 0/? [00:00<?, ?it/s]Training: 0%| | 0/192 [00:00<?, ?it/s]Epoch 0: 0%| | 0/192 [00:00<?, ?it/s] Epoch 0: 3%|▎ | 5/192 [00:01<01:12, 2.58it/s]Epoch 0: 3%|▎ | 5/192 [00:01<01:12, 2.58it/s, v_num=ym_1, train_loss_step=10.50, train_rouge1_fmeasure_step=0.00277, train_rouge1_precision_step=0.0014, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00277, train_rougeL_precision_step=0.0014, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.00277, train_rougeLsum_precision_step=0.0014, train_rougeLsum_recall_step=0.167]Epoch 0: 5%|▌ | 10/192 [00:03<01:05, 2.77it/s, v_num=ym_1, train_loss_step=10.50, train_rouge1_fmeasure_step=0.00277, train_rouge1_precision_step=0.0014, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00277, train_rougeL_precision_step=0.0014, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.00277, train_rougeLsum_precision_step=0.0014, train_rougeLsum_recall_step=0.167]Epoch 0: 5%|▌ | 10/192 [00:03<01:05, 2.77it/s, v_num=ym_1, train_loss_step=8.880, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 8%|▊ | 15/192 [00:05<01:01, 2.87it/s, v_num=ym_1, train_loss_step=8.880, train_rouge1_fmeasure_step=0.000, 
train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 8%|▊ | 15/192 [00:05<01:01, 2.87it/s, v_num=ym_1, train_loss_step=3.280, train_rouge1_fmeasure_step=0.00959, train_rouge1_precision_step=0.00506, train_rouge1_recall_step=0.0909, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00959, train_rougeL_precision_step=0.00506, train_rougeL_recall_step=0.0909, train_rougeLsum_fmeasure_step=0.00959, train_rougeLsum_precision_step=0.00506, train_rougeLsum_recall_step=0.0909]Epoch 0: 10%|█ | 20/192 [00:06<00:59, 2.91it/s, v_num=ym_1, train_loss_step=3.280, train_rouge1_fmeasure_step=0.00959, train_rouge1_precision_step=0.00506, train_rouge1_recall_step=0.0909, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00959, train_rougeL_precision_step=0.00506, train_rougeL_recall_step=0.0909, train_rougeLsum_fmeasure_step=0.00959, train_rougeLsum_precision_step=0.00506, train_rougeLsum_recall_step=0.0909]Epoch 0: 10%|█ | 20/192 [00:06<00:59, 2.91it/s, v_num=ym_1, train_loss_step=2.130, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 13%|█▎ | 25/192 [00:08<00:56, 2.95it/s, v_num=ym_1, train_loss_step=2.130, 
train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 13%|█▎ | 25/192 [00:08<00:56, 2.95it/s, v_num=ym_1, train_loss_step=0.495, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 16%|█▌ | 30/192 [00:10<00:54, 2.97it/s, v_num=ym_1, train_loss_step=0.495, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 16%|█▌ | 30/192 [00:10<00:54, 2.97it/s, v_num=ym_1, train_loss_step=0.216, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 18%|█▊ | 35/192 [00:11<00:52, 2.99it/s, v_num=ym_1, train_loss_step=0.216, 
train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 18%|█▊ | 35/192 [00:11<00:52, 2.99it/s, v_num=ym_1, train_loss_step=1.370, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 21%|██ | 40/192 [00:13<00:50, 3.01it/s, v_num=ym_1, train_loss_step=1.370, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 21%|██ | 40/192 [00:13<00:50, 3.01it/s, v_num=ym_1, train_loss_step=0.390, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 23%|██▎ | 45/192 [00:14<00:48, 3.02it/s, v_num=ym_1, train_loss_step=0.390, 
train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 23%|██▎ | 45/192 [00:14<00:48, 3.02it/s, v_num=ym_1, train_loss_step=0.245, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 26%|██▌ | 50/192 [00:16<00:46, 3.03it/s, v_num=ym_1, train_loss_step=0.245, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 26%|██▌ | 50/192 [00:16<00:46, 3.03it/s, v_num=ym_1, train_loss_step=0.512, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 29%|██▊ | 55/192 [00:18<00:45, 3.04it/s, v_num=ym_1, train_loss_step=0.512, 
train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 29%|██▊ | 55/192 [00:18<00:45, 3.04it/s, v_num=ym_1, train_loss_step=0.258, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 31%|███▏ | 60/192 [00:19<00:43, 3.04it/s, v_num=ym_1, train_loss_step=0.258, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 31%|███▏ | 60/192 [00:19<00:43, 3.04it/s, v_num=ym_1, train_loss_step=0.144, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 34%|███▍ | 65/192 [00:21<00:41, 3.05it/s, v_num=ym_1, train_loss_step=0.144, 
train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 34%|███▍ | 65/192 [00:21<00:41, 3.05it/s, v_num=ym_1, train_loss_step=0.231, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 36%|███▋ | 70/192 [00:22<00:39, 3.05it/s, v_num=ym_1, train_loss_step=0.231, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 36%|███▋ | 70/192 [00:22<00:39, 3.05it/s, v_num=ym_1, train_loss_step=0.0623, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 39%|███▉ | 75/192 [00:24<00:38, 3.06it/s, v_num=ym_1, train_loss_step=0.0623, 
train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 39%|███▉ | 75/192 [00:24<00:38, 3.06it/s, v_num=ym_1, train_loss_step=1.880, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 42%|████▏ | 80/192 [00:26<00:36, 3.06it/s, v_num=ym_1, train_loss_step=1.880, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 42%|████▏ | 80/192 [00:26<00:36, 3.06it/s, v_num=ym_1, train_loss_step=0.791, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 44%|████▍ | 85/192 [00:27<00:34, 3.06it/s, v_num=ym_1, train_loss_step=0.791, 
train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 44%|████▍ | 85/192 [00:27<00:34, 3.06it/s, v_num=ym_1, train_loss_step=0.548, train_rouge1_fmeasure_step=0.0351, train_rouge1_precision_step=0.040, train_rouge1_recall_step=0.0312, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0351, train_rougeL_precision_step=0.040, train_rougeL_recall_step=0.0312, train_rougeLsum_fmeasure_step=0.0351, train_rougeLsum_precision_step=0.040, train_rougeLsum_recall_step=0.0312]Epoch 0: 47%|████▋ | 90/192 [00:29<00:33, 3.07it/s, v_num=ym_1, train_loss_step=0.548, train_rouge1_fmeasure_step=0.0351, train_rouge1_precision_step=0.040, train_rouge1_recall_step=0.0312, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0351, train_rougeL_precision_step=0.040, train_rougeL_recall_step=0.0312, train_rougeLsum_fmeasure_step=0.0351, train_rougeLsum_precision_step=0.040, train_rougeLsum_recall_step=0.0312]Epoch 0: 47%|████▋ | 90/192 [00:29<00:33, 3.07it/s, v_num=ym_1, train_loss_step=1.660, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 49%|████▉ | 95/192 [00:30<00:31, 3.07it/s, v_num=ym_1, 
train_loss_step=1.660, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 49%|████▉ | 95/192 [00:30<00:31, 3.07it/s, v_num=ym_1, train_loss_step=0.176, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 52%|█████▏ | 100/192 [00:32<00:29, 3.07it/s, v_num=ym_1, train_loss_step=0.176, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 52%|█████▏ | 100/192 [00:32<00:29, 3.07it/s, v_num=ym_1, train_loss_step=0.229, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 55%|█████▍ | 105/192 [00:34<00:28, 3.07it/s, 
v_num=ym_1, train_loss_step=0.229, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 55%|█████▍ | 105/192 [00:34<00:28, 3.07it/s, v_num=ym_1, train_loss_step=0.0995, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 57%|█████▋ | 110/192 [00:35<00:26, 3.07it/s, v_num=ym_1, train_loss_step=0.0995, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 57%|█████▋ | 110/192 [00:35<00:26, 3.07it/s, v_num=ym_1, train_loss_step=0.092, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 60%|█████▉ | 115/192 [00:37<00:25, 
3.08it/s, v_num=ym_1, train_loss_step=0.092, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 60%|█████▉ | 115/192 [00:37<00:25, 3.08it/s, v_num=ym_1, train_loss_step=1.250, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 62%|██████▎ | 120/192 [00:38<00:23, 3.08it/s, v_num=ym_1, train_loss_step=1.250, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 62%|██████▎ | 120/192 [00:38<00:23, 3.08it/s, v_num=ym_1, train_loss_step=0.134, train_rouge1_fmeasure_step=0.100, train_rouge1_precision_step=0.200, train_rouge1_recall_step=0.0667, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.100, train_rougeL_precision_step=0.200, train_rougeL_recall_step=0.0667, train_rougeLsum_fmeasure_step=0.100, train_rougeLsum_precision_step=0.200, train_rougeLsum_recall_step=0.0667]Epoch 0: 65%|██████▌ | 125/192 
[00:40<00:21, 3.08it/s, v_num=ym_1, train_loss_step=0.134, train_rouge1_fmeasure_step=0.100, train_rouge1_precision_step=0.200, train_rouge1_recall_step=0.0667, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.100, train_rougeL_precision_step=0.200, train_rougeL_recall_step=0.0667, train_rougeLsum_fmeasure_step=0.100, train_rougeLsum_precision_step=0.200, train_rougeLsum_recall_step=0.0667]Epoch 0: 65%|██████▌ | 125/192 [00:40<00:21, 3.08it/s, v_num=ym_1, train_loss_step=0.103, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 68%|██████▊ | 130/192 [00:42<00:20, 3.08it/s, v_num=ym_1, train_loss_step=0.103, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 68%|██████▊ | 130/192 [00:42<00:20, 3.08it/s, v_num=ym_1, train_loss_step=1.170, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 
70%|███████ | 135/192 [00:43<00:18, 3.08it/s, v_num=ym_1, train_loss_step=1.170, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 70%|███████ | 135/192 [00:43<00:18, 3.08it/s, v_num=ym_1, train_loss_step=0.502, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 73%|███████▎ | 140/192 [00:45<00:16, 3.08it/s, v_num=ym_1, train_loss_step=0.502, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 73%|███████▎ | 140/192 [00:45<00:16, 3.08it/s, v_num=ym_1, train_loss_step=1.120, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, 
train_rougeLsum_recall_step=0.000]Epoch 0: 76%|███████▌ | 145/192 [00:47<00:15, 3.08it/s, v_num=ym_1, train_loss_step=1.120, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 76%|███████▌ | 145/192 [00:47<00:15, 3.08it/s, v_num=ym_1, train_loss_step=0.0634, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 78%|███████▊ | 150/192 [00:48<00:13, 3.09it/s, v_num=ym_1, train_loss_step=0.0634, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 78%|███████▊ | 150/192 [00:48<00:13, 3.09it/s, v_num=ym_1, train_loss_step=0.492, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, 
train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 81%|████████ | 155/192 [00:50<00:11, 3.09it/s, v_num=ym_1, train_loss_step=0.492, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 81%|████████ | 155/192 [00:50<00:11, 3.09it/s, v_num=ym_1, train_loss_step=0.550, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 83%|████████▎ | 160/192 [00:51<00:10, 3.09it/s, v_num=ym_1, train_loss_step=0.550, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 83%|████████▎ | 160/192 [00:51<00:10, 3.09it/s, v_num=ym_1, train_loss_step=0.173, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, 
train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 86%|████████▌ | 165/192 [00:53<00:08, 3.09it/s, v_num=ym_1, train_loss_step=0.173, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 86%|████████▌ | 165/192 [00:53<00:08, 3.09it/s, v_num=ym_1, train_loss_step=0.591, train_rouge1_fmeasure_step=0.400, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.500, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.400, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.500, train_rougeLsum_fmeasure_step=0.400, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.500]Epoch 0: 89%|████████▊ | 170/192 [00:55<00:07, 3.09it/s, v_num=ym_1, train_loss_step=0.591, train_rouge1_fmeasure_step=0.400, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.500, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.400, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.500, train_rougeLsum_fmeasure_step=0.400, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.500]Epoch 0: 89%|████████▊ | 170/192 [00:55<00:07, 3.09it/s, v_num=ym_1, train_loss_step=0.140, train_rouge1_fmeasure_step=0.286, train_rouge1_precision_step=0.167, train_rouge1_recall_step=1.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.286, train_rougeL_precision_step=0.167, 
train_rougeL_recall_step=1.000, train_rougeLsum_fmeasure_step=0.286, train_rougeLsum_precision_step=0.167, train_rougeLsum_recall_step=1.000]Epoch 0: 91%|█████████ | 175/192 [00:56<00:05, 3.09it/s, v_num=ym_1, train_loss_step=0.140, train_rouge1_fmeasure_step=0.286, train_rouge1_precision_step=0.167, train_rouge1_recall_step=1.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.286, train_rougeL_precision_step=0.167, train_rougeL_recall_step=1.000, train_rougeLsum_fmeasure_step=0.286, train_rougeLsum_precision_step=0.167, train_rougeLsum_recall_step=1.000]Epoch 0: 91%|█████████ | 175/192 [00:56<00:05, 3.09it/s, v_num=ym_1, train_loss_step=0.157, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 94%|█████████▍| 180/192 [00:58<00:03, 3.09it/s, v_num=ym_1, train_loss_step=0.157, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 94%|█████████▍| 180/192 [00:58<00:03, 3.09it/s, v_num=ym_1, train_loss_step=0.141, train_rouge1_fmeasure_step=0.500, train_rouge1_precision_step=0.333, train_rouge1_recall_step=1.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.500, 
train_rougeL_precision_step=0.333, train_rougeL_recall_step=1.000, train_rougeLsum_fmeasure_step=0.500, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=1.000]Epoch 0: 96%|█████████▋| 185/192 [00:59<00:02, 3.09it/s, v_num=ym_1, train_loss_step=0.141, train_rouge1_fmeasure_step=0.500, train_rouge1_precision_step=0.333, train_rouge1_recall_step=1.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.500, train_rougeL_precision_step=0.333, train_rougeL_recall_step=1.000, train_rougeLsum_fmeasure_step=0.500, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=1.000]Epoch 0: 96%|█████████▋| 185/192 [00:59<00:02, 3.09it/s, v_num=ym_1, train_loss_step=2.820, train_rouge1_fmeasure_step=0.286, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.250, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.286, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.250, train_rougeLsum_fmeasure_step=0.286, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.250]Epoch 0: 99%|█████████▉| 190/192 [01:01<00:00, 3.09it/s, v_num=ym_1, train_loss_step=2.820, train_rouge1_fmeasure_step=0.286, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.250, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.286, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.250, train_rougeLsum_fmeasure_step=0.286, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.250]Epoch 0: 99%|█████████▉| 190/192 [01:01<00:00, 3.09it/s, v_num=ym_1, train_loss_step=0.229, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, 
train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 100%|██████████| 192/192 [01:02<00:00, 3.09it/s, v_num=ym_1, train_loss_step=0.229, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 100%|██████████| 192/192 [01:02<00:00, 3.09it/s, v_num=ym_1, train_loss_step=0.185, train_rouge1_fmeasure_step=0.500, train_rouge1_precision_step=0.333, train_rouge1_recall_step=1.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.500, train_rougeL_precision_step=0.333, train_rougeL_recall_step=1.000, train_rougeLsum_fmeasure_step=0.500, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=1.000]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: 0%| | 0/12 [00:00<?, ?it/s]
Validation DataLoader 0: 0%| | 0/12 [00:00<?, ?it/s]
Validation DataLoader 0: 42%|████▏ | 5/12 [00:00<00:01, 5.30it/s]
Validation DataLoader 0: 83%|████████▎ | 10/12 [00:01<00:00, 5.26it/s]
Validation DataLoader 0: 100%|██████████| 12/12 [00:02<00:00, 5.26it/s]
[AEpoch 0: 100%|██████████| 192/192 [01:04<00:00, 2.98it/s, v_num=ym_1, train_loss_step=0.185, train_rouge1_fmeasure_step=0.500, train_rouge1_precision_step=0.333, train_rouge1_recall_step=1.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.500, train_rougeL_precision_step=0.333, train_rougeL_recall_step=1.000, train_rougeLsum_fmeasure_step=0.500, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=1.000, val_loss_step=2.200, val_rouge1_fmeasure_step=0.321, val_rouge1_precision_step=0.300, val_rouge1_recall_step=0.389, val_rouge2_fmeasure_step=0.000, val_rouge2_precision_step=0.000, val_rouge2_recall_step=0.000, val_rougeL_fmeasure_step=0.321, val_rougeL_precision_step=0.300, val_rougeL_recall_step=0.389, val_rougeLsum_fmeasure_step=0.321, val_rougeLsum_precision_step=0.300, val_rougeLsum_recall_step=0.389, val_loss_epoch=0.531, val_rouge1_fmeasure_epoch=0.200, val_rouge1_precision_epoch=0.232, val_rouge1_recall_epoch=0.359, val_rouge2_fmeasure_epoch=0.000, val_rouge2_precision_epoch=0.000, val_rouge2_recall_epoch=0.000, val_rougeL_fmeasure_epoch=0.200, val_rougeL_precision_epoch=0.232, val_rougeL_recall_epoch=0.359, val_rougeLsum_fmeasure_epoch=0.200, val_rougeLsum_precision_epoch=0.232, val_rougeLsum_recall_epoch=0.359]Epoch 0: 100%|██████████| 192/192 [01:04<00:00, 2.98it/s, v_num=ym_1, train_loss_step=0.185, train_rouge1_fmeasure_step=0.500, train_rouge1_precision_step=0.333, train_rouge1_recall_step=1.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.500, train_rougeL_precision_step=0.333, train_rougeL_recall_step=1.000, train_rougeLsum_fmeasure_step=0.500, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=1.000, val_loss_step=2.200, val_rouge1_fmeasure_step=0.321, val_rouge1_precision_step=0.300, val_rouge1_recall_step=0.389, val_rouge2_fmeasure_step=0.000, 
val_rouge2_precision_step=0.000, val_rouge2_recall_step=0.000, val_rougeL_fmeasure_step=0.321, val_rougeL_precision_step=0.300, val_rougeL_recall_step=0.389, val_rougeLsum_fmeasure_step=0.321, val_rougeLsum_precision_step=0.300, val_rougeLsum_recall_step=0.389, val_loss_epoch=0.531, val_rouge1_fmeasure_epoch=0.200, val_rouge1_precision_epoch=0.232, val_rouge1_recall_epoch=0.359, val_rouge2_fmeasure_epoch=0.000, val_rouge2_precision_epoch=0.000, val_rouge2_recall_epoch=0.000, val_rougeL_fmeasure_epoch=0.200, val_rougeL_precision_epoch=0.232, val_rougeL_recall_epoch=0.359, val_rougeLsum_fmeasure_epoch=0.200, val_rougeLsum_precision_epoch=0.232, val_rougeLsum_recall_epoch=0.359, train_loss_epoch=1.190, train_rouge1_fmeasure_epoch=0.0338, train_rouge1_precision_epoch=0.037, train_rouge1_recall_epoch=0.0593, train_rouge2_fmeasure_epoch=2.76e-5, train_rouge2_precision_epoch=1.47e-5, train_rouge2_recall_epoch=0.000223, train_rougeL_fmeasure_epoch=0.0337, train_rougeL_precision_epoch=0.0369, train_rougeL_recall_epoch=0.0589, train_rougeLsum_fmeasure_epoch=0.0338, train_rougeLsum_precision_epoch=0.037, train_rougeLsum_recall_epoch=0.0593]`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|██████████| 192/192 [01:29<00:00, 2.15it/s, v_num=ym_1, train_loss_step=0.185, train_rouge1_fmeasure_step=0.500, train_rouge1_precision_step=0.333, train_rouge1_recall_step=1.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.500, train_rougeL_precision_step=0.333, train_rougeL_recall_step=1.000, train_rougeLsum_fmeasure_step=0.500, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=1.000, val_loss_step=2.200, val_rouge1_fmeasure_step=0.321, val_rouge1_precision_step=0.300, val_rouge1_recall_step=0.389, val_rouge2_fmeasure_step=0.000, val_rouge2_precision_step=0.000, val_rouge2_recall_step=0.000, val_rougeL_fmeasure_step=0.321, val_rougeL_precision_step=0.300, val_rougeL_recall_step=0.389, val_rougeLsum_fmeasure_step=0.321, val_rougeLsum_precision_step=0.300, val_rougeLsum_recall_step=0.389, val_loss_epoch=0.531, val_rouge1_fmeasure_epoch=0.200, val_rouge1_precision_epoch=0.232, val_rouge1_recall_epoch=0.359, val_rouge2_fmeasure_epoch=0.000, val_rouge2_precision_epoch=0.000, val_rouge2_recall_epoch=0.000, val_rougeL_fmeasure_epoch=0.200, val_rougeL_precision_epoch=0.232, val_rougeL_recall_epoch=0.359, val_rougeLsum_fmeasure_epoch=0.200, val_rougeLsum_precision_epoch=0.232, val_rougeLsum_recall_epoch=0.359, train_loss_epoch=1.190, train_rouge1_fmeasure_epoch=0.0338, train_rouge1_precision_epoch=0.037, train_rouge1_recall_epoch=0.0593, train_rouge2_fmeasure_epoch=2.76e-5, train_rouge2_precision_epoch=1.47e-5, train_rouge2_recall_epoch=0.000223, train_rougeL_fmeasure_epoch=0.0337, train_rougeL_precision_epoch=0.0369, train_rougeL_recall_epoch=0.0589, train_rougeLsum_fmeasure_epoch=0.0338, train_rougeLsum_precision_epoch=0.037, train_rougeLsum_recall_epoch=0.0593]wandb: - 0.014 MB of 0.014 MB uploadedwandb: \ 0.014 MB of 0.024 MB uploadedwandb: | 0.014 MB of 0.025 MB uploadedwandb: / 0.025 MB of 0.025 MB uploadedwandb: - 0.025 MB of 0.025 MB uploadedwandb: \ 
0.025 MB of 0.025 MB uploadedwandb: | 0.025 MB of 0.025 MB uploadedwandb:
wandb:
wandb: Run history:
wandb: epoch ▁▁▁▁▁
wandb: lr-PagedLion ▆█▁
wandb: train_loss_epoch ▁
wandb: train_loss_step █▁█
wandb: train_rouge1_fmeasure_epoch ▁
wandb: train_rouge1_fmeasure_step ▁▁▁
wandb: train_rouge1_precision_epoch ▁
wandb: train_rouge1_precision_step ▁▁▁
wandb: train_rouge1_recall_epoch ▁
wandb: train_rouge1_recall_step ▁▁▁
wandb: train_rouge2_fmeasure_epoch ▁
wandb: train_rouge2_fmeasure_step ▁▁▁
wandb: train_rouge2_precision_epoch ▁
wandb: train_rouge2_precision_step ▁▁▁
wandb: train_rouge2_recall_epoch ▁
wandb: train_rouge2_recall_step ▁▁▁
wandb: train_rougeL_fmeasure_epoch ▁
wandb: train_rougeL_fmeasure_step ▁▁▁
wandb: train_rougeL_precision_epoch ▁
wandb: train_rougeL_precision_step ▁▁▁
wandb: train_rougeL_recall_epoch ▁
wandb: train_rougeL_recall_step ▁▁▁
wandb: train_rougeLsum_fmeasure_epoch ▁
wandb: train_rougeLsum_fmeasure_step ▁▁▁
wandb: train_rougeLsum_precision_epoch ▁
wandb: train_rougeLsum_precision_step ▁▁▁
wandb: train_rougeLsum_recall_epoch ▁
wandb: train_rougeLsum_recall_step ▁▁▁
wandb: trainer/global_step ▃▃▅▅▆▆▁▁▁▁▁▁▁▁▁▁▁▁██
wandb: val_loss_epoch ▁
wandb: val_loss_step ▂▁▁▁▁▅▃▁▂▁▂█
wandb: val_rouge1_fmeasure_epoch ▁
wandb: val_rouge1_fmeasure_step ▄▆▃▃▃▁▆█▆▇▇█
wandb: val_rouge1_precision_epoch ▁
wandb: val_rouge1_precision_step ▅▅▅▅▅▅▂▂▁▂▂█
wandb: val_rouge1_recall_epoch ▁
wandb: val_rouge1_recall_step ▂▃▁▁▁▁▅█▄▆▆▄
wandb: val_rouge2_fmeasure_epoch ▁
wandb: val_rouge2_fmeasure_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rouge2_precision_epoch ▁
wandb: val_rouge2_precision_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rouge2_recall_epoch ▁
wandb: val_rouge2_recall_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rougeL_fmeasure_epoch ▁
wandb: val_rougeL_fmeasure_step ▄▆▃▃▃▁▆█▆▇▇█
wandb: val_rougeL_precision_epoch ▁
wandb: val_rougeL_precision_step ▅▅▅▅▅▅▂▂▁▂▂█
wandb: val_rougeL_recall_epoch ▁
wandb: val_rougeL_recall_step ▂▃▁▁▁▁▅█▄▆▆▄
wandb: val_rougeLsum_fmeasure_epoch ▁
wandb: val_rougeLsum_fmeasure_step ▄▆▃▃▃▁▆█▆▇▇█
wandb: val_rougeLsum_precision_epoch ▁
wandb: val_rougeLsum_precision_step ▅▅▅▅▅▅▂▂▁▂▂█
wandb: val_rougeLsum_recall_epoch ▁
wandb: val_rougeLsum_recall_step ▂▃▁▁▁▁▅█▄▆▆▄
wandb:
wandb: Run summary:
wandb: epoch 0
wandb: lr-PagedLion 1e-05
wandb: train_loss_epoch 1.19016
wandb: train_loss_step 0.49164
wandb: train_rouge1_fmeasure_epoch 0.03375
wandb: train_rouge1_fmeasure_step 0.0
wandb: train_rouge1_precision_epoch 0.03697
wandb: train_rouge1_precision_step 0.0
wandb: train_rouge1_recall_epoch 0.05931
wandb: train_rouge1_recall_step 0.0
wandb: train_rouge2_fmeasure_epoch 3e-05
wandb: train_rouge2_fmeasure_step 0.0
wandb: train_rouge2_precision_epoch 1e-05
wandb: train_rouge2_precision_step 0.0
wandb: train_rouge2_recall_epoch 0.00022
wandb: train_rouge2_recall_step 0.0
wandb: train_rougeL_fmeasure_epoch 0.03368
wandb: train_rougeL_fmeasure_step 0.0
wandb: train_rougeL_precision_epoch 0.03693
wandb: train_rougeL_precision_step 0.0
wandb: train_rougeL_recall_epoch 0.05894
wandb: train_rougeL_recall_step 0.0
wandb: train_rougeLsum_fmeasure_epoch 0.03375
wandb: train_rougeLsum_fmeasure_step 0.0
wandb: train_rougeLsum_precision_epoch 0.03697
wandb: train_rougeLsum_precision_step 0.0
wandb: train_rougeLsum_recall_epoch 0.05931
wandb: train_rougeLsum_recall_step 0.0
wandb: trainer/global_step 191
wandb: val_loss_epoch 0.5311
wandb: val_loss_step 2.19732
wandb: val_rouge1_fmeasure_epoch 0.20039
wandb: val_rouge1_fmeasure_step 0.32143
wandb: val_rouge1_precision_epoch 0.23194
wandb: val_rouge1_precision_step 0.3
wandb: val_rouge1_recall_epoch 0.35879
wandb: val_rouge1_recall_step 0.38889
wandb: val_rouge2_fmeasure_epoch 0.0
wandb: val_rouge2_fmeasure_step 0.0
wandb: val_rouge2_precision_epoch 0.0
wandb: val_rouge2_precision_step 0.0
wandb: val_rouge2_recall_epoch 0.0
wandb: val_rouge2_recall_step 0.0
wandb: val_rougeL_fmeasure_epoch 0.20039
wandb: val_rougeL_fmeasure_step 0.32143
wandb: val_rougeL_precision_epoch 0.23194
wandb: val_rougeL_precision_step 0.3
wandb: val_rougeL_recall_epoch 0.35879
wandb: val_rougeL_recall_step 0.38889
wandb: val_rougeLsum_fmeasure_epoch 0.20039
wandb: val_rougeLsum_fmeasure_step 0.32143
wandb: val_rougeLsum_precision_epoch 0.23194
wandb: val_rougeLsum_precision_step 0.3
wandb: val_rougeLsum_recall_epoch 0.35879
wandb: val_rougeLsum_recall_step 0.38889
wandb:
wandb: 🚀 View run peach-sweep-2 at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/rwwzlqym
wandb: ⭐️ View project at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240806_135122-rwwzlqym/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
wandb: Agent Starting Run: jpmw6qcg with config:
wandb: accumulate_grad_batches: 3
wandb: epochs: 1
wandb: gradient_clip_val: 0.9721876127240872
wandb: init_lora_weights: loftq
wandb: lora_alpha: 32
wandb: lora_dropout: 0.07538488186871073
wandb: lora_rank: 16
wandb: lr: 0.00017737658879369092
wandb: model_name: mistralai/Mistral-Nemo-Instruct-2407
wandb: WARNING Ignored wandb.init() arg project when running a sweep.
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /user/jonathan/wandb/run-20240806_135615-jpmw6qcg
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run dutiful-sweep-3
wandb: ⭐️ View project at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: 🧹 View sweep at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/sweeps/4ik6ex9z
wandb: 🚀 View run at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/jpmw6qcg
Unused kwargs: ['bnb_8bit_quant_type', 'bnb_8bit_compute_dtype']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Map: 0%| | 0/960 [00:00<?, ? examples/s]
wandb: 0.019 MB of 0.019 MB uploaded
wandb:
wandb: 🚀 View run dutiful-sweep-3 at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/jpmw6qcg
wandb: ⭐️ View project at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240806_135615-jpmw6qcg/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
Run jpmw6qcg errored:
Traceback (most recent call last):
File "/user/jonathan/jonathan/lib/python3.10/site-packages/wandb/agents/pyagent.py", line 307, in _run_job
self._function()
File "/user/jonathan/l_sweep.py", line 50, in l2ray_trainer
dataset = get_dataset(dataset_name=dataset_path, tokenizer=tokenizer)
File "/user/jonathan/finetuning_datasets.py", line 40, in get_dataset
en_dataset = raw_dataset.map(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 869, in map
{
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
k: dataset.map(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3161, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3552, in _map_single
batch = apply_function_on_filtered_inputs(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3421, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "/user/jonathan/finetuning_datafunctions.py", line 65, in preprocess_function
templated_text, labels = formatting(sample=sample,
File "/user/jonathan/finetuning_datafunctions.py", line 38, in formatting
bot_message = tokenizer.apply_chat_template(conversation=label_template,
File "/user/jonathan/jonathan/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1855, in apply_chat_template
rendered_chat = compiled_template.render(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/environment.py", line 1304, in render
self.environment.handle_exception()
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/environment.py", line 939, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 14, in top-level template code
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/sandbox.py", line 394, in call
return __context.call(__obj, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1899, in raise_exception
raise TemplateError(message)
jinja2.exceptions.TemplateError: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
wandb: ERROR Run jpmw6qcg errored:
wandb: ERROR Traceback (most recent call last):
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/wandb/agents/pyagent.py", line 307, in _run_job
wandb: ERROR self._function()
wandb: ERROR File "/user/jonathan/l_sweep.py", line 50, in l2ray_trainer
wandb: ERROR dataset = get_dataset(dataset_name=dataset_path, tokenizer=tokenizer)
wandb: ERROR File "/user/jonathan/finetuning_datasets.py", line 40, in get_dataset
wandb: ERROR en_dataset = raw_dataset.map(
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 869, in map
wandb: ERROR {
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
wandb: ERROR k: dataset.map(
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
wandb: ERROR out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
wandb: ERROR out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3161, in map
wandb: ERROR for rank, done, content in Dataset._map_single(**dataset_kwargs):
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3552, in _map_single
wandb: ERROR batch = apply_function_on_filtered_inputs(
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3421, in apply_function_on_filtered_inputs
wandb: ERROR processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
wandb: ERROR File "/user/jonathan/finetuning_datafunctions.py", line 65, in preprocess_function
wandb: ERROR templated_text, labels = formatting(sample=sample,
wandb: ERROR File "/user/jonathan/finetuning_datafunctions.py", line 38, in formatting
wandb: ERROR bot_message = tokenizer.apply_chat_template(conversation=label_template,
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1855, in apply_chat_template
wandb: ERROR rendered_chat = compiled_template.render(
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/environment.py", line 1304, in render
wandb: ERROR self.environment.handle_exception()
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/environment.py", line 939, in handle_exception
wandb: ERROR raise rewrite_traceback_stack(source=source)
wandb: ERROR File "<template>", line 14, in top-level template code
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/sandbox.py", line 394, in call
wandb: ERROR return __context.call(__obj, *args, **kwargs)
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1899, in raise_exception
wandb: ERROR raise TemplateError(message)
wandb: ERROR jinja2.exceptions.TemplateError: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
wandb: ERROR
wandb: Agent Starting Run: dky0jtjo with config:
wandb: accumulate_grad_batches: 2
wandb: epochs: 1
wandb: gradient_clip_val: 0.3061416587717848
wandb: init_lora_weights: pissa_niter_16
wandb: lora_alpha: 64
wandb: lora_dropout: 0.06667523190780655
wandb: lora_rank: 4
wandb: lr: 0.00043427990527397366
wandb: model_name: vilm/vulture-40b
wandb: WARNING Ignored wandb.init() arg project when running a sweep.
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /user/jonathan/wandb/run-20240806_135652-dky0jtjo
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run divine-sweep-4
wandb: ⭐️ View project at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: 🧹 View sweep at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/sweeps/4ik6ex9z
wandb: 🚀 View run at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/dky0jtjo
Unused kwargs: ['bnb_8bit_quant_type', 'bnb_8bit_compute_dtype']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Map: 0%| | 0/960 [00:00<?, ? examples/s]
No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.
Map: 100%|██████████| 960/960 [00:00<00:00, 17278.95 examples/s]
Map: 100%|██████████| 120/120 [00:00<00:00, 5231.00 examples/s]
Map: 100%|██████████| 121/121 [00:00<00:00, 5878.32 examples/s]
Map: 100%|██████████| 960/960 [00:00<00:00, 16413.59 examples/s]
Map: 100%|██████████| 120/120 [00:00<00:00, 5707.83 examples/s]
Map: 100%|██████████| 121/121 [00:00<00:00, 6219.88 examples/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Enabling DeepSpeed BF16. Model parameters and inputs will be cast to `bfloat16`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[2024-08-06 13:59:56,471] [WARNING] [engine.py:1179:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
| Name | Type | Params | Mode
-------------------------------------------------------
0 | model | PeftModelForCausalLM | 1.5 B | train
-------------------------------------------------------
544 K Trainable params
1.5 B Non-trainable params
1.5 B Total params
6,179.215 Total estimated model params size (MB)
Sanity Checking: | | 0/? [00:00<?, ?it/s]Sanity Checking: 0%| | 0/2 [00:00<?, ?it/s]Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 3.62it/s] Training: | | 0/? [00:00<?, ?it/s]Training: 0%| | 0/192 [00:00<?, ?it/s]Epoch 0: 0%| | 0/192 [00:00<?, ?it/s] Epoch 0: 3%|▎ | 5/192 [00:01<01:02, 3.01it/s]Epoch 0: 3%|▎ | 5/192 [00:01<01:02, 3.01it/s, v_num=jo_1, train_loss_step=4.400, train_rouge1_fmeasure_step=0.00624, train_rouge1_precision_step=0.00314, train_rouge1_recall_step=0.400, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00468, train_rougeL_precision_step=0.00236, train_rougeL_recall_step=0.300, train_rougeLsum_fmeasure_step=0.00468, train_rougeLsum_precision_step=0.00236, train_rougeLsum_recall_step=0.300]Epoch 0: 5%|▌ | 10/192 [00:03<01:00, 3.01it/s, v_num=jo_1, train_loss_step=4.400, train_rouge1_fmeasure_step=0.00624, train_rouge1_precision_step=0.00314, train_rouge1_recall_step=0.400, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00468, train_rougeL_precision_step=0.00236, train_rougeL_recall_step=0.300, train_rougeLsum_fmeasure_step=0.00468, train_rougeLsum_precision_step=0.00236, train_rougeLsum_recall_step=0.300]Epoch 0: 5%|▌ | 10/192 [00:03<01:00, 3.01it/s, v_num=jo_1, train_loss_step=3.670, train_rouge1_fmeasure_step=0.00506, train_rouge1_precision_step=0.00255, train_rouge1_recall_step=0.333, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00506, train_rougeL_precision_step=0.00255, train_rougeL_recall_step=0.333, train_rougeLsum_fmeasure_step=0.00253, train_rougeLsum_precision_step=0.00127, train_rougeLsum_recall_step=0.167]Epoch 0: 8%|▊ | 15/192 [00:04<00:58, 3.04it/s, v_num=jo_1, train_loss_step=3.670, 
train_rouge1_fmeasure_step=0.00506, train_rouge1_precision_step=0.00255, train_rouge1_recall_step=0.333, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.00506, train_rougeL_precision_step=0.00255, train_rougeL_recall_step=0.333, train_rougeLsum_fmeasure_step=0.00253, train_rougeLsum_precision_step=0.00127, train_rougeLsum_recall_step=0.167]Epoch 0: 8%|▊ | 15/192 [00:04<00:58, 3.04it/s, v_num=jo_1, train_loss_step=2.200, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 10%|█ | 20/192 [00:06<00:56, 3.06it/s, v_num=jo_1, train_loss_step=2.200, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 10%|█ | 20/192 [00:06<00:56, 3.06it/s, v_num=jo_1, train_loss_step=1.880, train_rouge1_fmeasure_step=0.0131, train_rouge1_precision_step=0.00719, train_rouge1_recall_step=0.0714, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0131, train_rougeL_precision_step=0.00719, train_rougeL_recall_step=0.0714, train_rougeLsum_fmeasure_step=0.0131, train_rougeLsum_precision_step=0.00719, train_rougeLsum_recall_step=0.0714]Epoch 0: 13%|█▎ | 25/192 [00:08<00:54, 3.07it/s, v_num=jo_1, 
train_loss_step=1.880, train_rouge1_fmeasure_step=0.0131, train_rouge1_precision_step=0.00719, train_rouge1_recall_step=0.0714, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0131, train_rougeL_precision_step=0.00719, train_rougeL_recall_step=0.0714, train_rougeLsum_fmeasure_step=0.0131, train_rougeLsum_precision_step=0.00719, train_rougeLsum_recall_step=0.0714]Epoch 0: 13%|█▎ | 25/192 [00:08<00:54, 3.07it/s, v_num=jo_1, train_loss_step=0.432, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 16%|█▌ | 30/192 [00:09<00:52, 3.08it/s, v_num=jo_1, train_loss_step=0.432, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 16%|█▌ | 30/192 [00:09<00:52, 3.08it/s, v_num=jo_1, train_loss_step=0.274, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 18%|█▊ | 35/192 [00:11<00:50, 3.09it/s, v_num=jo_1, 
train_loss_step=0.274, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 18%|█▊ | 35/192 [00:11<00:50, 3.09it/s, v_num=jo_1, train_loss_step=1.270, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 21%|██ | 40/192 [00:12<00:49, 3.09it/s, v_num=jo_1, train_loss_step=1.270, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 21%|██ | 40/192 [00:12<00:49, 3.09it/s, v_num=jo_1, train_loss_step=0.435, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 23%|██▎ | 45/192 [00:14<00:47, 3.10it/s, v_num=jo_1, 
train_loss_step=0.435, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 23%|██▎ | 45/192 [00:14<00:47, 3.10it/s, v_num=jo_1, train_loss_step=0.691, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 26%|██▌ | 50/192 [00:16<00:45, 3.10it/s, v_num=jo_1, train_loss_step=0.691, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 26%|██▌ | 50/192 [00:16<00:45, 3.10it/s, v_num=jo_1, train_loss_step=1.530, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 29%|██▊ | 55/192 [00:17<00:44, 3.11it/s, v_num=jo_1, 
train_loss_step=1.530, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 29%|██▊ | 55/192 [00:17<00:44, 3.11it/s, v_num=jo_1, train_loss_step=0.322, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 31%|███▏ | 60/192 [00:19<00:42, 3.11it/s, v_num=jo_1, train_loss_step=0.322, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 31%|███▏ | 60/192 [00:19<00:42, 3.11it/s, v_num=jo_1, train_loss_step=0.193, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 34%|███▍ | 65/192 [00:20<00:40, 3.11it/s, v_num=jo_1, 
train_loss_step=0.193, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 34%|███▍ | 65/192 [00:20<00:40, 3.11it/s, v_num=jo_1, train_loss_step=0.262, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 36%|███▋ | 70/192 [00:22<00:39, 3.11it/s, v_num=jo_1, train_loss_step=0.262, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 36%|███▋ | 70/192 [00:22<00:39, 3.11it/s, v_num=jo_1, train_loss_step=0.0885, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 39%|███▉ | 75/192 [00:24<00:37, 3.12it/s, v_num=jo_1, 
train_loss_step=0.0885, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 39%|███▉ | 75/192 [00:24<00:37, 3.12it/s, v_num=jo_1, train_loss_step=4.540, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 42%|████▏ | 80/192 [00:25<00:35, 3.12it/s, v_num=jo_1, train_loss_step=4.540, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 42%|████▏ | 80/192 [00:25<00:35, 3.12it/s, v_num=jo_1, train_loss_step=0.826, train_rouge1_fmeasure_step=0.0168, train_rouge1_precision_step=0.200, train_rouge1_recall_step=0.00877, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0168, train_rougeL_precision_step=0.200, train_rougeL_recall_step=0.00877, train_rougeLsum_fmeasure_step=0.0168, train_rougeLsum_precision_step=0.200, train_rougeLsum_recall_step=0.00877]Epoch 0: 44%|████▍ | 85/192 [00:27<00:34, 3.12it/s, 
v_num=jo_1, train_loss_step=0.826, train_rouge1_fmeasure_step=0.0168, train_rouge1_precision_step=0.200, train_rouge1_recall_step=0.00877, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0168, train_rougeL_precision_step=0.200, train_rougeL_recall_step=0.00877, train_rougeLsum_fmeasure_step=0.0168, train_rougeLsum_precision_step=0.200, train_rougeLsum_recall_step=0.00877]Epoch 0: 44%|████▍ | 85/192 [00:27<00:34, 3.12it/s, v_num=jo_1, train_loss_step=0.316, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000] Epoch 0: 47%|████▋ | 90/192 [00:28<00:32, 3.12it/s, v_num=jo_1, train_loss_step=0.316, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 47%|████▋ | 90/192 [00:28<00:32, 3.12it/s, v_num=jo_1, train_loss_step=3.980, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 49%|████▉ | 95/192 [00:30<00:31, 
3.12it/s, v_num=jo_1, train_loss_step=3.980, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 49%|████▉ | 95/192 [00:30<00:31, 3.12it/s, v_num=jo_1, train_loss_step=0.119, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 52%|█████▏ | 100/192 [00:32<00:29, 3.12it/s, v_num=jo_1, train_loss_step=0.119, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 52%|█████▏ | 100/192 [00:32<00:29, 3.12it/s, v_num=jo_1, train_loss_step=0.684, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 55%|█████▍ | 105/192 
[00:33<00:27, 3.12it/s, v_num=jo_1, train_loss_step=0.684, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 55%|█████▍ | 105/192 [00:33<00:27, 3.12it/s, v_num=jo_1, train_loss_step=0.134, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 57%|█████▋ | 110/192 [00:35<00:26, 3.12it/s, v_num=jo_1, train_loss_step=0.134, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 57%|█████▋ | 110/192 [00:35<00:26, 3.12it/s, v_num=jo_1, train_loss_step=0.123, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 60%|█████▉ | 
115/192 [00:36<00:24, 3.12it/s, v_num=jo_1, train_loss_step=0.123, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 60%|█████▉ | 115/192 [00:36<00:24, 3.12it/s, v_num=jo_1, train_loss_step=3.160, train_rouge1_fmeasure_step=0.111, train_rouge1_precision_step=0.167, train_rouge1_recall_step=0.0833, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.111, train_rougeL_precision_step=0.167, train_rougeL_recall_step=0.0833, train_rougeLsum_fmeasure_step=0.111, train_rougeLsum_precision_step=0.167, train_rougeLsum_recall_step=0.0833]Epoch 0: 62%|██████▎ | 120/192 [00:38<00:23, 3.12it/s, v_num=jo_1, train_loss_step=3.160, train_rouge1_fmeasure_step=0.111, train_rouge1_precision_step=0.167, train_rouge1_recall_step=0.0833, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.111, train_rougeL_precision_step=0.167, train_rougeL_recall_step=0.0833, train_rougeLsum_fmeasure_step=0.111, train_rougeLsum_precision_step=0.167, train_rougeLsum_recall_step=0.0833]Epoch 0: 62%|██████▎ | 120/192 [00:38<00:23, 3.12it/s, v_num=jo_1, train_loss_step=0.132, train_rouge1_fmeasure_step=0.087, train_rouge1_precision_step=0.250, train_rouge1_recall_step=0.0526, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.087, train_rougeL_precision_step=0.250, train_rougeL_recall_step=0.0526, train_rougeLsum_fmeasure_step=0.087, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.0526]Epoch 
0: 65%|██████▌ | 125/192 [00:40<00:21, 3.12it/s, v_num=jo_1, train_loss_step=0.132, train_rouge1_fmeasure_step=0.087, train_rouge1_precision_step=0.250, train_rouge1_recall_step=0.0526, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.087, train_rougeL_precision_step=0.250, train_rougeL_recall_step=0.0526, train_rougeLsum_fmeasure_step=0.087, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.0526]Epoch 0: 65%|██████▌ | 125/192 [00:40<00:21, 3.12it/s, v_num=jo_1, train_loss_step=0.105, train_rouge1_fmeasure_step=0.125, train_rouge1_precision_step=0.167, train_rouge1_recall_step=0.100, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.125, train_rougeL_precision_step=0.167, train_rougeL_recall_step=0.100, train_rougeLsum_fmeasure_step=0.125, train_rougeLsum_precision_step=0.167, train_rougeLsum_recall_step=0.100] Epoch 0: 68%|██████▊ | 130/192 [00:41<00:19, 3.12it/s, v_num=jo_1, train_loss_step=0.105, train_rouge1_fmeasure_step=0.125, train_rouge1_precision_step=0.167, train_rouge1_recall_step=0.100, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.125, train_rougeL_precision_step=0.167, train_rougeL_recall_step=0.100, train_rougeLsum_fmeasure_step=0.125, train_rougeLsum_precision_step=0.167, train_rougeLsum_recall_step=0.100]Epoch 0: 68%|██████▊ | 130/192 [00:41<00:19, 3.12it/s, v_num=jo_1, train_loss_step=1.050, train_rouge1_fmeasure_step=0.0253, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.0132, train_rouge2_fmeasure_step=0.0128, train_rouge2_precision_step=0.200, train_rouge2_recall_step=0.00662, train_rougeL_fmeasure_step=0.0253, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.0132, train_rougeLsum_fmeasure_step=0.0253, train_rougeLsum_precision_step=0.333, 
train_rougeLsum_recall_step=0.0132]Epoch 0: 70%|███████ | 135/192 [00:43<00:18, 3.13it/s, v_num=jo_1, train_loss_step=1.050, train_rouge1_fmeasure_step=0.0253, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.0132, train_rouge2_fmeasure_step=0.0128, train_rouge2_precision_step=0.200, train_rouge2_recall_step=0.00662, train_rougeL_fmeasure_step=0.0253, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.0132, train_rougeLsum_fmeasure_step=0.0253, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.0132]Epoch 0: 70%|███████ | 135/192 [00:43<00:18, 3.13it/s, v_num=jo_1, train_loss_step=1.100, train_rouge1_fmeasure_step=0.400, train_rouge1_precision_step=0.600, train_rouge1_recall_step=0.300, train_rouge2_fmeasure_step=0.308, train_rouge2_precision_step=0.500, train_rouge2_recall_step=0.222, train_rougeL_fmeasure_step=0.400, train_rougeL_precision_step=0.600, train_rougeL_recall_step=0.300, train_rougeLsum_fmeasure_step=0.400, train_rougeLsum_precision_step=0.600, train_rougeLsum_recall_step=0.300] Epoch 0: 73%|███████▎ | 140/192 [00:44<00:16, 3.13it/s, v_num=jo_1, train_loss_step=1.100, train_rouge1_fmeasure_step=0.400, train_rouge1_precision_step=0.600, train_rouge1_recall_step=0.300, train_rouge2_fmeasure_step=0.308, train_rouge2_precision_step=0.500, train_rouge2_recall_step=0.222, train_rougeL_fmeasure_step=0.400, train_rougeL_precision_step=0.600, train_rougeL_recall_step=0.300, train_rougeLsum_fmeasure_step=0.400, train_rougeLsum_precision_step=0.600, train_rougeLsum_recall_step=0.300]Epoch 0: 73%|███████▎ | 140/192 [00:44<00:16, 3.13it/s, v_num=jo_1, train_loss_step=2.410, train_rouge1_fmeasure_step=0.333, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.333, train_rouge2_fmeasure_step=0.200, train_rouge2_precision_step=0.200, train_rouge2_recall_step=0.200, train_rougeL_fmeasure_step=0.333, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.333, train_rougeLsum_fmeasure_step=0.333, 
train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.333]Epoch 0: 76%|███████▌ | 145/192 [00:46<00:15, 3.13it/s, v_num=jo_1, train_loss_step=2.410, train_rouge1_fmeasure_step=0.333, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.333, train_rouge2_fmeasure_step=0.200, train_rouge2_precision_step=0.200, train_rouge2_recall_step=0.200, train_rougeL_fmeasure_step=0.333, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.333, train_rougeLsum_fmeasure_step=0.333, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.333]Epoch 0: 76%|███████▌ | 145/192 [00:46<00:15, 3.13it/s, v_num=jo_1, train_loss_step=0.0764, train_rouge1_fmeasure_step=0.286, train_rouge1_precision_step=0.400, train_rouge1_recall_step=0.222, train_rouge2_fmeasure_step=0.167, train_rouge2_precision_step=0.250, train_rouge2_recall_step=0.125, train_rougeL_fmeasure_step=0.286, train_rougeL_precision_step=0.400, train_rougeL_recall_step=0.222, train_rougeLsum_fmeasure_step=0.286, train_rougeLsum_precision_step=0.400, train_rougeLsum_recall_step=0.222]Epoch 0: 78%|███████▊ | 150/192 [00:47<00:13, 3.13it/s, v_num=jo_1, train_loss_step=0.0764, train_rouge1_fmeasure_step=0.286, train_rouge1_precision_step=0.400, train_rouge1_recall_step=0.222, train_rouge2_fmeasure_step=0.167, train_rouge2_precision_step=0.250, train_rouge2_recall_step=0.125, train_rougeL_fmeasure_step=0.286, train_rougeL_precision_step=0.400, train_rougeL_recall_step=0.222, train_rougeLsum_fmeasure_step=0.286, train_rougeLsum_precision_step=0.400, train_rougeLsum_recall_step=0.222]Epoch 0: 78%|███████▊ | 150/192 [00:47<00:13, 3.13it/s, v_num=jo_1, train_loss_step=1.410, train_rouge1_fmeasure_step=0.111, train_rouge1_precision_step=0.111, train_rouge1_recall_step=0.111, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.111, train_rougeL_precision_step=0.111, train_rougeL_recall_step=0.111, 
train_rougeLsum_fmeasure_step=0.111, train_rougeLsum_precision_step=0.111, train_rougeLsum_recall_step=0.111] Epoch 0: 81%|████████ | 155/192 [00:49<00:11, 3.13it/s, v_num=jo_1, train_loss_step=1.410, train_rouge1_fmeasure_step=0.111, train_rouge1_precision_step=0.111, train_rouge1_recall_step=0.111, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.111, train_rougeL_precision_step=0.111, train_rougeL_recall_step=0.111, train_rougeLsum_fmeasure_step=0.111, train_rougeLsum_precision_step=0.111, train_rougeLsum_recall_step=0.111]Epoch 0: 81%|████████ | 155/192 [00:49<00:11, 3.13it/s, v_num=jo_1, train_loss_step=0.629, train_rouge1_fmeasure_step=0.667, train_rouge1_precision_step=0.750, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.571, train_rouge2_precision_step=0.667, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.667, train_rougeL_precision_step=0.750, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.667, train_rougeLsum_precision_step=0.750, train_rougeLsum_recall_step=0.600]Epoch 0: 83%|████████▎ | 160/192 [00:51<00:10, 3.13it/s, v_num=jo_1, train_loss_step=0.629, train_rouge1_fmeasure_step=0.667, train_rouge1_precision_step=0.750, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.571, train_rouge2_precision_step=0.667, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.667, train_rougeL_precision_step=0.750, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.667, train_rougeLsum_precision_step=0.750, train_rougeLsum_recall_step=0.600]Epoch 0: 83%|████████▎ | 160/192 [00:51<00:10, 3.13it/s, v_num=jo_1, train_loss_step=0.340, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, 
train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600]Epoch 0: 86%|████████▌ | 165/192 [00:52<00:08, 3.13it/s, v_num=jo_1, train_loss_step=0.340, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600]Epoch 0: 86%|████████▌ | 165/192 [00:52<00:08, 3.13it/s, v_num=jo_1, train_loss_step=1.780, train_rouge1_fmeasure_step=0.182, train_rouge1_precision_step=0.200, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.182, train_rougeL_precision_step=0.200, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.182, train_rougeLsum_precision_step=0.200, train_rougeLsum_recall_step=0.167]Epoch 0: 89%|████████▊ | 170/192 [00:54<00:07, 3.13it/s, v_num=jo_1, train_loss_step=1.780, train_rouge1_fmeasure_step=0.182, train_rouge1_precision_step=0.200, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.182, train_rougeL_precision_step=0.200, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.182, train_rougeLsum_precision_step=0.200, train_rougeLsum_recall_step=0.167]Epoch 0: 89%|████████▊ | 170/192 [00:54<00:07, 3.13it/s, v_num=jo_1, train_loss_step=0.167, train_rouge1_fmeasure_step=0.667, train_rouge1_precision_step=0.750, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.571, train_rouge2_precision_step=0.667, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.667, 
train_rougeL_precision_step=0.750, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.667, train_rougeLsum_precision_step=0.750, train_rougeLsum_recall_step=0.600]Epoch 0: 91%|█████████ | 175/192 [00:55<00:05, 3.13it/s, v_num=jo_1, train_loss_step=0.167, train_rouge1_fmeasure_step=0.667, train_rouge1_precision_step=0.750, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.571, train_rouge2_precision_step=0.667, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.667, train_rougeL_precision_step=0.750, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.667, train_rougeLsum_precision_step=0.750, train_rougeLsum_recall_step=0.600]Epoch 0: 91%|█████████ | 175/192 [00:55<00:05, 3.13it/s, v_num=jo_1, train_loss_step=0.162, train_rouge1_fmeasure_step=0.333, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.200, train_rouge2_fmeasure_step=0.250, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.143, train_rougeL_fmeasure_step=0.333, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.200, train_rougeLsum_fmeasure_step=0.333, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.200]Epoch 0: 94%|█████████▍| 180/192 [00:57<00:03, 3.13it/s, v_num=jo_1, train_loss_step=0.162, train_rouge1_fmeasure_step=0.333, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.200, train_rouge2_fmeasure_step=0.250, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.143, train_rougeL_fmeasure_step=0.333, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.200, train_rougeLsum_fmeasure_step=0.333, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.200]Epoch 0: 94%|█████████▍| 180/192 [00:57<00:03, 3.13it/s, v_num=jo_1, train_loss_step=0.352, train_rouge1_fmeasure_step=0.500, train_rouge1_precision_step=0.667, train_rouge1_recall_step=0.400, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, 
train_rougeL_fmeasure_step=0.500, train_rougeL_precision_step=0.667, train_rougeL_recall_step=0.400, train_rougeLsum_fmeasure_step=0.500, train_rougeLsum_precision_step=0.667, train_rougeLsum_recall_step=0.400]Epoch 0: 96%|█████████▋| 185/192 [00:59<00:02, 3.13it/s, v_num=jo_1, train_loss_step=0.352, train_rouge1_fmeasure_step=0.500, train_rouge1_precision_step=0.667, train_rouge1_recall_step=0.400, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.500, train_rougeL_precision_step=0.667, train_rougeL_recall_step=0.400, train_rougeLsum_fmeasure_step=0.500, train_rougeLsum_precision_step=0.667, train_rougeLsum_recall_step=0.400]Epoch 0: 96%|█████████▋| 185/192 [00:59<00:02, 3.13it/s, v_num=jo_1, train_loss_step=6.670, train_rouge1_fmeasure_step=0.545, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.375, train_rouge2_fmeasure_step=0.444, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.286, train_rougeL_fmeasure_step=0.545, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.375, train_rougeLsum_fmeasure_step=0.545, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.375]Epoch 0: 99%|█████████▉| 190/192 [01:00<00:00, 3.13it/s, v_num=jo_1, train_loss_step=6.670, train_rouge1_fmeasure_step=0.545, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.375, train_rouge2_fmeasure_step=0.444, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.286, train_rougeL_fmeasure_step=0.545, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.375, train_rougeLsum_fmeasure_step=0.545, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.375]Epoch 0: 99%|█████████▉| 190/192 [01:00<00:00, 3.13it/s, v_num=jo_1, train_loss_step=0.370, train_rouge1_fmeasure_step=0.667, train_rouge1_precision_step=0.750, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.571, train_rouge2_precision_step=0.667, 
train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.667, train_rougeL_precision_step=0.750, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.667, train_rougeLsum_precision_step=0.750, train_rougeLsum_recall_step=0.600]Epoch 0: 100%|██████████| 192/192 [01:01<00:00, 3.13it/s, v_num=jo_1, train_loss_step=0.370, train_rouge1_fmeasure_step=0.667, train_rouge1_precision_step=0.750, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.571, train_rouge2_precision_step=0.667, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.667, train_rougeL_precision_step=0.750, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.667, train_rougeLsum_precision_step=0.750, train_rougeLsum_recall_step=0.600]Epoch 0: 100%|██████████| 192/192 [01:01<00:00, 3.13it/s, v_num=jo_1, train_loss_step=0.454, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: 0%| | 0/12 [00:00<?, ?it/s]
Validation DataLoader 0: 0%| | 0/12 [00:00<?, ?it/s]
Validation DataLoader 0: 42%|████▏ | 5/12 [00:00<00:01, 5.42it/s]
Validation DataLoader 0: 83%|████████▎ | 10/12 [00:01<00:00, 5.38it/s]
Validation DataLoader 0: 100%|██████████| 12/12 [00:02<00:00, 5.36it/s]
Epoch 0: 100%|██████████| 192/192 [01:03<00:00, 3.02it/s, v_num=jo_1, train_loss_step=0.454, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600, val_loss_step=5.190, val_rouge1_fmeasure_step=0.488, val_rouge1_precision_step=1.000, val_rouge1_recall_step=0.330, val_rouge2_fmeasure_step=0.393, val_rouge2_precision_step=1.000, val_rouge2_recall_step=0.250, val_rougeL_fmeasure_step=0.488, val_rougeL_precision_step=1.000, val_rougeL_recall_step=0.330, val_rougeLsum_fmeasure_step=0.488, val_rougeLsum_precision_step=1.000, val_rougeLsum_recall_step=0.330, val_loss_epoch=0.967, val_rouge1_fmeasure_epoch=0.440, val_rouge1_precision_epoch=0.929, val_rouge1_recall_epoch=0.322, val_rouge2_fmeasure_epoch=0.363, val_rouge2_precision_epoch=0.882, val_rouge2_recall_epoch=0.253, val_rougeL_fmeasure_epoch=0.440, val_rougeL_precision_epoch=0.929, val_rougeL_recall_epoch=0.322, val_rougeLsum_fmeasure_epoch=0.440, val_rougeLsum_precision_epoch=0.929, val_rougeLsum_recall_epoch=0.322]Epoch 0: 100%|██████████| 192/192 [01:03<00:00, 3.02it/s, v_num=jo_1, train_loss_step=0.454, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600, val_loss_step=5.190, val_rouge1_fmeasure_step=0.488, val_rouge1_precision_step=1.000, val_rouge1_recall_step=0.330, val_rouge2_fmeasure_step=0.393, 
val_rouge2_precision_step=1.000, val_rouge2_recall_step=0.250, val_rougeL_fmeasure_step=0.488, val_rougeL_precision_step=1.000, val_rougeL_recall_step=0.330, val_rougeLsum_fmeasure_step=0.488, val_rougeLsum_precision_step=1.000, val_rougeLsum_recall_step=0.330, val_loss_epoch=0.967, val_rouge1_fmeasure_epoch=0.440, val_rouge1_precision_epoch=0.929, val_rouge1_recall_epoch=0.322, val_rouge2_fmeasure_epoch=0.363, val_rouge2_precision_epoch=0.882, val_rouge2_recall_epoch=0.253, val_rougeL_fmeasure_epoch=0.440, val_rougeL_precision_epoch=0.929, val_rougeL_recall_epoch=0.322, val_rougeLsum_fmeasure_epoch=0.440, val_rougeLsum_precision_epoch=0.929, val_rougeLsum_recall_epoch=0.322, train_loss_epoch=1.170, train_rouge1_fmeasure_epoch=0.132, train_rouge1_precision_epoch=0.219, train_rouge1_recall_epoch=0.124, train_rouge2_fmeasure_epoch=0.0815, train_rouge2_precision_epoch=0.142, train_rouge2_recall_epoch=0.064, train_rougeL_fmeasure_epoch=0.131, train_rougeL_precision_epoch=0.218, train_rougeL_recall_epoch=0.122, train_rougeLsum_fmeasure_epoch=0.131, train_rougeLsum_precision_epoch=0.219, train_rougeLsum_recall_epoch=0.122]`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|██████████| 192/192 [01:27<00:00, 2.19it/s, v_num=jo_1, train_loss_step=0.454, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600, val_loss_step=5.190, val_rouge1_fmeasure_step=0.488, val_rouge1_precision_step=1.000, val_rouge1_recall_step=0.330, val_rouge2_fmeasure_step=0.393, val_rouge2_precision_step=1.000, val_rouge2_recall_step=0.250, val_rougeL_fmeasure_step=0.488, val_rougeL_precision_step=1.000, val_rougeL_recall_step=0.330, val_rougeLsum_fmeasure_step=0.488, val_rougeLsum_precision_step=1.000, val_rougeLsum_recall_step=0.330, val_loss_epoch=0.967, val_rouge1_fmeasure_epoch=0.440, val_rouge1_precision_epoch=0.929, val_rouge1_recall_epoch=0.322, val_rouge2_fmeasure_epoch=0.363, val_rouge2_precision_epoch=0.882, val_rouge2_recall_epoch=0.253, val_rougeL_fmeasure_epoch=0.440, val_rougeL_precision_epoch=0.929, val_rougeL_recall_epoch=0.322, val_rougeLsum_fmeasure_epoch=0.440, val_rougeLsum_precision_epoch=0.929, val_rougeLsum_recall_epoch=0.322, train_loss_epoch=1.170, train_rouge1_fmeasure_epoch=0.132, train_rouge1_precision_epoch=0.219, train_rouge1_recall_epoch=0.124, train_rouge2_fmeasure_epoch=0.0815, train_rouge2_precision_epoch=0.142, train_rouge2_recall_epoch=0.064, train_rougeL_fmeasure_epoch=0.131, train_rougeL_precision_epoch=0.218, train_rougeL_recall_epoch=0.122, train_rougeLsum_fmeasure_epoch=0.131, train_rougeLsum_precision_epoch=0.219, train_rougeLsum_recall_epoch=0.122]wandb: - 0.014 MB of 0.014 MB uploadedwandb: \ 0.014 MB of 0.014 MB uploadedwandb: | 0.014 MB of 0.014 MB uploadedwandb: / 0.014 MB of 0.026 MB uploadedwandb: - 0.014 MB of 0.026 MB uploadedwandb: \ 0.026 MB of 
0.026 MB uploadedwandb:
wandb:
wandb: Run history:
wandb: epoch ▁▁▁
wandb: lr-PagedLion ▁▁
wandb: train_loss_epoch ▁
wandb: train_loss_step ▁
wandb: train_rouge1_fmeasure_epoch ▁
wandb: train_rouge1_fmeasure_step ▁
wandb: train_rouge1_precision_epoch ▁
wandb: train_rouge1_precision_step ▁
wandb: train_rouge1_recall_epoch ▁
wandb: train_rouge1_recall_step ▁
wandb: train_rouge2_fmeasure_epoch ▁
wandb: train_rouge2_fmeasure_step ▁
wandb: train_rouge2_precision_epoch ▁
wandb: train_rouge2_precision_step ▁
wandb: train_rouge2_recall_epoch ▁
wandb: train_rouge2_recall_step ▁
wandb: train_rougeL_fmeasure_epoch ▁
wandb: train_rougeL_fmeasure_step ▁
wandb: train_rougeL_precision_epoch ▁
wandb: train_rougeL_precision_step ▁
wandb: train_rougeL_recall_epoch ▁
wandb: train_rougeL_recall_step ▁
wandb: train_rougeLsum_fmeasure_epoch ▁
wandb: train_rougeLsum_fmeasure_step ▁
wandb: train_rougeLsum_precision_epoch ▁
wandb: train_rougeLsum_precision_step ▁
wandb: train_rougeLsum_recall_epoch ▁
wandb: train_rougeLsum_recall_step ▁
wandb: trainer/global_step ▅▅▅▁▁▁▁▁▁▁▂▂▂▂▂██
wandb: val_loss_epoch ▁
wandb: val_loss_step ▂▁▁▁▁▃▃▁▂▂▂█
wandb: val_rouge1_fmeasure_epoch ▁
wandb: val_rouge1_fmeasure_step ▄▆▃▃▃▁▆█▆▇▇▅
wandb: val_rouge1_precision_epoch ▁
wandb: val_rouge1_precision_step ██▄▄█▁▄█▄▄██
wandb: val_rouge1_recall_epoch ▁
wandb: val_rouge1_recall_step ▃▅▂▃▃▁▆█▆▇▇▅
wandb: val_rouge2_fmeasure_epoch ▁
wandb: val_rouge2_fmeasure_step ▃▅▃▃▃▁▆█▆▇▇▅
wandb: val_rouge2_precision_epoch ▁
wandb: val_rouge2_precision_step ▅█▅▅█▁▅█▅▅██
wandb: val_rouge2_recall_epoch ▁
wandb: val_rouge2_recall_step ▃▅▂▂▂▁▆█▆▇▇▄
wandb: val_rougeL_fmeasure_epoch ▁
wandb: val_rougeL_fmeasure_step ▄▆▃▃▃▁▆█▆▇▇▅
wandb: val_rougeL_precision_epoch ▁
wandb: val_rougeL_precision_step ██▄▄█▁▄█▄▄██
wandb: val_rougeL_recall_epoch ▁
wandb: val_rougeL_recall_step ▃▅▂▃▃▁▆█▆▇▇▅
wandb: val_rougeLsum_fmeasure_epoch ▁
wandb: val_rougeLsum_fmeasure_step ▄▆▃▃▃▁▆█▆▇▇▅
wandb: val_rougeLsum_precision_epoch ▁
wandb: val_rougeLsum_precision_step ██▄▄█▁▄█▄▄██
wandb: val_rougeLsum_recall_epoch ▁
wandb: val_rougeLsum_recall_step ▃▅▂▃▃▁▆█▆▇▇▅
wandb:
wandb: Run summary:
wandb: epoch 0
wandb: lr-PagedLion 0.00024
wandb: train_loss_epoch 1.17233
wandb: train_loss_step 0.68395
wandb: train_rouge1_fmeasure_epoch 0.13158
wandb: train_rouge1_fmeasure_step 0.0
wandb: train_rouge1_precision_epoch 0.21929
wandb: train_rouge1_precision_step 0.0
wandb: train_rouge1_recall_epoch 0.12398
wandb: train_rouge1_recall_step 0.0
wandb: train_rouge2_fmeasure_epoch 0.0815
wandb: train_rouge2_fmeasure_step 0.0
wandb: train_rouge2_precision_epoch 0.1422
wandb: train_rouge2_precision_step 0.0
wandb: train_rouge2_recall_epoch 0.06396
wandb: train_rouge2_recall_step 0.0
wandb: train_rougeL_fmeasure_epoch 0.13104
wandb: train_rougeL_fmeasure_step 0.0
wandb: train_rougeL_precision_epoch 0.21798
wandb: train_rougeL_precision_step 0.0
wandb: train_rougeL_recall_epoch 0.12244
wandb: train_rougeL_recall_step 0.0
wandb: train_rougeLsum_fmeasure_epoch 0.13147
wandb: train_rougeLsum_fmeasure_step 0.0
wandb: train_rougeLsum_precision_epoch 0.21922
wandb: train_rougeLsum_precision_step 0.0
wandb: train_rougeLsum_recall_epoch 0.12245
wandb: train_rougeLsum_recall_step 0.0
wandb: trainer/global_step 95
wandb: val_loss_epoch 0.9669
wandb: val_loss_step 5.18546
wandb: val_rouge1_fmeasure_epoch 0.44038
wandb: val_rouge1_fmeasure_step 0.4875
wandb: val_rouge1_precision_epoch 0.92917
wandb: val_rouge1_precision_step 1.0
wandb: val_rouge1_recall_epoch 0.3218
wandb: val_rouge1_recall_step 0.32967
wandb: val_rouge2_fmeasure_epoch 0.36287
wandb: val_rouge2_fmeasure_step 0.39286
wandb: val_rouge2_precision_epoch 0.88194
wandb: val_rouge2_precision_step 1.0
wandb: val_rouge2_recall_epoch 0.2528
wandb: val_rouge2_recall_step 0.25
wandb: val_rougeL_fmeasure_epoch 0.44038
wandb: val_rougeL_fmeasure_step 0.4875
wandb: val_rougeL_precision_epoch 0.92917
wandb: val_rougeL_precision_step 1.0
wandb: val_rougeL_recall_epoch 0.3218
wandb: val_rougeL_recall_step 0.32967
wandb: val_rougeLsum_fmeasure_epoch 0.44038
wandb: val_rougeLsum_fmeasure_step 0.4875
wandb: val_rougeLsum_precision_epoch 0.92917
wandb: val_rougeLsum_precision_step 1.0
wandb: val_rougeLsum_recall_epoch 0.3218
wandb: val_rougeLsum_recall_step 0.32967
wandb:
wandb: 🚀 View run divine-sweep-4 at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/dky0jtjo
wandb: ⭐️ View project at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240806_135652-dky0jtjo/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
wandb: Agent Starting Run: gc9hmicj with config:
wandb: accumulate_grad_batches: 6
wandb: epochs: 1
wandb: gradient_clip_val: 0.5238534567989753
wandb: init_lora_weights: False
wandb: lora_alpha: 16
wandb: lora_dropout: 0.07289582504867571
wandb: lora_rank: 8
wandb: lr: 0.00021450017137806743
wandb: model_name: jjhsnail0822/danube-ko-1.8b-base
wandb: WARNING Ignored wandb.init() arg project when running a sweep.
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /user/jonathan/wandb/run-20240806_140140-gc9hmicj
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run absurd-sweep-5
wandb: ⭐️ View project at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: 🧹 View sweep at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/sweeps/4ik6ex9z
wandb: 🚀 View run at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/gc9hmicj
Unused kwargs: ['bnb_8bit_quant_type', 'bnb_8bit_compute_dtype']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Map: 0%| | 0/960 [00:00<?, ? examples/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Map: 100%|██████████| 960/960 [00:00<00:00, 6555.33 examples/s]
Map: 0%| | 0/960 [00:00<?, ? examples/s]
wandb: 0.020 MB of 0.020 MB uploaded
wandb:
wandb: 🚀 View run absurd-sweep-5 at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/gc9hmicj
wandb: ⭐️ View project at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240806_140140-gc9hmicj/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
Run gc9hmicj errored:
Traceback (most recent call last):
File "/user/jonathan/jonathan/lib/python3.10/site-packages/wandb/agents/pyagent.py", line 307, in _run_job
self._function()
File "/user/jonathan/l_sweep.py", line 50, in l2ray_trainer
dataset = get_dataset(dataset_name=dataset_path, tokenizer=tokenizer)
File "/user/jonathan/finetuning_datasets.py", line 40, in get_dataset
en_dataset = raw_dataset.map(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 869, in map
{
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
k: dataset.map(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3161, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3552, in _map_single
batch = apply_function_on_filtered_inputs(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3421, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "/user/jonathan/finetuning_datafunctions.py", line 65, in preprocess_function
templated_text, labels = formatting(sample=sample,
File "/user/jonathan/finetuning_datafunctions.py", line 38, in formatting
bot_message = tokenizer.apply_chat_template(conversation=label_template,
File "/user/jonathan/jonathan/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1855, in apply_chat_template
rendered_chat = compiled_template.render(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/environment.py", line 1304, in render
self.environment.handle_exception()
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/environment.py", line 939, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 1, in top-level template code
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/sandbox.py", line 394, in call
return __context.call(__obj, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1899, in raise_exception
raise TemplateError(message)
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
wandb: Agent Starting Run: iz24uc9a with config:
wandb: accumulate_grad_batches: 1
wandb: epochs: 1
wandb: gradient_clip_val: 0.4449990271546663
wandb: init_lora_weights: pissa_niter_16
wandb: lora_alpha: 128
wandb: lora_dropout: 0.06251410191599312
wandb: lora_rank: 8
wandb: lr: 0.00022508829909255213
wandb: model_name: EleutherAI/polyglot-ko-12.8b
wandb: WARNING Ignored wandb.init() arg project when running a sweep.
wandb: Waiting for wandb.init()...
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /user/jonathan/wandb/run-20240806_140206-iz24uc9a
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run chocolate-sweep-6
wandb: ⭐️ View project at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: 🧹 View sweep at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/sweeps/4ik6ex9z
wandb: 🚀 View run at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/iz24uc9a
Unused kwargs: ['bnb_8bit_quant_type', 'bnb_8bit_compute_dtype']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Map: 0%| | 0/960 [00:00<?, ? examples/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Map: 100%|██████████| 960/960 [00:00<00:00, 5915.19 examples/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Enabling DeepSpeed BF16. Model parameters and inputs will be cast to `bfloat16`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[2024-08-06 14:05:09,952] [WARNING] [engine.py:1179:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
| Name | Type | Params | Mode
-------------------------------------------------------
0 | model | PeftModelForCausalLM | 1.5 B | train
-------------------------------------------------------
1.1 M Trainable params
1.5 B Non-trainable params
1.5 B Total params
6,183.574 Total estimated model params size (MB)
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 2.10it/s]
Epoch 0: 0%| | 0/192 [00:00<?, ?it/s]
Epoch 0: 3%|▎ | 5/192 [00:02<01:42, 1.82it/s, v_num=9a_1, train_loss_step=3.600, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 5%|▌ | 10/192 [00:05<01:39, 1.82it/s, v_num=9a_1, train_loss_step=1.520, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 8%|▊ | 15/192 [00:08<01:36, 1.83it/s, v_num=9a_1, train_loss_step=0.391, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 10%|█ | 20/192 [00:10<01:33, 1.84it/s, v_num=9a_1, train_loss_step=0.172, train_rouge1_fmeasure_step=0.118, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.0714, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.118, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.0714, train_rougeLsum_fmeasure_step=0.118, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.0714]
Epoch 0: 13%|█▎ | 25/192 [00:13<01:30, 1.84it/s, v_num=9a_1, train_loss_step=0.135, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 16%|█▌ | 30/192 [00:16<01:27, 1.84it/s, v_num=9a_1, train_loss_step=0.0721, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 18%|█▊ | 35/192 [00:19<01:26, 1.81it/s, v_num=9a_1, train_loss_step=1.960, train_rouge1_fmeasure_step=0.0132, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.00662, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0132, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.00662, train_rougeLsum_fmeasure_step=0.0132, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.00662]
Epoch 0: 21%|██ | 40/192 [00:22<01:23, 1.81it/s, v_num=9a_1, train_loss_step=0.382, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 23%|██▎ | 45/192 [00:24<01:20, 1.82it/s, v_num=9a_1, train_loss_step=0.120, train_rouge1_fmeasure_step=0.250, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.200, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.250, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.200, train_rougeLsum_fmeasure_step=0.250, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.200]
Epoch 0: 26%|██▌ | 50/192 [00:27<01:17, 1.82it/s, v_num=9a_1, train_loss_step=0.198, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 29%|██▊ | 55/192 [00:30<01:15, 1.83it/s, v_num=9a_1, train_loss_step=0.305, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 31%|███▏ | 60/192 [00:32<01:12, 1.83it/s, v_num=9a_1, train_loss_step=0.190, train_rouge1_fmeasure_step=0.0769, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.0435, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0769, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.0435, train_rougeLsum_fmeasure_step=0.0769, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.0435]
Epoch 0: 34%|███▍ | 65/192 [00:35<01:09, 1.83it/s, v_num=9a_1, train_loss_step=0.252, train_rouge1_fmeasure_step=0.0667, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.0345, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0667, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.0345, train_rougeLsum_fmeasure_step=0.0667, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.0345]
Epoch 0: 36%|███▋ | 70/192 [00:38<01:06, 1.83it/s, v_num=9a_1, train_loss_step=0.0595, train_rouge1_fmeasure_step=0.286, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.286, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.286, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.167]
Epoch 0: 39%|███▉ | 75/192 [00:40<01:03, 1.84it/s, v_num=9a_1, train_loss_step=0.660, train_rouge1_fmeasure_step=0.250, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.250, train_rougeL_precision_step=0.500, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.250, train_rougeLsum_precision_step=0.500, train_rougeLsum_recall_step=0.167]
Epoch 0: 42%|████▏ | 80/192 [00:43<01:00, 1.84it/s, v_num=9a_1, train_loss_step=1.130, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 44%|████▍ | 85/192 [00:46<00:58, 1.84it/s, v_num=9a_1, train_loss_step=0.373, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 47%|████▋ | 90/192 [00:48<00:55, 1.84it/s, v_num=9a_1, train_loss_step=0.493, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 49%|████▉ | 95/192 [00:51<00:52, 1.84it/s, v_num=9a_1, train_loss_step=0.0936, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 52%|█████▏ | 100/192 [00:54<00:49, 1.84it/s, v_num=9a_1, train_loss_step=0.0976, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]
Epoch 0: 55%|█████▍ | 105/192 [00:57<00:47, 1.84it/s, 
v_num=9a_1, train_loss_step=0.0976, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 55%|█████▍ | 105/192 [00:57<00:47, 1.84it/s, v_num=9a_1, train_loss_step=0.0899, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 57%|█████▋ | 110/192 [00:59<00:44, 1.84it/s, v_num=9a_1, train_loss_step=0.0899, train_rouge1_fmeasure_step=0.000, train_rouge1_precision_step=0.000, train_rouge1_recall_step=0.000, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.000, train_rougeL_precision_step=0.000, train_rougeL_recall_step=0.000, train_rougeLsum_fmeasure_step=0.000, train_rougeLsum_precision_step=0.000, train_rougeLsum_recall_step=0.000]Epoch 0: 57%|█████▋ | 110/192 [00:59<00:44, 1.84it/s, v_num=9a_1, train_loss_step=0.110, train_rouge1_fmeasure_step=0.235, train_rouge1_precision_step=0.400, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.235, train_rougeL_precision_step=0.400, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.235, train_rougeLsum_precision_step=0.400, train_rougeLsum_recall_step=0.167] Epoch 0: 60%|█████▉ | 115/192 [01:02<00:41, 
1.84it/s, v_num=9a_1, train_loss_step=0.110, train_rouge1_fmeasure_step=0.235, train_rouge1_precision_step=0.400, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.235, train_rougeL_precision_step=0.400, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.235, train_rougeLsum_precision_step=0.400, train_rougeLsum_recall_step=0.167]Epoch 0: 60%|█████▉ | 115/192 [01:02<00:41, 1.84it/s, v_num=9a_1, train_loss_step=0.433, train_rouge1_fmeasure_step=0.100, train_rouge1_precision_step=0.125, train_rouge1_recall_step=0.0833, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.100, train_rougeL_precision_step=0.125, train_rougeL_recall_step=0.0833, train_rougeLsum_fmeasure_step=0.100, train_rougeLsum_precision_step=0.125, train_rougeLsum_recall_step=0.0833]Epoch 0: 62%|██████▎ | 120/192 [01:05<00:39, 1.84it/s, v_num=9a_1, train_loss_step=0.433, train_rouge1_fmeasure_step=0.100, train_rouge1_precision_step=0.125, train_rouge1_recall_step=0.0833, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.100, train_rougeL_precision_step=0.125, train_rougeL_recall_step=0.0833, train_rougeLsum_fmeasure_step=0.100, train_rougeLsum_precision_step=0.125, train_rougeLsum_recall_step=0.0833]Epoch 0: 62%|██████▎ | 120/192 [01:05<00:39, 1.84it/s, v_num=9a_1, train_loss_step=0.141, train_rouge1_fmeasure_step=0.087, train_rouge1_precision_step=0.250, train_rouge1_recall_step=0.0526, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.087, train_rougeL_precision_step=0.250, train_rougeL_recall_step=0.0526, train_rougeLsum_fmeasure_step=0.087, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.0526]Epoch 0: 65%|██████▌ | 
125/192 [01:07<00:36, 1.84it/s, v_num=9a_1, train_loss_step=0.141, train_rouge1_fmeasure_step=0.087, train_rouge1_precision_step=0.250, train_rouge1_recall_step=0.0526, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.087, train_rougeL_precision_step=0.250, train_rougeL_recall_step=0.0526, train_rougeLsum_fmeasure_step=0.087, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.0526]Epoch 0: 65%|██████▌ | 125/192 [01:07<00:36, 1.84it/s, v_num=9a_1, train_loss_step=0.0931, train_rouge1_fmeasure_step=0.111, train_rouge1_precision_step=0.125, train_rouge1_recall_step=0.100, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.111, train_rougeL_precision_step=0.125, train_rougeL_recall_step=0.100, train_rougeLsum_fmeasure_step=0.111, train_rougeLsum_precision_step=0.125, train_rougeLsum_recall_step=0.100] Epoch 0: 68%|██████▊ | 130/192 [01:10<00:33, 1.84it/s, v_num=9a_1, train_loss_step=0.0931, train_rouge1_fmeasure_step=0.111, train_rouge1_precision_step=0.125, train_rouge1_recall_step=0.100, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.111, train_rougeL_precision_step=0.125, train_rougeL_recall_step=0.100, train_rougeLsum_fmeasure_step=0.111, train_rougeLsum_precision_step=0.125, train_rougeLsum_recall_step=0.100]Epoch 0: 68%|██████▊ | 130/192 [01:10<00:33, 1.84it/s, v_num=9a_1, train_loss_step=1.360, train_rouge1_fmeasure_step=0.0129, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.00658, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0129, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.00658, train_rougeLsum_fmeasure_step=0.0129, train_rougeLsum_precision_step=0.333, 
train_rougeLsum_recall_step=0.00658]Epoch 0: 70%|███████ | 135/192 [01:13<00:30, 1.85it/s, v_num=9a_1, train_loss_step=1.360, train_rouge1_fmeasure_step=0.0129, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.00658, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.0129, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.00658, train_rougeLsum_fmeasure_step=0.0129, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.00658]Epoch 0: 70%|███████ | 135/192 [01:13<00:30, 1.85it/s, v_num=9a_1, train_loss_step=0.183, train_rouge1_fmeasure_step=0.143, train_rouge1_precision_step=0.250, train_rouge1_recall_step=0.100, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.143, train_rougeL_precision_step=0.250, train_rougeL_recall_step=0.100, train_rougeLsum_fmeasure_step=0.143, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.100] Epoch 0: 73%|███████▎ | 140/192 [01:15<00:28, 1.85it/s, v_num=9a_1, train_loss_step=0.183, train_rouge1_fmeasure_step=0.143, train_rouge1_precision_step=0.250, train_rouge1_recall_step=0.100, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.143, train_rougeL_precision_step=0.250, train_rougeL_recall_step=0.100, train_rougeLsum_fmeasure_step=0.143, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.100]Epoch 0: 73%|███████▎ | 140/192 [01:15<00:28, 1.85it/s, v_num=9a_1, train_loss_step=0.347, train_rouge1_fmeasure_step=0.400, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.333, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.400, train_rougeL_precision_step=0.500, train_rougeL_recall_step=0.333, train_rougeLsum_fmeasure_step=0.200, 
train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.167]Epoch 0: 76%|███████▌ | 145/192 [01:18<00:25, 1.85it/s, v_num=9a_1, train_loss_step=0.347, train_rouge1_fmeasure_step=0.400, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.333, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.400, train_rougeL_precision_step=0.500, train_rougeL_recall_step=0.333, train_rougeLsum_fmeasure_step=0.200, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.167]Epoch 0: 76%|███████▌ | 145/192 [01:18<00:25, 1.85it/s, v_num=9a_1, train_loss_step=0.0571, train_rouge1_fmeasure_step=0.308, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.222, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.308, train_rougeL_precision_step=0.500, train_rougeL_recall_step=0.222, train_rougeLsum_fmeasure_step=0.308, train_rougeLsum_precision_step=0.500, train_rougeLsum_recall_step=0.222]Epoch 0: 78%|███████▊ | 150/192 [01:21<00:22, 1.85it/s, v_num=9a_1, train_loss_step=0.0571, train_rouge1_fmeasure_step=0.308, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.222, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.308, train_rougeL_precision_step=0.500, train_rougeL_recall_step=0.222, train_rougeLsum_fmeasure_step=0.308, train_rougeLsum_precision_step=0.500, train_rougeLsum_recall_step=0.222]Epoch 0: 78%|███████▊ | 150/192 [01:21<00:22, 1.85it/s, v_num=9a_1, train_loss_step=0.200, train_rouge1_fmeasure_step=0.167, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.111, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.167, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.111, 
train_rougeLsum_fmeasure_step=0.167, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.111] Epoch 0: 81%|████████ | 155/192 [01:23<00:20, 1.85it/s, v_num=9a_1, train_loss_step=0.200, train_rouge1_fmeasure_step=0.167, train_rouge1_precision_step=0.333, train_rouge1_recall_step=0.111, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.167, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.111, train_rougeLsum_fmeasure_step=0.167, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.111]Epoch 0: 81%|████████ | 155/192 [01:23<00:20, 1.85it/s, v_num=9a_1, train_loss_step=0.117, train_rouge1_fmeasure_step=0.545, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.364, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.400, train_rougeLsum_fmeasure_step=0.364, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.400]Epoch 0: 83%|████████▎ | 160/192 [01:26<00:17, 1.85it/s, v_num=9a_1, train_loss_step=0.117, train_rouge1_fmeasure_step=0.545, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.364, train_rougeL_precision_step=0.333, train_rougeL_recall_step=0.400, train_rougeLsum_fmeasure_step=0.364, train_rougeLsum_precision_step=0.333, train_rougeLsum_recall_step=0.400]Epoch 0: 83%|████████▎ | 160/192 [01:26<00:17, 1.85it/s, v_num=9a_1, train_loss_step=0.0689, train_rouge1_fmeasure_step=0.444, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.400, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.444, train_rougeL_precision_step=0.500, 
train_rougeL_recall_step=0.400, train_rougeLsum_fmeasure_step=0.444, train_rougeLsum_precision_step=0.500, train_rougeLsum_recall_step=0.400]Epoch 0: 86%|████████▌ | 165/192 [01:29<00:14, 1.85it/s, v_num=9a_1, train_loss_step=0.0689, train_rouge1_fmeasure_step=0.444, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.400, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.444, train_rougeL_precision_step=0.500, train_rougeL_recall_step=0.400, train_rougeLsum_fmeasure_step=0.444, train_rougeLsum_precision_step=0.500, train_rougeLsum_recall_step=0.400]Epoch 0: 86%|████████▌ | 165/192 [01:29<00:14, 1.85it/s, v_num=9a_1, train_loss_step=0.222, train_rouge1_fmeasure_step=0.200, train_rouge1_precision_step=0.250, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.200, train_rougeL_precision_step=0.250, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.200, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.167] Epoch 0: 89%|████████▊ | 170/192 [01:32<00:11, 1.85it/s, v_num=9a_1, train_loss_step=0.222, train_rouge1_fmeasure_step=0.200, train_rouge1_precision_step=0.250, train_rouge1_recall_step=0.167, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.200, train_rougeL_precision_step=0.250, train_rougeL_recall_step=0.167, train_rougeLsum_fmeasure_step=0.200, train_rougeLsum_precision_step=0.250, train_rougeLsum_recall_step=0.167]Epoch 0: 89%|████████▊ | 170/192 [01:32<00:11, 1.85it/s, v_num=9a_1, train_loss_step=0.079, train_rouge1_fmeasure_step=0.182, train_rouge1_precision_step=0.167, train_rouge1_recall_step=0.200, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.182, 
train_rougeL_precision_step=0.167, train_rougeL_recall_step=0.200, train_rougeLsum_fmeasure_step=0.182, train_rougeLsum_precision_step=0.167, train_rougeLsum_recall_step=0.200]Epoch 0: 91%|█████████ | 175/192 [01:34<00:09, 1.85it/s, v_num=9a_1, train_loss_step=0.079, train_rouge1_fmeasure_step=0.182, train_rouge1_precision_step=0.167, train_rouge1_recall_step=0.200, train_rouge2_fmeasure_step=0.000, train_rouge2_precision_step=0.000, train_rouge2_recall_step=0.000, train_rougeL_fmeasure_step=0.182, train_rougeL_precision_step=0.167, train_rougeL_recall_step=0.200, train_rougeLsum_fmeasure_step=0.182, train_rougeLsum_precision_step=0.167, train_rougeLsum_recall_step=0.200]Epoch 0: 91%|█████████ | 175/192 [01:34<00:09, 1.85it/s, v_num=9a_1, train_loss_step=0.147, train_rouge1_fmeasure_step=0.211, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.133, train_rouge2_fmeasure_step=0.118, train_rouge2_precision_step=0.333, train_rouge2_recall_step=0.0714, train_rougeL_fmeasure_step=0.211, train_rougeL_precision_step=0.500, train_rougeL_recall_step=0.133, train_rougeLsum_fmeasure_step=0.211, train_rougeLsum_precision_step=0.500, train_rougeLsum_recall_step=0.133]Epoch 0: 94%|█████████▍| 180/192 [01:37<00:06, 1.85it/s, v_num=9a_1, train_loss_step=0.147, train_rouge1_fmeasure_step=0.211, train_rouge1_precision_step=0.500, train_rouge1_recall_step=0.133, train_rouge2_fmeasure_step=0.118, train_rouge2_precision_step=0.333, train_rouge2_recall_step=0.0714, train_rougeL_fmeasure_step=0.211, train_rougeL_precision_step=0.500, train_rougeL_recall_step=0.133, train_rougeLsum_fmeasure_step=0.211, train_rougeLsum_precision_step=0.500, train_rougeLsum_recall_step=0.133]Epoch 0: 94%|█████████▍| 180/192 [01:37<00:06, 1.85it/s, v_num=9a_1, train_loss_step=0.0519, train_rouge1_fmeasure_step=0.800, train_rouge1_precision_step=0.800, train_rouge1_recall_step=0.800, train_rouge2_fmeasure_step=0.750, train_rouge2_precision_step=0.750, train_rouge2_recall_step=0.750, 
train_rougeL_fmeasure_step=0.800, train_rougeL_precision_step=0.800, train_rougeL_recall_step=0.800, train_rougeLsum_fmeasure_step=0.800, train_rougeLsum_precision_step=0.800, train_rougeLsum_recall_step=0.800]Epoch 0: 96%|█████████▋| 185/192 [01:40<00:03, 1.85it/s, v_num=9a_1, train_loss_step=0.0519, train_rouge1_fmeasure_step=0.800, train_rouge1_precision_step=0.800, train_rouge1_recall_step=0.800, train_rouge2_fmeasure_step=0.750, train_rouge2_precision_step=0.750, train_rouge2_recall_step=0.750, train_rougeL_fmeasure_step=0.800, train_rougeL_precision_step=0.800, train_rougeL_recall_step=0.800, train_rougeLsum_fmeasure_step=0.800, train_rougeLsum_precision_step=0.800, train_rougeLsum_recall_step=0.800]Epoch 0: 96%|█████████▋| 185/192 [01:40<00:03, 1.85it/s, v_num=9a_1, train_loss_step=1.010, train_rouge1_fmeasure_step=0.615, train_rouge1_precision_step=0.800, train_rouge1_recall_step=0.500, train_rouge2_fmeasure_step=0.364, train_rouge2_precision_step=0.500, train_rouge2_recall_step=0.286, train_rougeL_fmeasure_step=0.615, train_rougeL_precision_step=0.800, train_rougeL_recall_step=0.500, train_rougeLsum_fmeasure_step=0.615, train_rougeLsum_precision_step=0.800, train_rougeLsum_recall_step=0.500] Epoch 0: 99%|█████████▉| 190/192 [01:42<00:01, 1.85it/s, v_num=9a_1, train_loss_step=1.010, train_rouge1_fmeasure_step=0.615, train_rouge1_precision_step=0.800, train_rouge1_recall_step=0.500, train_rouge2_fmeasure_step=0.364, train_rouge2_precision_step=0.500, train_rouge2_recall_step=0.286, train_rougeL_fmeasure_step=0.615, train_rougeL_precision_step=0.800, train_rougeL_recall_step=0.500, train_rougeLsum_fmeasure_step=0.615, train_rougeLsum_precision_step=0.800, train_rougeLsum_recall_step=0.500]Epoch 0: 99%|█████████▉| 190/192 [01:42<00:01, 1.85it/s, v_num=9a_1, train_loss_step=0.0602, train_rouge1_fmeasure_step=0.667, train_rouge1_precision_step=0.750, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.571, train_rouge2_precision_step=0.667, 
train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.667, train_rougeL_precision_step=0.750, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.667, train_rougeLsum_precision_step=0.750, train_rougeLsum_recall_step=0.600]Epoch 0: 100%|██████████| 192/192 [01:43<00:00, 1.85it/s, v_num=9a_1, train_loss_step=0.0602, train_rouge1_fmeasure_step=0.667, train_rouge1_precision_step=0.750, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.571, train_rouge2_precision_step=0.667, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.667, train_rougeL_precision_step=0.750, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.667, train_rougeLsum_precision_step=0.750, train_rougeLsum_recall_step=0.600]Epoch 0: 100%|██████████| 192/192 [01:43<00:00, 1.85it/s, v_num=9a_1, train_loss_step=0.0711, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: 0%| | 0/12 [00:00<?, ?it/s]
Validation DataLoader 0: 0%| | 0/12 [00:00<?, ?it/s]
Validation DataLoader 0: 42%|████▏ | 5/12 [00:01<00:02, 2.93it/s]
Validation DataLoader 0: 83%|████████▎ | 10/12 [00:03<00:00, 2.92it/s]
Validation DataLoader 0: 100%|██████████| 12/12 [00:04<00:00, 2.92it/s]
Epoch 0: 100%|██████████| 192/192 [01:48<00:00, 1.78it/s, v_num=9a_1, train_loss_step=0.0711, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600, val_loss_step=0.793, val_rouge1_fmeasure_step=0.599, val_rouge1_precision_step=1.000, val_rouge1_recall_step=0.440, val_rouge2_fmeasure_step=0.356, val_rouge2_precision_step=0.667, val_rouge2_recall_step=0.250, val_rougeL_fmeasure_step=0.599, val_rougeL_precision_step=1.000, val_rougeL_recall_step=0.440, val_rougeLsum_fmeasure_step=0.599, val_rougeLsum_precision_step=1.000, val_rougeLsum_recall_step=0.440, val_loss_epoch=0.386, val_rouge1_fmeasure_epoch=0.545, val_rouge1_precision_epoch=1.000, val_rouge1_recall_epoch=0.428, val_rouge2_fmeasure_epoch=0.391, val_rouge2_precision_epoch=0.736, val_rouge2_recall_epoch=0.305, val_rougeL_fmeasure_epoch=0.545, val_rougeL_precision_epoch=1.000, val_rougeL_recall_epoch=0.428, val_rougeLsum_fmeasure_epoch=0.545, val_rougeLsum_precision_epoch=1.000, val_rougeLsum_recall_epoch=0.428]Epoch 0: 100%|██████████| 192/192 [01:48<00:00, 1.78it/s, v_num=9a_1, train_loss_step=0.0711, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600, val_loss_step=0.793, val_rouge1_fmeasure_step=0.599, val_rouge1_precision_step=1.000, val_rouge1_recall_step=0.440, val_rouge2_fmeasure_step=0.356, 
val_rouge2_precision_step=0.667, val_rouge2_recall_step=0.250, val_rougeL_fmeasure_step=0.599, val_rougeL_precision_step=1.000, val_rougeL_recall_step=0.440, val_rougeLsum_fmeasure_step=0.599, val_rougeLsum_precision_step=1.000, val_rougeLsum_recall_step=0.440, val_loss_epoch=0.386, val_rouge1_fmeasure_epoch=0.545, val_rouge1_precision_epoch=1.000, val_rouge1_recall_epoch=0.428, val_rouge2_fmeasure_epoch=0.391, val_rouge2_precision_epoch=0.736, val_rouge2_recall_epoch=0.305, val_rougeL_fmeasure_epoch=0.545, val_rougeL_precision_epoch=1.000, val_rougeL_recall_epoch=0.428, val_rougeLsum_fmeasure_epoch=0.545, val_rougeLsum_precision_epoch=1.000, val_rougeLsum_recall_epoch=0.428, train_loss_epoch=0.510, train_rouge1_fmeasure_epoch=0.150, train_rouge1_precision_epoch=0.293, train_rouge1_recall_epoch=0.118, train_rouge2_fmeasure_epoch=0.0405, train_rouge2_precision_epoch=0.0738, train_rouge2_recall_epoch=0.0313, train_rougeL_fmeasure_epoch=0.147, train_rougeL_precision_epoch=0.290, train_rougeL_recall_epoch=0.115, train_rougeLsum_fmeasure_epoch=0.140, train_rougeLsum_precision_epoch=0.277, train_rougeLsum_recall_epoch=0.110]`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|██████████| 192/192 [02:17<00:00, 1.40it/s, v_num=9a_1, train_loss_step=0.0711, train_rouge1_fmeasure_step=0.750, train_rouge1_precision_step=1.000, train_rouge1_recall_step=0.600, train_rouge2_fmeasure_step=0.667, train_rouge2_precision_step=1.000, train_rouge2_recall_step=0.500, train_rougeL_fmeasure_step=0.750, train_rougeL_precision_step=1.000, train_rougeL_recall_step=0.600, train_rougeLsum_fmeasure_step=0.750, train_rougeLsum_precision_step=1.000, train_rougeLsum_recall_step=0.600, val_loss_step=0.793, val_rouge1_fmeasure_step=0.599, val_rouge1_precision_step=1.000, val_rouge1_recall_step=0.440, val_rouge2_fmeasure_step=0.356, val_rouge2_precision_step=0.667, val_rouge2_recall_step=0.250, val_rougeL_fmeasure_step=0.599, val_rougeL_precision_step=1.000, val_rougeL_recall_step=0.440, val_rougeLsum_fmeasure_step=0.599, val_rougeLsum_precision_step=1.000, val_rougeLsum_recall_step=0.440, val_loss_epoch=0.386, val_rouge1_fmeasure_epoch=0.545, val_rouge1_precision_epoch=1.000, val_rouge1_recall_epoch=0.428, val_rouge2_fmeasure_epoch=0.391, val_rouge2_precision_epoch=0.736, val_rouge2_recall_epoch=0.305, val_rougeL_fmeasure_epoch=0.545, val_rougeL_precision_epoch=1.000, val_rougeL_recall_epoch=0.428, val_rougeLsum_fmeasure_epoch=0.545, val_rougeLsum_precision_epoch=1.000, val_rougeLsum_recall_epoch=0.428, train_loss_epoch=0.510, train_rouge1_fmeasure_epoch=0.150, train_rouge1_precision_epoch=0.293, train_rouge1_recall_epoch=0.118, train_rouge2_fmeasure_epoch=0.0405, train_rouge2_precision_epoch=0.0738, train_rouge2_recall_epoch=0.0313, train_rougeL_fmeasure_epoch=0.147, train_rougeL_precision_epoch=0.290, train_rougeL_recall_epoch=0.115, train_rougeLsum_fmeasure_epoch=0.140, train_rougeLsum_precision_epoch=0.277, train_rougeLsum_recall_epoch=0.110]
wandb: - 0.014 MB of 0.014 MB uploaded
wandb: \ 0.014 MB of 0.014 MB uploaded
wandb: | 0.014 MB of 0.014 MB uploaded
wandb: / 0.014 MB of 0.025 MB uploaded
wandb: - 0.014 MB of 0.025 MB uploaded
wandb: \ 0.025 MB of 0.025 MB uploaded
wandb:
wandb:
wandb: Run history:
wandb: epoch ▁▁▁▁▁
wandb: lr-PagedLion ▆█▁
wandb: train_loss_epoch ▁
wandb: train_loss_step █▁█
wandb: train_rouge1_fmeasure_epoch ▁
wandb: train_rouge1_fmeasure_step ▁▁█
wandb: train_rouge1_precision_epoch ▁
wandb: train_rouge1_precision_step ▁▁█
wandb: train_rouge1_recall_epoch ▁
wandb: train_rouge1_recall_step ▁▁█
wandb: train_rouge2_fmeasure_epoch ▁
wandb: train_rouge2_fmeasure_step ▁▁▁
wandb: train_rouge2_precision_epoch ▁
wandb: train_rouge2_precision_step ▁▁▁
wandb: train_rouge2_recall_epoch ▁
wandb: train_rouge2_recall_step ▁▁▁
wandb: train_rougeL_fmeasure_epoch ▁
wandb: train_rougeL_fmeasure_step ▁▁█
wandb: train_rougeL_precision_epoch ▁
wandb: train_rougeL_precision_step ▁▁█
wandb: train_rougeL_recall_epoch ▁
wandb: train_rougeL_recall_step ▁▁█
wandb: train_rougeLsum_fmeasure_epoch ▁
wandb: train_rougeLsum_fmeasure_step ▁▁█
wandb: train_rougeLsum_precision_epoch ▁
wandb: train_rougeLsum_precision_step ▁▁█
wandb: train_rougeLsum_recall_epoch ▁
wandb: train_rougeLsum_recall_step ▁▁█
wandb: trainer/global_step ▃▃▅▅▆▆▁▁▁▁▁▁▁▁▁▁▁▁██
wandb: val_loss_epoch ▁
wandb: val_loss_step ▃▁▂▂▂█▂▁▁▁▁▄
wandb: val_rouge1_fmeasure_epoch ▁
wandb: val_rouge1_fmeasure_step ▄▆▃▃▃▁▇█▇▇▇▆
wandb: val_rouge1_precision_epoch ▁
wandb: val_rouge1_precision_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rouge1_recall_epoch ▁
wandb: val_rouge1_recall_step ▃▅▂▃▃▁▆█▆▇▇▅
wandb: val_rouge2_fmeasure_epoch ▁
wandb: val_rouge2_fmeasure_step ▃▄▂▂▂▁▆█▅▆▆▄
wandb: val_rouge2_precision_epoch ▁
wandb: val_rouge2_precision_step ▁▁▁▁▁▁▅█▁▅▅▁
wandb: val_rouge2_recall_epoch ▁
wandb: val_rouge2_recall_step ▂▃▂▂▂▁▅█▄▆▆▃
wandb: val_rougeL_fmeasure_epoch ▁
wandb: val_rougeL_fmeasure_step ▄▆▃▃▃▁▇█▇▇▇▆
wandb: val_rougeL_precision_epoch ▁
wandb: val_rougeL_precision_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rougeL_recall_epoch ▁
wandb: val_rougeL_recall_step ▃▅▂▃▃▁▆█▆▇▇▅
wandb: val_rougeLsum_fmeasure_epoch ▁
wandb: val_rougeLsum_fmeasure_step ▄▆▃▃▃▁▇█▇▇▇▆
wandb: val_rougeLsum_precision_epoch ▁
wandb: val_rougeLsum_precision_step ▁▁▁▁▁▁▁▁▁▁▁▁
wandb: val_rougeLsum_recall_epoch ▁
wandb: val_rougeLsum_recall_step ▃▅▂▃▃▁▆█▆▇▇▅
wandb:
wandb: Run summary:
wandb: epoch 0
wandb: lr-PagedLion 1e-05
wandb: train_loss_epoch 0.50961
wandb: train_loss_step 0.19961
wandb: train_rouge1_fmeasure_epoch 0.14963
wandb: train_rouge1_fmeasure_step 0.16667
wandb: train_rouge1_precision_epoch 0.29329
wandb: train_rouge1_precision_step 0.33333
wandb: train_rouge1_recall_epoch 0.11779
wandb: train_rouge1_recall_step 0.11111
wandb: train_rouge2_fmeasure_epoch 0.04048
wandb: train_rouge2_fmeasure_step 0.0
wandb: train_rouge2_precision_epoch 0.07378
wandb: train_rouge2_precision_step 0.0
wandb: train_rouge2_recall_epoch 0.03132
wandb: train_rouge2_recall_step 0.0
wandb: train_rougeL_fmeasure_epoch 0.14714
wandb: train_rougeL_fmeasure_step 0.16667
wandb: train_rougeL_precision_epoch 0.28968
wandb: train_rougeL_precision_step 0.33333
wandb: train_rougeL_recall_epoch 0.11543
wandb: train_rougeL_recall_step 0.11111
wandb: train_rougeLsum_fmeasure_epoch 0.13952
wandb: train_rougeLsum_fmeasure_step 0.16667
wandb: train_rougeLsum_precision_epoch 0.27658
wandb: train_rougeLsum_precision_step 0.33333
wandb: train_rougeLsum_recall_epoch 0.10955
wandb: train_rougeLsum_recall_step 0.11111
wandb: trainer/global_step 191
wandb: val_loss_epoch 0.3865
wandb: val_loss_step 0.79297
wandb: val_rouge1_fmeasure_epoch 0.54495
wandb: val_rouge1_fmeasure_step 0.59893
wandb: val_rouge1_precision_epoch 1.0
wandb: val_rouge1_precision_step 1.0
wandb: val_rouge1_recall_epoch 0.4281
wandb: val_rouge1_recall_step 0.43956
wandb: val_rouge2_fmeasure_epoch 0.3909
wandb: val_rouge2_fmeasure_step 0.35556
wandb: val_rouge2_precision_epoch 0.73611
wandb: val_rouge2_precision_step 0.66667
wandb: val_rouge2_recall_epoch 0.30488
wandb: val_rouge2_recall_step 0.25
wandb: val_rougeL_fmeasure_epoch 0.54495
wandb: val_rougeL_fmeasure_step 0.59893
wandb: val_rougeL_precision_epoch 1.0
wandb: val_rougeL_precision_step 1.0
wandb: val_rougeL_recall_epoch 0.4281
wandb: val_rougeL_recall_step 0.43956
wandb: val_rougeLsum_fmeasure_epoch 0.54495
wandb: val_rougeLsum_fmeasure_step 0.59893
wandb: val_rougeLsum_precision_epoch 1.0
wandb: val_rougeLsum_precision_step 1.0
wandb: val_rougeLsum_recall_epoch 0.4281
wandb: val_rougeLsum_recall_step 0.43956
wandb:
wandb: 🚀 View run chocolate-sweep-6 at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/iz24uc9a
wandb: ⭐️ View project at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240806_140206-iz24uc9a/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
wandb: Agent Starting Run: yal4wup5 with config:
wandb: accumulate_grad_batches: 2
wandb: epochs: 1
wandb: gradient_clip_val: 0.7997351101275753
wandb: init_lora_weights: loftq
wandb: lora_alpha: 64
wandb: lora_dropout: 0.07665943005300253
wandb: lora_rank: 8
wandb: lr: 6.477116281072834e-05
wandb: model_name: mistralai/Mistral-7B-Instruct-v0.3
wandb: WARNING Ignored wandb.init() arg project when running a sweep.
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /user/jonathan/wandb/run-20240806_140745-yal4wup5
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run snowy-sweep-7
wandb: ⭐️ View project at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: 🧹 View sweep at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/sweeps/4ik6ex9z
wandb: 🚀 View run at https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/yal4wup5
Unused kwargs: ['bnb_8bit_quant_type', 'bnb_8bit_compute_dtype']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Repo card metadata block was not found. Setting CardData to empty.
Map: 0%| | 0/960 [00:00<?, ? examples/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Map: 100%|██████████| 960/960 [00:00<00:00, 6509.79 examples/s]
Map: 0%| | 0/960 [00:00<?, ? examples/s]
wandb: 0.020 MB of 0.020 MB uploaded
wandb: 🚀 View run snowy-sweep-7 at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning/runs/yal4wup5
wandb: ⭐️ View project at: https://wandb.ai/j0ntendo-yonsei-university/LLM-Finetuning
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240806_140745-yal4wup5/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
Run yal4wup5 errored:
Traceback (most recent call last):
File "/user/jonathan/jonathan/lib/python3.10/site-packages/wandb/agents/pyagent.py", line 307, in _run_job
self._function()
File "/user/jonathan/l_sweep.py", line 50, in l2ray_trainer
dataset = get_dataset(dataset_name=dataset_path, tokenizer=tokenizer)
File "/user/jonathan/finetuning_datasets.py", line 40, in get_dataset
en_dataset = raw_dataset.map(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 869, in map
{
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
k: dataset.map(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3161, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3552, in _map_single
batch = apply_function_on_filtered_inputs(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3421, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "/user/jonathan/finetuning_datafunctions.py", line 65, in preprocess_function
templated_text, labels = formatting(sample=sample,
File "/user/jonathan/finetuning_datafunctions.py", line 38, in formatting
bot_message = tokenizer.apply_chat_template(conversation=label_template,
File "/user/jonathan/jonathan/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1855, in apply_chat_template
rendered_chat = compiled_template.render(
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/environment.py", line 1304, in render
self.environment.handle_exception()
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/environment.py", line 939, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 14, in top-level template code
File "/user/jonathan/jonathan/lib/python3.10/site-packages/jinja2/sandbox.py", line 394, in call
return __context.call(__obj, *args, **kwargs)
File "/user/jonathan/jonathan/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1899, in raise_exception
raise TemplateError(message)
jinja2.exceptions.TemplateError: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
wandb: ERROR Run yal4wup5 errored:
wandb: ERROR Traceback (most recent call last):
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/wandb/agents/pyagent.py", line 307, in _run_job
wandb: ERROR self._function()
wandb: ERROR File "/user/jonathan/l_sweep.py", line 50, in l2ray_trainer
wandb: ERROR dataset = get_dataset(dataset_name=dataset_path, tokenizer=tokenizer)
wandb: ERROR File "/user/jonathan/finetuning_datasets.py", line 40, in get_dataset
wandb: ERROR en_dataset = raw_dataset.map(
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 869, in map
wandb: ERROR {
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
wandb: ERROR k: dataset.map(
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
wandb: ERROR out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
wandb: ERROR File "/user/jonathan/jonathan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
wandb: ERROR out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)