training_log.txt
Epoch [1/3], Step [1/3236], Loss: 4.9875, Perplexity: 146.5708
Epoch [1/3], Step [2/3236], Loss: 4.8064, Perplexity: 122.2868
Epoch [1/3], Step [3/3236], Loss: 4.8942, Perplexity: 133.5105
Epoch [1/3], Step [4/3236], Loss: 4.9106, Perplexity: 135.7240
Epoch [1/3], Step [5/3236], Loss: 4.7509, Perplexity: 115.6888
Epoch [1/3], Step [6/3236], Loss: 4.8005, Perplexity: 121.5667
Epoch [1/3], Step [7/3236], Loss: 4.8507, Perplexity: 127.8347
Epoch [1/3], Step [8/3236], Loss: 4.9314, Perplexity: 138.5782
Epoch [1/3], Step [9/3236], Loss: 4.7543, Perplexity: 116.0769
Epoch [1/3], Step [10/3236], Loss: 4.7416, Perplexity: 114.6170
Epoch [1/3], Step [11/3236], Loss: 4.6049, Perplexity: 99.9765
Epoch [1/3], Step [12/3236], Loss: 4.7264, Perplexity: 112.8899
Epoch [1/3], Step [13/3236], Loss: 4.8506, Perplexity: 127.8109
Epoch [1/3], Step [14/3236], Loss: 4.7094, Perplexity: 110.9900
Epoch [1/3], Step [15/3236], Loss: 4.6741, Perplexity: 107.1395
Epoch [1/3], Step [16/3236], Loss: 4.5445, Perplexity: 94.1137
Epoch [1/3], Step [17/3236], Loss: 4.7281, Perplexity: 113.0814
Epoch [1/3], Step [18/3236], Loss: 4.5319, Perplexity: 92.9304
Epoch [1/3], Step [19/3236], Loss: 4.4778, Perplexity: 88.0371
Epoch [1/3], Step [20/3236], Loss: 4.3863, Perplexity: 80.3407
Epoch [1/3], Step [21/3236], Loss: 4.4781, Perplexity: 88.0694
Epoch [1/3], Step [22/3236], Loss: 4.8625, Perplexity: 129.3452
Epoch [1/3], Step [23/3236], Loss: 4.3583, Perplexity: 78.1243
Epoch [1/3], Step [24/3236], Loss: 4.3601, Perplexity: 78.2658
Epoch [1/3], Step [25/3236], Loss: 4.9295, Perplexity: 138.3121
Epoch [1/3], Step [26/3236], Loss: 4.6823, Perplexity: 108.0212
Epoch [1/3], Step [27/3236], Loss: 4.4537, Perplexity: 85.9415
Epoch [1/3], Step [28/3236], Loss: 4.2606, Perplexity: 70.8545
Epoch [1/3], Step [29/3236], Loss: 4.4760, Perplexity: 87.8864
Epoch [1/3], Step [30/3236], Loss: 4.2987, Perplexity: 73.6041
Epoch [1/3], Step [31/3236], Loss: 4.3846, Perplexity: 80.2046
Epoch [1/3], Step [32/3236], Loss: 4.1764, Perplexity: 65.1282
Epoch [1/3], Step [33/3236], Loss: 4.1460, Perplexity: 63.1817
Epoch [1/3], Step [34/3236], Loss: 4.2512, Perplexity: 70.1891
Epoch [1/3], Step [35/3236], Loss: 4.2243, Perplexity: 68.3240
Epoch [1/3], Step [36/3236], Loss: 4.3657, Perplexity: 78.7055
Epoch [1/3], Step [37/3236], Loss: 4.1622, Perplexity: 64.2131
Epoch [1/3], Step [38/3236], Loss: 4.2372, Perplexity: 69.2111
Epoch [1/3], Step [39/3236], Loss: 4.4465, Perplexity: 85.3282
Epoch [1/3], Step [40/3236], Loss: 4.2422, Perplexity: 69.5628
Epoch [1/3], Step [41/3236], Loss: 4.4958, Perplexity: 89.6413
Epoch [1/3], Step [42/3236], Loss: 4.7827, Perplexity: 119.4250
Epoch [1/3], Step [43/3236], Loss: 4.5767, Perplexity: 97.1943
Epoch [1/3], Step [44/3236], Loss: 4.5759, Perplexity: 97.1185
Epoch [1/3], Step [45/3236], Loss: 4.0775, Perplexity: 58.9953
Epoch [1/3], Step [46/3236], Loss: 4.2859, Perplexity: 72.6675
Epoch [1/3], Step [47/3236], Loss: 4.2768, Perplexity: 72.0112
Epoch [1/3], Step [48/3236], Loss: 4.1726, Perplexity: 64.8807
Epoch [1/3], Step [49/3236], Loss: 4.1399, Perplexity: 62.7947
Epoch [1/3], Step [50/3236], Loss: 4.1211, Perplexity: 61.6246
Epoch [1/3], Step [51/3236], Loss: 4.1127, Perplexity: 61.1102
Epoch [1/3], Step [52/3236], Loss: 4.0132, Perplexity: 55.3236
Epoch [1/3], Step [53/3236], Loss: 4.7375, Perplexity: 114.1459
Epoch [1/3], Step [54/3236], Loss: 3.9483, Perplexity: 51.8476
Epoch [1/3], Step [55/3236], Loss: 4.0219, Perplexity: 55.8065
Epoch [1/3], Step [56/3236], Loss: 4.0235, Perplexity: 55.8978
Epoch [1/3], Step [57/3236], Loss: 4.5473, Perplexity: 94.3760
Epoch [1/3], Step [58/3236], Loss: 3.7802, Perplexity: 43.8246
Epoch [1/3], Step [59/3236], Loss: 3.9641, Perplexity: 52.6746
Epoch [1/3], Step [60/3236], Loss: 3.9634, Perplexity: 52.6365
Epoch [1/3], Step [61/3236], Loss: 4.1104, Perplexity: 60.9707
Epoch [1/3], Step [62/3236], Loss: 4.3375, Perplexity: 76.5180
Epoch [1/3], Step [63/3236], Loss: 3.7479, Perplexity: 42.4334
Epoch [1/3], Step [64/3236], Loss: 3.8777, Perplexity: 48.3140
Epoch [1/3], Step [65/3236], Loss: 3.7410, Perplexity: 42.1396
Epoch [1/3], Step [66/3236], Loss: 3.8604, Perplexity: 47.4866
Epoch [1/3], Step [67/3236], Loss: 4.2219, Perplexity: 68.1619
Epoch [1/3], Step [68/3236], Loss: 3.9445, Perplexity: 51.6511
Epoch [1/3], Step [69/3236], Loss: 3.8191, Perplexity: 45.5631
Epoch [1/3], Step [70/3236], Loss: 4.6706, Perplexity: 106.7594
Epoch [1/3], Step [71/3236], Loss: 4.0058, Perplexity: 54.9169
Epoch [1/3], Step [72/3236], Loss: 3.8595, Perplexity: 47.4412
Epoch [1/3], Step [73/3236], Loss: 3.8705, Perplexity: 47.9646
Epoch [1/3], Step [74/3236], Loss: 4.3862, Perplexity: 80.3319
Epoch [1/3], Step [75/3236], Loss: 4.0119, Perplexity: 55.2521
Epoch [1/3], Step [76/3236], Loss: 3.8942, Perplexity: 49.1189
Epoch [1/3], Step [77/3236], Loss: 3.7786, Perplexity: 43.7569
Epoch [1/3], Step [78/3236], Loss: 3.9511, Perplexity: 51.9920
Epoch [1/3], Step [79/3236], Loss: 3.8050, Perplexity: 44.9241
Epoch [1/3], Step [80/3236], Loss: 3.9525, Perplexity: 52.0669
Epoch [1/3], Step [81/3236], Loss: 3.9302, Perplexity: 50.9164
Epoch [1/3], Step [82/3236], Loss: 3.8891, Perplexity: 48.8681
Epoch [1/3], Step [83/3236], Loss: 3.9301, Perplexity: 50.9131
Epoch [1/3], Step [84/3236], Loss: 3.8803, Perplexity: 48.4394
Epoch [1/3], Step [85/3236], Loss: 3.9213, Perplexity: 50.4658
Epoch [1/3], Step [86/3236], Loss: 4.0134, Perplexity: 55.3331
Epoch [1/3], Step [87/3236], Loss: 3.8366, Perplexity: 46.3653
Epoch [1/3], Step [88/3236], Loss: 4.0066, Perplexity: 54.9606
Epoch [1/3], Step [89/3236], Loss: 3.9286, Perplexity: 50.8355
Epoch [1/3], Step [90/3236], Loss: 3.7138, Perplexity: 41.0083
Epoch [1/3], Step [91/3236], Loss: 3.8761, Perplexity: 48.2350
Epoch [1/3], Step [92/3236], Loss: 3.7801, Perplexity: 43.8188
Epoch [1/3], Step [93/3236], Loss: 4.0004, Perplexity: 54.6216
Epoch [1/3], Step [94/3236], Loss: 3.7175, Perplexity: 41.1621
Epoch [1/3], Step [95/3236], Loss: 3.7534, Perplexity: 42.6668
Epoch [1/3], Step [96/3236], Loss: 3.6625, Perplexity: 38.9600
Epoch [1/3], Step [97/3236], Loss: 3.7039, Perplexity: 40.6042
Epoch [1/3], Step [98/3236], Loss: 3.7837, Perplexity: 43.9796
Epoch [1/3], Step [99/3236], Loss: 3.7203, Perplexity: 41.2774
Epoch [1/3], Step [100/3236], Loss: 3.7170, Perplexity: 41.1416
Epoch [1/3], Step [101/3236], Loss: 3.7871, Perplexity: 44.1296
Epoch [1/3], Step [102/3236], Loss: 3.6752, Perplexity: 39.4580
Epoch [1/3], Step [103/3236], Loss: 3.5961, Perplexity: 36.4561
Epoch [1/3], Step [104/3236], Loss: 3.6277, Perplexity: 37.6257
Epoch [1/3], Step [105/3236], Loss: 3.7818, Perplexity: 43.8947
Epoch [1/3], Step [106/3236], Loss: 3.7520, Perplexity: 42.6083
Epoch [1/3], Step [107/3236], Loss: 3.6951, Perplexity: 40.2505
Epoch [1/3], Step [108/3236], Loss: 3.7289, Perplexity: 41.6320
Epoch [1/3], Step [109/3236], Loss: 3.6872, Perplexity: 39.9349
Epoch [1/3], Step [110/3236], Loss: 3.9618, Perplexity: 52.5541
Epoch [1/3], Step [111/3236], Loss: 3.9012, Perplexity: 49.4625
Epoch [1/3], Step [112/3236], Loss: 3.6276, Perplexity: 37.6220
Epoch [1/3], Step [113/3236], Loss: 3.6477, Perplexity: 38.3850
Epoch [1/3], Step [114/3236], Loss: 3.6143, Perplexity: 37.1272
Epoch [1/3], Step [115/3236], Loss: 3.8713, Perplexity: 48.0065
Epoch [1/3], Step [116/3236], Loss: 3.9461, Perplexity: 51.7356
Epoch [1/3], Step [117/3236], Loss: 3.6046, Perplexity: 36.7680
Epoch [1/3], Step [118/3236], Loss: 3.8506, Perplexity: 47.0223
Epoch [1/3], Step [119/3236], Loss: 3.9823, Perplexity: 53.6376
Epoch [1/3], Step [120/3236], Loss: 3.6282, Perplexity: 37.6436
Epoch [1/3], Step [121/3236], Loss: 3.8096, Perplexity: 45.1313
Epoch [1/3], Step [122/3236], Loss: 3.7271, Perplexity: 41.5574
Epoch [1/3], Step [123/3236], Loss: 3.7558, Perplexity: 42.7692
Epoch [1/3], Step [124/3236], Loss: 3.5481, Perplexity: 34.7467
Epoch [1/3], Step [125/3236], Loss: 3.5677, Perplexity: 35.4350
Epoch [1/3], Step [126/3236], Loss: 3.9302, Perplexity: 50.9156
Epoch [1/3], Step [127/3236], Loss: 3.5665, Perplexity: 35.3919
Epoch [1/3], Step [128/3236], Loss: 3.5514, Perplexity: 34.8626
Epoch [1/3], Step [129/3236], Loss: 4.1130, Perplexity: 61.1308
Epoch [1/3], Step [130/3236], Loss: 3.7093, Perplexity: 40.8253
Epoch [1/3], Step [131/3236], Loss: 3.6295, Perplexity: 37.6952
Epoch [1/3], Step [132/3236], Loss: 3.9076, Perplexity: 49.7800
Epoch [1/3], Step [133/3236], Loss: 3.7558, Perplexity: 42.7694
Epoch [1/3], Step [134/3236], Loss: 3.5348, Perplexity: 34.2866
Epoch [1/3], Step [135/3236], Loss: 3.6510, Perplexity: 38.5132
Epoch [1/3], Step [136/3236], Loss: 3.5337, Perplexity: 34.2499
Epoch [1/3], Step [137/3236], Loss: 3.4482, Perplexity: 31.4422
Epoch [1/3], Step [138/3236], Loss: 3.4471, Perplexity: 31.4101
Epoch [1/3], Step [139/3236], Loss: 3.5959, Perplexity: 36.4489
Epoch [1/3], Step [140/3236], Loss: 3.5690, Perplexity: 35.4827
Epoch [1/3], Step [141/3236], Loss: 4.2764, Perplexity: 71.9806
Epoch [1/3], Step [142/3236], Loss: 3.7985, Perplexity: 44.6338
Epoch [1/3], Step [143/3236], Loss: 3.6294, Perplexity: 37.6910
Epoch [1/3], Step [144/3236], Loss: 3.5319, Perplexity: 34.1891
Epoch [1/3], Step [145/3236], Loss: 3.9677, Perplexity: 52.8647
Epoch [1/3], Step [146/3236], Loss: 3.7432, Perplexity: 42.2310
Epoch [1/3], Step [147/3236], Loss: 3.5199, Perplexity: 33.7799
Epoch [1/3], Step [148/3236], Loss: 3.7303, Perplexity: 41.6930
Epoch [1/3], Step [149/3236], Loss: 3.6045, Perplexity: 36.7628
Epoch [1/3], Step [150/3236], Loss: 3.4918, Perplexity: 32.8439
Epoch [1/3], Step [151/3236], Loss: 3.7185, Perplexity: 41.2035
Epoch [1/3], Step [152/3236], Loss: 3.4238, Perplexity: 30.6843
Epoch [1/3], Step [153/3236], Loss: 3.5563, Perplexity: 35.0339
Epoch [1/3], Step [154/3236], Loss: 3.8665, Perplexity: 47.7748
Epoch [1/3], Step [155/3236], Loss: 3.4282, Perplexity: 30.8219
Epoch [1/3], Step [156/3236], Loss: 3.6051, Perplexity: 36.7869
Epoch [1/3], Step [157/3236], Loss: 3.5245, Perplexity: 33.9353
Epoch [1/3], Step [158/3236], Loss: 3.8105, Perplexity: 45.1719
Epoch [1/3], Step [159/3236], Loss: 3.4833, Perplexity: 32.5670
Epoch [1/3], Step [160/3236], Loss: 3.3758, Perplexity: 29.2477
Epoch [1/3], Step [161/3236], Loss: 4.1599, Perplexity: 64.0678
Epoch [1/3], Step [162/3236], Loss: 3.6461, Perplexity: 38.3253
Epoch [1/3], Step [163/3236], Loss: 3.4757, Perplexity: 32.3219
Epoch [1/3], Step [164/3236], Loss: 3.5329, Perplexity: 34.2222
Epoch [1/3], Step [165/3236], Loss: 3.6490, Perplexity: 38.4372
Epoch [1/3], Step [166/3236], Loss: 3.3887, Perplexity: 29.6278
Epoch [1/3], Step [167/3236], Loss: 3.8376, Perplexity: 46.4131
Epoch [1/3], Step [168/3236], Loss: 3.4970, Perplexity: 33.0177
Epoch [1/3], Step [169/3236], Loss: 3.3978, Perplexity: 29.8968
Epoch [1/3], Step [170/3236], Loss: 3.5053, Perplexity: 33.2899
Epoch [1/3], Step [171/3236], Loss: 3.4521, Perplexity: 31.5662
Epoch [1/3], Step [172/3236], Loss: 3.5604, Perplexity: 35.1778
Epoch [1/3], Step [173/3236], Loss: 3.4550, Perplexity: 31.6586
Epoch [1/3], Step [174/3236], Loss: 3.5779, Perplexity: 35.8000
Epoch [1/3], Step [175/3236], Loss: 3.8844, Perplexity: 48.6395
Epoch [1/3], Step [176/3236], Loss: 3.3976, Perplexity: 29.8923
Epoch [1/3], Step [177/3236], Loss: 3.7322, Perplexity: 41.7698
Epoch [1/3], Step [178/3236], Loss: 3.3762, Perplexity: 29.2588
Epoch [1/3], Step [179/3236], Loss: 3.3893, Perplexity: 29.6441
Epoch [1/3], Step [180/3236], Loss: 3.2403, Perplexity: 25.5409
Epoch [1/3], Step [181/3236], Loss: 3.5880, Perplexity: 36.1623
Epoch [1/3], Step [182/3236], Loss: 3.3635, Perplexity: 28.8912
Epoch [1/3], Step [183/3236], Loss: 3.4223, Perplexity: 30.6387
Epoch [1/3], Step [184/3236], Loss: 3.5825, Perplexity: 35.9624
Epoch [1/3], Step [185/3236], Loss: 3.4942, Perplexity: 32.9256
Epoch [1/3], Step [186/3236], Loss: 3.6120, Perplexity: 37.0393
Epoch [1/3], Step [187/3236], Loss: 3.5487, Perplexity: 34.7696
Epoch [1/3], Step [188/3236], Loss: 3.4108, Perplexity: 30.2888
Epoch [1/3], Step [189/3236], Loss: 3.3886, Perplexity: 29.6238
Epoch [1/3], Step [190/3236], Loss: 3.3575, Perplexity: 28.7177
Epoch [1/3], Step [191/3236], Loss: 3.5045, Perplexity: 33.2632
Epoch [1/3], Step [192/3236], Loss: 3.3495, Perplexity: 28.4891
Epoch [1/3], Step [193/3236], Loss: 3.4030, Perplexity: 30.0529
Epoch [1/3], Step [194/3236], Loss: 4.3969, Perplexity: 81.1993
Epoch [1/3], Step [195/3236], Loss: 3.3384, Perplexity: 28.1745
Epoch [1/3], Step [196/3236], Loss: 3.4767, Perplexity: 32.3542
Epoch [1/3], Step [197/3236], Loss: 3.4548, Perplexity: 31.6516
Epoch [1/3], Step [198/3236], Loss: 3.6558, Perplexity: 38.7004
Epoch [1/3], Step [199/3236], Loss: 3.4683, Perplexity: 32.0811
Epoch [1/3], Step [200/3236], Loss: 3.6708, Perplexity: 39.2846
Epoch [1/3], Step [201/3236], Loss: 3.4131, Perplexity: 30.3607
Epoch [1/3], Step [202/3236], Loss: 3.2693, Perplexity: 26.2940
Epoch [1/3], Step [203/3236], Loss: 3.5605, Perplexity: 35.1803
Epoch [1/3], Step [204/3236], Loss: 3.2983, Perplexity: 27.0656
Epoch [1/3], Step [205/3236], Loss: 3.4290, Perplexity: 30.8461
Epoch [1/3], Step [206/3236], Loss: 3.5670, Perplexity: 35.4096
Epoch [1/3], Step [207/3236], Loss: 3.4683, Perplexity: 32.0833
Epoch [1/3], Step [208/3236], Loss: 3.5313, Perplexity: 34.1699
Epoch [1/3], Step [209/3236], Loss: 3.4615, Perplexity: 31.8652
Epoch [1/3], Step [210/3236], Loss: 3.4727, Perplexity: 32.2232
Epoch [1/3], Step [211/3236], Loss: 3.4865, Perplexity: 32.6719
Epoch [1/3], Step [212/3236], Loss: 3.5082, Perplexity: 33.3879
Epoch [1/3], Step [213/3236], Loss: 3.4700, Perplexity: 32.1354
Epoch [1/3], Step [214/3236], Loss: 3.3006, Perplexity: 27.1287
Epoch [1/3], Step [215/3236], Loss: 3.2241, Perplexity: 25.1307
Epoch [1/3], Step [216/3236], Loss: 3.2175, Perplexity: 24.9644
Epoch [1/3], Step [217/3236], Loss: 3.4047, Perplexity: 30.1046
Epoch [1/3], Step [218/3236], Loss: 3.3054, Perplexity: 27.2606
Epoch [1/3], Step [219/3236], Loss: 3.3620, Perplexity: 28.8456
Epoch [1/3], Step [220/3236], Loss: 4.4191, Perplexity: 83.0213
Epoch [1/3], Step [221/3236], Loss: 3.3132, Perplexity: 27.4716
Epoch [1/3], Step [222/3236], Loss: 3.5781, Perplexity: 35.8042
Epoch [1/3], Step [223/3236], Loss: 3.4603, Perplexity: 31.8256
Epoch [1/3], Step [224/3236], Loss: 3.4070, Perplexity: 30.1737
Epoch [1/3], Step [225/3236], Loss: 3.1344, Perplexity: 22.9757
Epoch [1/3], Step [226/3236], Loss: 3.3654, Perplexity: 28.9447
Epoch [1/3], Step [227/3236], Loss: 3.7136, Perplexity: 41.0002
Epoch [1/3], Step [228/3236], Loss: 3.4343, Perplexity: 31.0095
Epoch [1/3], Step [229/3236], Loss: 3.4320, Perplexity: 30.9371
Epoch [1/3], Step [230/3236], Loss: 3.4267, Perplexity: 30.7760
Epoch [1/3], Step [231/3236], Loss: 3.2827, Perplexity: 26.6473
Epoch [1/3], Step [232/3236], Loss: 3.3742, Perplexity: 29.2017
Epoch [1/3], Step [233/3236], Loss: 3.2588, Perplexity: 26.0175
Epoch [1/3], Step [234/3236], Loss: 3.3477, Perplexity: 28.4372
Epoch [1/3], Step [235/3236], Loss: 3.3630, Perplexity: 28.8753
Epoch [1/3], Step [236/3236], Loss: 3.2158, Perplexity: 24.9234
Epoch [1/3], Step [237/3236], Loss: 3.3291, Perplexity: 27.9143
Epoch [1/3], Step [238/3236], Loss: 3.3102, Perplexity: 27.3918
Epoch [1/3], Step [239/3236], Loss: 3.3095, Perplexity: 27.3718
Epoch [1/3], Step [240/3236], Loss: 3.3186, Perplexity: 27.6214
Epoch [1/3], Step [241/3236], Loss: 3.2836, Perplexity: 26.6709
Epoch [1/3], Step [242/3236], Loss: 3.1500, Perplexity: 23.3365
Epoch [1/3], Step [243/3236], Loss: 3.3243, Perplexity: 27.7794
Epoch [1/3], Step [244/3236], Loss: 3.5616, Perplexity: 35.2190
Epoch [1/3], Step [245/3236], Loss: 3.4162, Perplexity: 30.4549
Epoch [1/3], Step [246/3236], Loss: 3.3498, Perplexity: 28.4980
Epoch [1/3], Step [247/3236], Loss: 3.2347, Perplexity: 25.3984
Epoch [1/3], Step [248/3236], Loss: 3.2665, Perplexity: 26.2189
Epoch [1/3], Step [249/3236], Loss: 3.5737, Perplexity: 35.6472
Epoch [1/3], Step [250/3236], Loss: 3.1112, Perplexity: 22.4483
Epoch [1/3], Step [251/3236], Loss: 3.2865, Perplexity: 26.7487
Epoch [1/3], Step [252/3236], Loss: 3.4505, Perplexity: 31.5151
Epoch [1/3], Step [253/3236], Loss: 3.4713, Perplexity: 32.1790
Epoch [1/3], Step [254/3236], Loss: 3.4269, Perplexity: 30.7825
Epoch [1/3], Step [255/3236], Loss: 3.2852, Perplexity: 26.7152
Epoch [1/3], Step [256/3236], Loss: 3.3046, Perplexity: 27.2369
Epoch [1/3], Step [257/3236], Loss: 3.1323, Perplexity: 22.9270
Epoch [1/3], Step [258/3236], Loss: 3.3519, Perplexity: 28.5563
Epoch [1/3], Step [259/3236], Loss: 3.2813, Perplexity: 26.6106
Epoch [1/3], Step [260/3236], Loss: 3.3561, Perplexity: 28.6768
Epoch [1/3], Step [261/3236], Loss: 3.2826, Perplexity: 26.6453
Epoch [1/3], Step [262/3236], Loss: 3.3934, Perplexity: 29.7675
Epoch [1/3], Step [263/3236], Loss: 3.5077, Perplexity: 33.3707
Epoch [1/3], Step [264/3236], Loss: 3.4806, Perplexity: 32.4782
Epoch [1/3], Step [265/3236], Loss: 3.3096, Perplexity: 27.3742
Epoch [1/3], Step [266/3236], Loss: 3.3496, Perplexity: 28.4921
Epoch [1/3], Step [267/3236], Loss: 3.4641, Perplexity: 31.9490
Epoch [1/3], Step [268/3236], Loss: 3.2542, Perplexity: 25.8997
Epoch [1/3], Step [269/3236], Loss: 3.1410, Perplexity: 23.1270
Epoch [1/3], Step [270/3236], Loss: 3.4459, Perplexity: 31.3730
Epoch [1/3], Step [271/3236], Loss: 3.2787, Perplexity: 26.5411
Epoch [1/3], Step [272/3236], Loss: 3.4175, Perplexity: 30.4917
Epoch [1/3], Step [273/3236], Loss: 3.2169, Perplexity: 24.9516
Epoch [1/3], Step [274/3236], Loss: 3.3761, Perplexity: 29.2563
Epoch [1/3], Step [275/3236], Loss: 3.2853, Perplexity: 26.7182
Epoch [1/3], Step [276/3236], Loss: 3.3650, Perplexity: 28.9326
Epoch [1/3], Step [277/3236], Loss: 3.5368, Perplexity: 34.3572
Epoch [1/3], Step [278/3236], Loss: 3.6882, Perplexity: 39.9727
Epoch [1/3], Step [279/3236], Loss: 3.2979, Perplexity: 27.0564
Epoch [1/3], Step [280/3236], Loss: 3.4295, Perplexity: 30.8613
Epoch [1/3], Step [281/3236], Loss: 3.2558, Perplexity: 25.9391
Epoch [1/3], Step [282/3236], Loss: 3.2002, Perplexity: 24.5379
Epoch [1/3], Step [283/3236], Loss: 3.1996, Perplexity: 24.5239
Epoch [1/3], Step [284/3236], Loss: 3.3326, Perplexity: 28.0124
Epoch [1/3], Step [285/3236], Loss: 3.2925, Perplexity: 26.9098
Epoch [1/3], Step [286/3236], Loss: 4.3667, Perplexity: 78.7851
Epoch [1/3], Step [287/3236], Loss: 3.2756, Perplexity: 26.4595
Epoch [1/3], Step [288/3236], Loss: 3.3781, Perplexity: 29.3138
Epoch [1/3], Step [289/3236], Loss: 3.8725, Perplexity: 48.0622
Epoch [1/3], Step [290/3236], Loss: 3.1600, Perplexity: 23.5699
Epoch [1/3], Step [291/3236], Loss: 3.2578, Perplexity: 25.9911
Epoch [1/3], Step [292/3236], Loss: 3.2839, Perplexity: 26.6805
Epoch [1/3], Step [293/3236], Loss: 3.3116, Perplexity: 27.4297
Epoch [1/3], Step [294/3236], Loss: 3.2410, Perplexity: 25.5584
Epoch [1/3], Step [295/3236], Loss: 3.6658, Perplexity: 39.0870
Epoch [1/3], Step [296/3236], Loss: 3.6371, Perplexity: 37.9807
Epoch [1/3], Step [297/3236], Loss: 3.1257, Perplexity: 22.7748
Epoch [1/3], Step [298/3236], Loss: 3.4017, Perplexity: 30.0156
Epoch [1/3], Step [299/3236], Loss: 3.3216, Perplexity: 27.7059
Epoch [1/3], Step [300/3236], Loss: 3.4821, Perplexity: 32.5270
Epoch [1/3], Step [301/3236], Loss: 3.6011, Perplexity: 36.6396
Epoch [1/3], Step [302/3236], Loss: 3.2706, Perplexity: 26.3267
Epoch [1/3], Step [303/3236], Loss: 3.4388, Perplexity: 31.1485
Epoch [1/3], Step [304/3236], Loss: 3.1290, Perplexity: 22.8500
Epoch [1/3], Step [305/3236], Loss: 3.3616, Perplexity: 28.8360
Epoch [1/3], Step [306/3236], Loss: 3.1520, Perplexity: 23.3832
Epoch [1/3], Step [307/3236], Loss: 3.1753, Perplexity: 23.9334
Epoch [1/3], Step [308/3236], Loss: 3.1078, Perplexity: 22.3718
Epoch [1/3], Step [309/3236], Loss: 3.1582, Perplexity: 23.5283
Epoch [1/3], Step [310/3236], Loss: 3.1693, Perplexity: 23.7903
Epoch [1/3], Step [311/3236], Loss: 3.2888, Perplexity: 26.8104
Epoch [1/3], Step [312/3236], Loss: 3.1068, Perplexity: 22.3491
Epoch [1/3], Step [313/3236], Loss: 3.6556, Perplexity: 38.6925
Epoch [1/3], Step [314/3236], Loss: 3.8021, Perplexity: 44.7950
Epoch [1/3], Step [315/3236], Loss: 2.9846, Perplexity: 19.7778
Epoch [1/3], Step [316/3236], Loss: 3.3065, Perplexity: 27.2895
Epoch [1/3], Step [317/3236], Loss: 3.1776, Perplexity: 23.9891
Epoch [1/3], Step [318/3236], Loss: 3.2552, Perplexity: 25.9240
Epoch [1/3], Step [319/3236], Loss: 3.2314, Perplexity: 25.3140
Epoch [1/3], Step [320/3236], Loss: 3.1281, Perplexity: 22.8305
Epoch [1/3], Step [321/3236], Loss: 3.3001, Perplexity: 27.1150
Epoch [1/3], Step [322/3236], Loss: 3.6624, Perplexity: 38.9530
Epoch [1/3], Step [323/3236], Loss: 3.0888, Perplexity: 21.9513
Epoch [1/3], Step [324/3236], Loss: 3.3333, Perplexity: 28.0308
Epoch [1/3], Step [325/3236], Loss: 3.5656, Perplexity: 35.3618
Epoch [1/3], Step [326/3236], Loss: 3.0605, Perplexity: 21.3391
Epoch [1/3], Step [327/3236], Loss: 3.3198, Perplexity: 27.6539
Epoch [1/3], Step [328/3236], Loss: 3.3437, Perplexity: 28.3227
Epoch [1/3], Step [329/3236], Loss: 3.5453, Perplexity: 34.6499
Epoch [1/3], Step [330/3236], Loss: 2.9900, Perplexity: 19.8857
Epoch [1/3], Step [331/3236], Loss: 3.2092, Perplexity: 24.7586
Epoch [1/3], Step [332/3236], Loss: 3.0464, Perplexity: 21.0395
Epoch [1/3], Step [333/3236], Loss: 3.1718, Perplexity: 23.8495
Epoch [1/3], Step [334/3236], Loss: 3.2894, Perplexity: 26.8254
Epoch [1/3], Step [335/3236], Loss: 3.2884, Perplexity: 26.7994
Epoch [1/3], Step [336/3236], Loss: 3.4100, Perplexity: 30.2653
Epoch [1/3], Step [337/3236], Loss: 2.9680, Perplexity: 19.4538
Epoch [1/3], Step [338/3236], Loss: 3.3411, Perplexity: 28.2499
Epoch [1/3], Step [339/3236], Loss: 3.3431, Perplexity: 28.3081
Epoch [1/3], Step [340/3236], Loss: 3.1640, Perplexity: 23.6641
Epoch [1/3], Step [341/3236], Loss: 3.1497, Perplexity: 23.3300
Epoch [1/3], Step [342/3236], Loss: 3.1650, Perplexity: 23.6879
Epoch [1/3], Step [343/3236], Loss: 3.4794, Perplexity: 32.4388
Epoch [1/3], Step [344/3236], Loss: 3.1256, Perplexity: 22.7747
Epoch [1/3], Step [345/3236], Loss: 3.1434, Perplexity: 23.1815
Epoch [1/3], Step [346/3236], Loss: 3.2010, Perplexity: 24.5563
Epoch [1/3], Step [347/3236], Loss: 3.2319, Perplexity: 25.3286
Epoch [1/3], Step [348/3236], Loss: 3.1429, Perplexity: 23.1707
Epoch [1/3], Step [349/3236], Loss: 3.1817, Perplexity: 24.0882
Epoch [1/3], Step [350/3236], Loss: 3.1160, Perplexity: 22.5564
Epoch [1/3], Step [351/3236], Loss: 3.0765, Perplexity: 21.6824
Epoch [1/3], Step [352/3236], Loss: 3.2364, Perplexity: 25.4422
Epoch [1/3], Step [353/3236], Loss: 3.2289, Perplexity: 25.2524
Epoch [1/3], Step [354/3236], Loss: 3.1205, Perplexity: 22.6588
Epoch [1/3], Step [355/3236], Loss: 3.1027, Perplexity: 22.2581
Epoch [1/3], Step [356/3236], Loss: 3.5912, Perplexity: 36.2771
Epoch [1/3], Step [357/3236], Loss: 2.9628, Perplexity: 19.3525
Epoch [1/3], Step [358/3236], Loss: 3.2738, Perplexity: 26.4116
Epoch [1/3], Step [359/3236], Loss: 3.0144, Perplexity: 20.3760
Epoch [1/3], Step [360/3236], Loss: 3.3370, Perplexity: 28.1338
Epoch [1/3], Step [361/3236], Loss: 3.6097, Perplexity: 36.9543
Epoch [1/3], Step [362/3236], Loss: 4.1361, Perplexity: 62.5603
Epoch [1/3], Step [363/3236], Loss: 3.2135, Perplexity: 24.8667
Epoch [1/3], Step [364/3236], Loss: 3.1477, Perplexity: 23.2817
Epoch [1/3], Step [365/3236], Loss: 3.1098, Perplexity: 22.4169
Epoch [1/3], Step [366/3236], Loss: 3.5014, Perplexity: 33.1625
Epoch [1/3], Step [367/3236], Loss: 3.1423, Perplexity: 23.1576
Epoch [1/3], Step [368/3236], Loss: 3.1676, Perplexity: 23.7497
Epoch [1/3], Step [369/3236], Loss: 3.1668, Perplexity: 23.7323
Epoch [1/3], Step [370/3236], Loss: 3.1544, Perplexity: 23.4387
Epoch [1/3], Step [371/3236], Loss: 3.1611, Perplexity: 23.5974
Epoch [1/3], Step [372/3236], Loss: 3.3032, Perplexity: 27.1982
Epoch [1/3], Step [373/3236], Loss: 3.0168, Perplexity: 20.4254
Epoch [1/3], Step [374/3236], Loss: 3.1781, Perplexity: 24.0006
Epoch [1/3], Step [375/3236], Loss: 3.5232, Perplexity: 33.8925
Epoch [1/3], Step [376/3236], Loss: 3.3302, Perplexity: 27.9427
Epoch [1/3], Step [377/3236], Loss: 3.1141, Perplexity: 22.5141
Epoch [1/3], Step [378/3236], Loss: 3.2201, Perplexity: 25.0317
Epoch [1/3], Step [379/3236], Loss: 3.0570, Perplexity: 21.2637
Epoch [1/3], Step [380/3236], Loss: 2.9481, Perplexity: 19.0690
Epoch [1/3], Step [381/3236], Loss: 2.9560, Perplexity: 19.2212
Epoch [1/3], Step [382/3236], Loss: 3.1729, Perplexity: 23.8764
Epoch [1/3], Step [383/3236], Loss: 3.1201, Perplexity: 22.6483
Epoch [1/3], Step [384/3236], Loss: 3.0398, Perplexity: 20.9005
Epoch [1/3], Step [385/3236], Loss: 3.0514, Perplexity: 21.1448
Epoch [1/3], Step [386/3236], Loss: 3.5376, Perplexity: 34.3857
Epoch [1/3], Step [387/3236], Loss: 3.0499, Perplexity: 21.1125
Epoch [1/3], Step [388/3236], Loss: 3.3303, Perplexity: 27.9466
Epoch [1/3], Step [389/3236], Loss: 3.0486, Perplexity: 21.0848
Epoch [1/3], Step [390/3236], Loss: 3.1157, Perplexity: 22.5492
Epoch [1/3], Step [391/3236], Loss: 3.2300, Perplexity: 25.2809
Epoch [1/3], Step [392/3236], Loss: 3.4425, Perplexity: 31.2656
Epoch [1/3], Step [393/3236], Loss: 3.4952, Perplexity: 32.9579
Epoch [1/3], Step [394/3236], Loss: 3.1333, Perplexity: 22.9488
Epoch [1/3], Step [395/3236], Loss: 3.1759, Perplexity: 23.9485
Epoch [1/3], Step [396/3236], Loss: 3.0177, Perplexity: 20.4446
Epoch [1/3], Step [397/3236], Loss: 3.0266, Perplexity: 20.6265
Epoch [1/3], Step [398/3236], Loss: 3.2784, Perplexity: 26.5333
Epoch [1/3], Step [399/3236], Loss: 3.1246, Perplexity: 22.7508
Epoch [1/3], Step [400/3236], Loss: 3.4116, Perplexity: 30.3150
Epoch [1/3], Step [401/3236], Loss: 3.1074, Perplexity: 22.3622
Epoch [1/3], Step [402/3236], Loss: 2.9472, Perplexity: 19.0533
Epoch [1/3], Step [403/3236], Loss: 3.0730, Perplexity: 21.6065
Epoch [1/3], Step [404/3236], Loss: 3.1206, Perplexity: 22.6605
Epoch [1/3], Step [405/3236], Loss: 3.7684, Perplexity: 43.3104
Epoch [1/3], Step [406/3236], Loss: 3.1784, Perplexity: 24.0088
Epoch [1/3], Step [407/3236], Loss: 3.0108, Perplexity: 20.3044
Epoch [1/3], Step [408/3236], Loss: 3.1128, Perplexity: 22.4847
Epoch [1/3], Step [409/3236], Loss: 3.4393, Perplexity: 31.1666
Epoch [1/3], Step [410/3236], Loss: 3.4331, Perplexity: 30.9712
Epoch [1/3], Step [411/3236], Loss: 3.1036, Perplexity: 22.2775
Epoch [1/3], Step [412/3236], Loss: 3.0703, Perplexity: 21.5475
Epoch [1/3], Step [413/3236], Loss: 3.5323, Perplexity: 34.2016
Epoch [1/3], Step [414/3236], Loss: 3.1197, Perplexity: 22.6394
Epoch [1/3], Step [415/3236], Loss: 3.0489, Perplexity: 21.0917
Epoch [1/3], Step [416/3236], Loss: 3.0950, Perplexity: 22.0881
Epoch [1/3], Step [417/3236], Loss: 3.0687, Perplexity: 21.5144
Epoch [1/3], Step [418/3236], Loss: 3.5412, Perplexity: 34.5074
Epoch [1/3], Step [419/3236], Loss: 2.9700, Perplexity: 19.4912
Epoch [1/3], Step [420/3236], Loss: 3.3830, Perplexity: 29.4598
Epoch [1/3], Step [421/3236], Loss: 3.4762, Perplexity: 32.3350
Epoch [1/3], Step [422/3236], Loss: 3.0614, Perplexity: 21.3581
Epoch [1/3], Step [423/3236], Loss: 3.5265, Perplexity: 34.0060
Epoch [1/3], Step [424/3236], Loss: 3.2147, Perplexity: 24.8948
Epoch [1/3], Step [425/3236], Loss: 3.0804, Perplexity: 21.7680
Epoch [1/3], Step [426/3236], Loss: 3.0369, Perplexity: 20.8400
Epoch [1/3], Step [427/3236], Loss: 3.2561, Perplexity: 25.9491
Epoch [1/3], Step [428/3236], Loss: 3.1201, Perplexity: 22.6487
Epoch [1/3], Step [429/3236], Loss: 3.1190, Perplexity: 22.6235
Epoch [1/3], Step [430/3236], Loss: 3.1496, Perplexity: 23.3268
Epoch [1/3], Step [431/3236], Loss: 4.1466, Perplexity: 63.2178
Epoch [1/3], Step [432/3236], Loss: 2.9780, Perplexity: 19.6477
Epoch [1/3], Step [433/3236], Loss: 3.2421, Perplexity: 25.5876
Epoch [1/3], Step [434/3236], Loss: 3.3828, Perplexity: 29.4518
Epoch [1/3], Step [435/3236], Loss: 3.1552, Perplexity: 23.4586
Epoch [1/3], Step [436/3236], Loss: 3.0901, Perplexity: 21.9802
Epoch [1/3], Step [437/3236], Loss: 3.0624, Perplexity: 21.3789
Epoch [1/3], Step [438/3236], Loss: 3.4518, Perplexity: 31.5581
Epoch [1/3], Step [439/3236], Loss: 3.0874, Perplexity: 21.9208
Epoch [1/3], Step [440/3236], Loss: 2.9741, Perplexity: 19.5724
Epoch [1/3], Step [441/3236], Loss: 3.2363, Perplexity: 25.4387
Epoch [1/3], Step [442/3236], Loss: 3.1155, Perplexity: 22.5443
Epoch [1/3], Step [443/3236], Loss: 3.0922, Perplexity: 22.0262
Epoch [1/3], Step [444/3236], Loss: 3.1082, Perplexity: 22.3799
Epoch [1/3], Step [445/3236], Loss: 3.0524, Perplexity: 21.1656
Epoch [1/3], Step [446/3236], Loss: 3.3099, Perplexity: 27.3823
Epoch [1/3], Step [447/3236], Loss: 3.1615, Perplexity: 23.6067
Epoch [1/3], Step [448/3236], Loss: 3.1511, Perplexity: 23.3608
Epoch [1/3], Step [449/3236], Loss: 3.8127, Perplexity: 45.2712
Epoch [1/3], Step [450/3236], Loss: 3.1610, Perplexity: 23.5941
Epoch [1/3], Step [451/3236], Loss: 3.1364, Perplexity: 23.0209
Epoch [1/3], Step [452/3236], Loss: 3.0966, Perplexity: 22.1226
Epoch [1/3], Step [453/3236], Loss: 3.0470, Perplexity: 21.0519
Epoch [1/3], Step [454/3236], Loss: 3.1443, Perplexity: 23.2043
Epoch [1/3], Step [455/3236], Loss: 3.1583, Perplexity: 23.5310
Epoch [1/3], Step [456/3236], Loss: 3.1207, Perplexity: 22.6621
Epoch [1/3], Step [457/3236], Loss: 2.9960, Perplexity: 20.0053
Epoch [1/3], Step [458/3236], Loss: 3.0808, Perplexity: 21.7752
Epoch [1/3], Step [459/3236], Loss: 3.1479, Perplexity: 23.2871
Epoch [1/3], Step [460/3236], Loss: 3.1056, Perplexity: 22.3236
Epoch [1/3], Step [461/3236], Loss: 2.9852, Perplexity: 19.7896
Epoch [1/3], Step [462/3236], Loss: 3.6584, Perplexity: 38.7978
Epoch [1/3], Step [463/3236], Loss: 3.5720, Perplexity: 35.5878
Epoch [1/3], Step [464/3236], Loss: 3.1046, Perplexity: 22.3010
Epoch [1/3], Step [465/3236], Loss: 2.9381, Perplexity: 18.8800
Epoch [1/3], Step [466/3236], Loss: 3.1408, Perplexity: 23.1226
Epoch [1/3], Step [467/3236], Loss: 3.1688, Perplexity: 23.7789
Epoch [1/3], Step [468/3236], Loss: 3.0264, Perplexity: 20.6235
Epoch [1/3], Step [469/3236], Loss: 3.0733, Perplexity: 21.6135
Epoch [1/3], Step [470/3236], Loss: 2.9303, Perplexity: 18.7341
Epoch [1/3], Step [471/3236], Loss: 3.0341, Perplexity: 20.7817
Epoch [1/3], Step [472/3236], Loss: 3.0161, Perplexity: 20.4106
Epoch [1/3], Step [473/3236], Loss: 3.4105, Perplexity: 30.2800
Epoch [1/3], Step [474/3236], Loss: 2.9473, Perplexity: 19.0537
Epoch [1/3], Step [475/3236], Loss: 3.3175, Perplexity: 27.5909
Epoch [1/3], Step [476/3236], Loss: 3.1980, Perplexity: 24.4840
Epoch [1/3], Step [477/3236], Loss: 3.2965, Perplexity: 27.0188
Epoch [1/3], Step [478/3236], Loss: 2.9702, Perplexity: 19.4968
Epoch [1/3], Step [479/3236], Loss: 2.9910, Perplexity: 19.9048
Epoch [1/3], Step [480/3236], Loss: 3.0733, Perplexity: 21.6123
Epoch [1/3], Step [481/3236], Loss: 2.9957, Perplexity: 19.9986
Epoch [1/3], Step [482/3236], Loss: 2.9465, Perplexity: 19.0384
Epoch [1/3], Step [483/3236], Loss: 3.1901, Perplexity: 24.2908
Epoch [1/3], Step [484/3236], Loss: 2.9553, Perplexity: 19.2075
Epoch [1/3], Step [485/3236], Loss: 3.5860, Perplexity: 36.0909
Epoch [1/3], Step [486/3236], Loss: 2.8586, Perplexity: 17.4375
Epoch [1/3], Step [487/3236], Loss: 3.1194, Perplexity: 22.6323
Epoch [1/3], Step [488/3236], Loss: 2.9607, Perplexity: 19.3122
Epoch [1/3], Step [489/3236], Loss: 3.0469, Perplexity: 21.0490
Epoch [1/3], Step [490/3236], Loss: 3.1152, Perplexity: 22.5378
Epoch [1/3], Step [491/3236], Loss: 2.9925, Perplexity: 19.9360
Epoch [1/3], Step [492/3236], Loss: 3.1353, Perplexity: 22.9959
Epoch [1/3], Step [493/3236], Loss: 2.9949, Perplexity: 19.9827
Epoch [1/3], Step [494/3236], Loss: 3.4974, Perplexity: 33.0299
Epoch [1/3], Step [495/3236], Loss: 3.1230, Perplexity: 22.7136
Epoch [1/3], Step [496/3236], Loss: 3.0900, Perplexity: 21.9768
Epoch [1/3], Step [497/3236], Loss: 3.0003, Perplexity: 20.0919
Epoch [1/3], Step [498/3236], Loss: 3.1760, Perplexity: 23.9504
Epoch [1/3], Step [499/3236], Loss: 3.0677, Perplexity: 21.4928
Epoch [1/3], Step [500/3236], Loss: 3.0800, Perplexity: 21.7584
Epoch [1/3], Step [501/3236], Loss: 3.0939, Perplexity: 22.0623
Epoch [1/3], Step [502/3236], Loss: 3.0008, Perplexity: 20.1022
Epoch [1/3], Step [503/3236], Loss: 3.0358, Perplexity: 20.8179
Epoch [1/3], Step [504/3236], Loss: 2.9169, Perplexity: 18.4846
Epoch [1/3], Step [505/3236], Loss: 3.1302, Perplexity: 22.8779
Epoch [1/3], Step [506/3236], Loss: 3.9527, Perplexity: 52.0742
Epoch [1/3], Step [507/3236], Loss: 3.2293, Perplexity: 25.2632
Epoch [1/3], Step [508/3236], Loss: 3.0110, Perplexity: 20.3085
Epoch [1/3], Step [509/3236], Loss: 2.9459, Perplexity: 19.0286
Epoch [1/3], Step [510/3236], Loss: 2.9159, Perplexity: 18.4645
Epoch [1/3], Step [511/3236], Loss: 3.1437, Perplexity: 23.1894
Epoch [1/3], Step [512/3236], Loss: 3.2804, Perplexity: 26.5874
Epoch [1/3], Step [513/3236], Loss: 2.9585, Perplexity: 19.2683
Epoch [1/3], Step [514/3236], Loss: 3.0230, Perplexity: 20.5525
Epoch [1/3], Step [515/3236], Loss: 3.2876, Perplexity: 26.7782
Epoch [1/3], Step [516/3236], Loss: 3.4645, Perplexity: 31.9599
Epoch [1/3], Step [517/3236], Loss: 3.1826, Perplexity: 24.1101
Epoch [1/3], Step [518/3236], Loss: 3.1782, Perplexity: 24.0029
Epoch [1/3], Step [519/3236], Loss: 3.0891, Perplexity: 21.9582
Epoch [1/3], Step [520/3236], Loss: 3.1541, Perplexity: 23.4323
Epoch [1/3], Step [521/3236], Loss: 3.0813, Perplexity: 21.7859
Epoch [1/3], Step [522/3236], Loss: 3.1789, Perplexity: 24.0197
Epoch [1/3], Step [523/3236], Loss: 2.8264, Perplexity: 16.8848
Epoch [1/3], Step [524/3236], Loss: 2.9061, Perplexity: 18.2849
Epoch [1/3], Step [525/3236], Loss: 3.1467, Perplexity: 23.2596
Epoch [1/3], Step [526/3236], Loss: 3.4812, Perplexity: 32.4987
Epoch [1/3], Step [527/3236], Loss: 3.0363, Perplexity: 20.8286
Epoch [1/3], Step [528/3236], Loss: 3.5031, Perplexity: 33.2176
Epoch [1/3], Step [529/3236], Loss: 2.9263, Perplexity: 18.6584
Epoch [1/3], Step [530/3236], Loss: 2.9320, Perplexity: 18.7654
Epoch [1/3], Step [531/3236], Loss: 2.9984, Perplexity: 20.0525
Epoch [1/3], Step [532/3236], Loss: 3.1138, Perplexity: 22.5062
Epoch [1/3], Step [533/3236], Loss: 3.0449, Perplexity: 21.0083
Epoch [1/3], Step [534/3236], Loss: 3.0136, Perplexity: 20.3596
Epoch [1/3], Step [535/3236], Loss: 2.9687, Perplexity: 19.4673
Epoch [1/3], Step [536/3236], Loss: 3.4198, Perplexity: 30.5633
Epoch [1/3], Step [537/3236], Loss: 3.0025, Perplexity: 20.1353
Epoch [1/3], Step [538/3236], Loss: 3.3277, Perplexity: 27.8741
Epoch [1/3], Step [539/3236], Loss: 2.9712, Perplexity: 19.5162
Epoch [1/3], Step [540/3236], Loss: 3.8427, Perplexity: 46.6497
Epoch [1/3], Step [541/3236], Loss: 3.0902, Perplexity: 21.9811
Epoch [1/3], Step [542/3236], Loss: 2.9875, Perplexity: 19.8352
Epoch [1/3], Step [543/3236], Loss: 2.8919, Perplexity: 18.0270
Epoch [1/3], Step [544/3236], Loss: 3.1203, Perplexity: 22.6531
Epoch [1/3], Step [545/3236], Loss: 3.2260, Perplexity: 25.1793
Epoch [1/3], Step [546/3236], Loss: 2.8795, Perplexity: 17.8048
Epoch [1/3], Step [547/3236], Loss: 3.1421, Perplexity: 23.1535
Epoch [1/3], Step [548/3236], Loss: 3.2462, Perplexity: 25.6928
Epoch [1/3], Step [549/3236], Loss: 2.9336, Perplexity: 18.7958
Epoch [1/3], Step [550/3236], Loss: 3.0160, Perplexity: 20.4095
Epoch [1/3], Step [551/3236], Loss: 3.1025, Perplexity: 22.2536
Epoch [1/3], Step [552/3236], Loss: 2.8142, Perplexity: 16.6805
Epoch [1/3], Step [553/3236], Loss: 3.5713, Perplexity: 35.5616
Epoch [1/3], Step [554/3236], Loss: 3.1244, Perplexity: 22.7460
Epoch [1/3], Step [555/3236], Loss: 3.0117, Perplexity: 20.3219
Epoch [1/3], Step [556/3236], Loss: 2.9879, Perplexity: 19.8432
Epoch [1/3], Step [557/3236], Loss: 2.8779, Perplexity: 17.7768
Epoch [1/3], Step [558/3236], Loss: 2.8986, Perplexity: 18.1480
Epoch [1/3], Step [559/3236], Loss: 2.7934, Perplexity: 16.3362
Epoch [1/3], Step [560/3236], Loss: 2.9963, Perplexity: 20.0105
Epoch [1/3], Step [561/3236], Loss: 2.9526, Perplexity: 19.1560
Epoch [1/3], Step [562/3236], Loss: 2.8890, Perplexity: 17.9754
Epoch [1/3], Step [563/3236], Loss: 2.9882, Perplexity: 19.8495
Epoch [1/3], Step [564/3236], Loss: 2.8859, Perplexity: 17.9199
Epoch [1/3], Step [565/3236], Loss: 3.8435, Perplexity: 46.6863
Epoch [1/3], Step [566/3236], Loss: 2.9183, Perplexity: 18.5089
Epoch [1/3], Step [567/3236], Loss: 3.0474, Perplexity: 21.0600
Epoch [1/3], Step [568/3236], Loss: 3.2833, Perplexity: 26.6645
Epoch [1/3], Step [569/3236], Loss: 4.4773, Perplexity: 87.9936
Epoch [1/3], Step [570/3236], Loss: 2.8652, Perplexity: 17.5522
Epoch [1/3], Step [571/3236], Loss: 3.2342, Perplexity: 25.3850
Epoch [1/3], Step [572/3236], Loss: 3.0240, Perplexity: 20.5729
Epoch [1/3], Step [573/3236], Loss: 3.3864, Perplexity: 29.5593
Epoch [1/3], Step [574/3236], Loss: 2.9193, Perplexity: 18.5291
Epoch [1/3], Step [575/3236], Loss: 2.9710, Perplexity: 19.5107
Epoch [1/3], Step [576/3236], Loss: 3.0918, Perplexity: 22.0158
Epoch [1/3], Step [577/3236], Loss: 2.9476, Perplexity: 19.0601
Epoch [1/3], Step [578/3236], Loss: 2.9817, Perplexity: 19.7210
Epoch [1/3], Step [579/3236], Loss: 2.9139, Perplexity: 18.4287
Epoch [1/3], Step [580/3236], Loss: 3.2511, Perplexity: 25.8196
Epoch [1/3], Step [581/3236], Loss: 2.9027, Perplexity: 18.2238
Epoch [1/3], Step [582/3236], Loss: 2.9470, Perplexity: 19.0484
Epoch [1/3], Step [583/3236], Loss: 2.8105, Perplexity: 16.6181
Epoch [1/3], Step [584/3236], Loss: 2.9452, Perplexity: 19.0153
Epoch [1/3], Step [585/3236], Loss: 3.0447, Perplexity: 21.0040
Epoch [1/3], Step [586/3236], Loss: 2.9806, Perplexity: 19.6996
Epoch [1/3], Step [587/3236], Loss: 2.8715, Perplexity: 17.6631
Epoch [1/3], Step [588/3236], Loss: 3.2610, Perplexity: 26.0746
Epoch [1/3], Step [589/3236], Loss: 3.4100, Perplexity: 30.2646
Epoch [1/3], Step [590/3236], Loss: 3.0802, Perplexity: 21.7618
Epoch [1/3], Step [591/3236], Loss: 2.9501, Perplexity: 19.1083
Epoch [1/3], Step [592/3236], Loss: 2.8208, Perplexity: 16.7906
Epoch [1/3], Step [593/3236], Loss: 2.9402, Perplexity: 18.9191
Epoch [1/3], Step [594/3236], Loss: 3.0073, Perplexity: 20.2325
Epoch [1/3], Step [595/3236], Loss: 2.7580, Perplexity: 15.7677
Epoch [1/3], Step [596/3236], Loss: 3.3897, Perplexity: 29.6574
Epoch [1/3], Step [597/3236], Loss: 2.9942, Perplexity: 19.9688
Epoch [1/3], Step [598/3236], Loss: 3.0328, Perplexity: 20.7559
Epoch [1/3], Step [599/3236], Loss: 3.4508, Perplexity: 31.5253
Epoch [1/3], Step [600/3236], Loss: 2.9942, Perplexity: 19.9688
Epoch [1/3], Step [601/3236], Loss: 2.9413, Perplexity: 18.9396
Epoch [1/3], Step [602/3236], Loss: 2.8558, Perplexity: 17.3886
Epoch [1/3], Step [603/3236], Loss: 3.3165, Perplexity: 27.5646
Epoch [1/3], Step [604/3236], Loss: 2.7656, Perplexity: 15.8882
Epoch [1/3], Step [605/3236], Loss: 3.0124, Perplexity: 20.3353
Epoch [1/3], Step [606/3236], Loss: 2.7980, Perplexity: 16.4119
Epoch [1/3], Step [607/3236], Loss: 2.9225, Perplexity: 18.5884
Epoch [1/3], Step [608/3236], Loss: 2.9617, Perplexity: 19.3315
Epoch [1/3], Step [609/3236], Loss: 2.8172, Perplexity: 16.7297
Epoch [1/3], Step [610/3236], Loss: 2.9362, Perplexity: 18.8445
Epoch [1/3], Step [611/3236], Loss: 2.9906, Perplexity: 19.8972
Epoch [1/3], Step [612/3236], Loss: 3.0829, Perplexity: 21.8217
Epoch [1/3], Step [613/3236], Loss: 2.9463, Perplexity: 19.0359
Epoch [1/3], Step [614/3236], Loss: 3.3494, Perplexity: 28.4867
Epoch [1/3], Step [615/3236], Loss: 2.9347, Perplexity: 18.8153
Epoch [1/3], Step [616/3236], Loss: 2.9170, Perplexity: 18.4858
Epoch [1/3], Step [617/3236], Loss: 2.8171, Perplexity: 16.7289
Epoch [1/3], Step [618/3236], Loss: 3.3097, Perplexity: 27.3763
Epoch [1/3], Step [619/3236], Loss: 3.0594, Perplexity: 21.3148
Epoch [1/3], Step [620/3236], Loss: 3.0689, Perplexity: 21.5177
Epoch [1/3], Step [621/3236], Loss: 2.8181, Perplexity: 16.7444
Epoch [1/3], Step [622/3236], Loss: 2.7488, Perplexity: 15.6234
Epoch [1/3], Step [623/3236], Loss: 3.1689, Perplexity: 23.7822
Epoch [1/3], Step [624/3236], Loss: 3.2271, Perplexity: 25.2074
Epoch [1/3], Step [625/3236], Loss: 3.0476, Perplexity: 21.0643
Epoch [1/3], Step [626/3236], Loss: 2.7379, Perplexity: 15.4545
Epoch [1/3], Step [627/3236], Loss: 3.0450, Perplexity: 21.0090
Epoch [1/3], Step [628/3236], Loss: 2.7015, Perplexity: 14.9015
Epoch [1/3], Step [629/3236], Loss: 2.9306, Perplexity: 18.7387
Epoch [1/3], Step [630/3236], Loss: 2.9974, Perplexity: 20.0334
Epoch [1/3], Step [631/3236], Loss: 2.7618, Perplexity: 15.8289
Epoch [1/3], Step [632/3236], Loss: 2.8049, Perplexity: 16.5248
Epoch [1/3], Step [633/3236], Loss: 2.8988, Perplexity: 18.1529
Epoch [1/3], Step [634/3236], Loss: 2.9011, Perplexity: 18.1941
Epoch [1/3], Step [635/3236], Loss: 2.8508, Perplexity: 17.3012
Epoch [1/3], Step [636/3236], Loss: 2.8998, Perplexity: 18.1698
Epoch [1/3], Step [637/3236], Loss: 2.8459, Perplexity: 17.2169
Epoch [1/3], Step [638/3236], Loss: 2.8675, Perplexity: 17.5933
Epoch [1/3], Step [639/3236], Loss: 2.7746, Perplexity: 16.0324
Epoch [1/3], Step [640/3236], Loss: 2.9955, Perplexity: 19.9944
Epoch [1/3], Step [641/3236], Loss: 2.9334, Perplexity: 18.7917
Epoch [1/3], Step [642/3236], Loss: 2.9671, Perplexity: 19.4360
Epoch [1/3], Step [643/3236], Loss: 2.9481, Perplexity: 19.0706
Epoch [1/3], Step [644/3236], Loss: 2.9479, Perplexity: 19.0659
Epoch [1/3], Step [645/3236], Loss: 2.8540, Perplexity: 17.3573
Epoch [1/3], Step [646/3236], Loss: 3.0791, Perplexity: 21.7384
Epoch [1/3], Step [647/3236], Loss: 3.0533, Perplexity: 21.1858
Epoch [1/3], Step [648/3236], Loss: 3.1282, Perplexity: 22.8320
Epoch [1/3], Step [649/3236], Loss: 2.8756, Perplexity: 17.7362
Epoch [1/3], Step [650/3236], Loss: 2.8801, Perplexity: 17.8166
Epoch [1/3], Step [651/3236], Loss: 3.0903, Perplexity: 21.9840
Epoch [1/3], Step [652/3236], Loss: 2.8630, Perplexity: 17.5147
Epoch [1/3], Step [653/3236], Loss: 2.9622, Perplexity: 19.3406
Epoch [1/3], Step [654/3236], Loss: 3.0691, Perplexity: 21.5218
Epoch [1/3], Step [655/3236], Loss: 2.7437, Perplexity: 15.5441
Epoch [1/3], Step [656/3236], Loss: 2.8703, Perplexity: 17.6420
Epoch [1/3], Step [657/3236], Loss: 2.9519, Perplexity: 19.1418
Epoch [1/3], Step [658/3236], Loss: 3.1635, Perplexity: 23.6536
Epoch [1/3], Step [659/3236], Loss: 3.1518, Perplexity: 23.3776
Epoch [1/3], Step [660/3236], Loss: 3.1556, Perplexity: 23.4682
Epoch [1/3], Step [661/3236], Loss: 3.2933, Perplexity: 26.9320
Epoch [1/3], Step [662/3236], Loss: 2.8546, Perplexity: 17.3682
Epoch [1/3], Step [663/3236], Loss: 2.8613, Perplexity: 17.4843
Epoch [1/3], Step [664/3236], Loss: 2.7840, Perplexity: 16.1838
Epoch [1/3], Step [665/3236], Loss: 3.0714, Perplexity: 21.5728
Epoch [1/3], Step [666/3236], Loss: 2.7099, Perplexity: 15.0271
Epoch [1/3], Step [667/3236], Loss: 2.8601, Perplexity: 17.4624
Epoch [1/3], Step [668/3236], Loss: 2.8133, Perplexity: 16.6645
Epoch [1/3], Step [669/3236], Loss: 3.0624, Perplexity: 21.3783
Epoch [1/3], Step [670/3236], Loss: 3.8266, Perplexity: 45.9071
Epoch [1/3], Step [671/3236], Loss: 2.7720, Perplexity: 15.9910
Epoch [1/3], Step [672/3236], Loss: 2.9454, Perplexity: 19.0189
Epoch [1/3], Step [673/3236], Loss: 3.0974, Perplexity: 22.1413
Epoch [1/3], Step [674/3236], Loss: 3.2419, Perplexity: 25.5814
Epoch [1/3], Step [675/3236], Loss: 3.0028, Perplexity: 20.1423
Epoch [1/3], Step [676/3236], Loss: 2.8485, Perplexity: 17.2625
Epoch [1/3], Step [677/3236], Loss: 2.9139, Perplexity: 18.4288
Epoch [1/3], Step [678/3236], Loss: 2.8053, Perplexity: 16.5313
Epoch [1/3], Step [679/3236], Loss: 2.7949, Perplexity: 16.3617
Epoch [1/3], Step [680/3236], Loss: 3.0280, Perplexity: 20.6551
Epoch [1/3], Step [681/3236], Loss: 2.8963, Perplexity: 18.1063
Epoch [1/3], Step [682/3236], Loss: 2.8430, Perplexity: 17.1670
Epoch [1/3], Step [683/3236], Loss: 3.0051, Perplexity: 20.1873
Epoch [1/3], Step [684/3236], Loss: 2.9370, Perplexity: 18.8592
Epoch [1/3], Step [685/3236], Loss: 2.8689, Perplexity: 17.6183
Epoch [1/3], Step [686/3236], Loss: 2.9661, Perplexity: 19.4154
Epoch [1/3], Step [687/3236], Loss: 2.8460, Perplexity: 17.2189
Epoch [1/3], Step [688/3236], Loss: 3.2257, Perplexity: 25.1721
Epoch [1/3], Step [689/3236], Loss: 2.9396, Perplexity: 18.9083
Epoch [1/3], Step [690/3236], Loss: 3.1666, Perplexity: 23.7260
Epoch [1/3], Step [691/3236], Loss: 2.9632, Perplexity: 19.3601
Epoch [1/3], Step [692/3236], Loss: 2.8601, Perplexity: 17.4634
Epoch [1/3], Step [693/3236], Loss: 3.0026, Perplexity: 20.1380
Epoch [1/3], Step [694/3236], Loss: 3.0516, Perplexity: 21.1501
Epoch [1/3], Step [695/3236], Loss: 2.7244, Perplexity: 15.2469
Epoch [1/3], Step [696/3236], Loss: 2.8079, Perplexity: 16.5743
Epoch [1/3], Step [697/3236], Loss: 3.2503, Perplexity: 25.7986
Epoch [1/3], Step [698/3236], Loss: 2.8588, Perplexity: 17.4413
Epoch [1/3], Step [699/3236], Loss: 2.8542, Perplexity: 17.3608
Epoch [1/3], Step [700/3236], Loss: 2.8190, Perplexity: 16.7596
Epoch [1/3], Step [701/3236], Loss: 2.9370, Perplexity: 18.8595
Epoch [1/3], Step [702/3236], Loss: 2.7469, Perplexity: 15.5940
Epoch [1/3], Step [703/3236], Loss: 2.8964, Perplexity: 18.1096
Epoch [1/3], Step [704/3236], Loss: 2.7126, Perplexity: 15.0688
Epoch [1/3], Step [705/3236], Loss: 2.9884, Perplexity: 19.8536
Epoch [1/3], Step [706/3236], Loss: 2.7584, Perplexity: 15.7743
Epoch [1/3], Step [707/3236], Loss: 2.6948, Perplexity: 14.8025
Epoch [1/3], Step [708/3236], Loss: 2.7887, Perplexity: 16.2600
Epoch [1/3], Step [709/3236], Loss: 3.2120, Perplexity: 24.8277
Epoch [1/3], Step [710/3236], Loss: 3.1579, Perplexity: 23.5215
Epoch [1/3], Step [711/3236], Loss: 2.9175, Perplexity: 18.4956
Epoch [1/3], Step [712/3236], Loss: 2.9368, Perplexity: 18.8557
Epoch [1/3], Step [713/3236], Loss: 2.7751, Perplexity: 16.0400
Epoch [1/3], Step [714/3236], Loss: 2.8900, Perplexity: 17.9932
Epoch [1/3], Step [715/3236], Loss: 2.5804, Perplexity: 13.2021
Epoch [1/3], Step [716/3236], Loss: 3.2162, Perplexity: 24.9323
Epoch [1/3], Step [717/3236], Loss: 2.8588, Perplexity: 17.4400
Epoch [1/3], Step [718/3236], Loss: 2.8255, Perplexity: 16.8691
Epoch [1/3], Step [719/3236], Loss: 2.7775, Perplexity: 16.0786
Epoch [1/3], Step [720/3236], Loss: 2.9047, Perplexity: 18.2596
Epoch [1/3], Step [721/3236], Loss: 3.4832, Perplexity: 32.5625
Epoch [1/3], Step [722/3236], Loss: 2.6584, Perplexity: 14.2733
Epoch [1/3], Step [723/3236], Loss: 2.8617, Perplexity: 17.4919
Epoch [1/3], Step [724/3236], Loss: 2.8793, Perplexity: 17.8024
Epoch [1/3], Step [725/3236], Loss: 2.8429, Perplexity: 17.1648
Epoch [1/3], Step [726/3236], Loss: 2.8642, Perplexity: 17.5352
Epoch [1/3], Step [727/3236], Loss: 2.6507, Perplexity: 14.1636
Epoch [1/3], Step [728/3236], Loss: 2.8405, Perplexity: 17.1251
Epoch [1/3], Step [729/3236], Loss: 2.7849, Perplexity: 16.1989
Epoch [1/3], Step [730/3236], Loss: 2.9984, Perplexity: 20.0532
Epoch [1/3], Step [731/3236], Loss: 3.0717, Perplexity: 21.5780
Epoch [1/3], Step [732/3236], Loss: 3.1744, Perplexity: 23.9133
Epoch [1/3], Step [733/3236], Loss: 2.9438, Perplexity: 18.9876
Epoch [1/3], Step [734/3236], Loss: 2.9349, Perplexity: 18.8192
Epoch [1/3], Step [735/3236], Loss: 2.8974, Perplexity: 18.1261
Epoch [1/3], Step [736/3236], Loss: 3.1756, Perplexity: 23.9409
Epoch [1/3], Step [737/3236], Loss: 2.9030, Perplexity: 18.2294
Epoch [1/3], Step [738/3236], Loss: 3.0410, Perplexity: 20.9258
Epoch [1/3], Step [739/3236], Loss: 2.9107, Perplexity: 18.3693
Epoch [1/3], Step [740/3236], Loss: 2.8639, Perplexity: 17.5294
Epoch [1/3], Step [741/3236], Loss: 2.8510, Perplexity: 17.3046
Epoch [1/3], Step [742/3236], Loss: 2.8984, Perplexity: 18.1446
Epoch [1/3], Step [743/3236], Loss: 2.7533, Perplexity: 15.6944
Epoch [1/3], Step [744/3236], Loss: 2.8581, Perplexity: 17.4292
Epoch [1/3], Step [745/3236], Loss: 2.6544, Perplexity: 14.2168
Epoch [1/3], Step [746/3236], Loss: 2.7666, Perplexity: 15.9042
Epoch [1/3], Step [747/3236], Loss: 3.0167, Perplexity: 20.4228
Epoch [1/3], Step [748/3236], Loss: 2.7878, Perplexity: 16.2460
Epoch [1/3], Step [749/3236], Loss: 3.4333, Perplexity: 30.9782
Epoch [1/3], Step [750/3236], Loss: 2.8396, Perplexity: 17.1087
Epoch [1/3], Step [751/3236], Loss: 2.6589, Perplexity: 14.2807
Epoch [1/3], Step [752/3236], Loss: 2.8294, Perplexity: 16.9351
Epoch [1/3], Step [753/3236], Loss: 2.8863, Perplexity: 17.9264
Epoch [1/3], Step [754/3236], Loss: 2.7455, Perplexity: 15.5730
Epoch [1/3], Step [755/3236], Loss: 2.9423, Perplexity: 18.9598
Epoch [1/3], Step [756/3236], Loss: 2.7993, Perplexity: 16.4326
Epoch [1/3], Step [757/3236], Loss: 2.7714, Perplexity: 15.9808
Epoch [1/3], Step [758/3236], Loss: 2.7086, Perplexity: 15.0084
Epoch [1/3], Step [759/3236], Loss: 3.0145, Perplexity: 20.3783
Epoch [1/3], Step [760/3236], Loss: 2.6816, Perplexity: 14.6084
Epoch [1/3], Step [761/3236], Loss: 2.7803, Perplexity: 16.1240
Epoch [1/3], Step [762/3236], Loss: 2.9910, Perplexity: 19.9048
Epoch [1/3], Step [763/3236], Loss: 3.0915, Perplexity: 22.0111
Epoch [1/3], Step [764/3236], Loss: 2.8369, Perplexity: 17.0632
Epoch [1/3], Step [765/3236], Loss: 2.7336, Perplexity: 15.3888
Epoch [1/3], Step [766/3236], Loss: 2.6716, Perplexity: 14.4625
Epoch [1/3], Step [767/3236], Loss: 3.1910, Perplexity: 24.3129
Epoch [1/3], Step [768/3236], Loss: 3.0989, Perplexity: 22.1737
Epoch [1/3], Step [769/3236], Loss: 3.6215, Perplexity: 37.3942
Epoch [1/3], Step [770/3236], Loss: 2.8271, Perplexity: 16.8964
Epoch [1/3], Step [771/3236], Loss: 3.0873, Perplexity: 21.9175
Epoch [1/3], Step [772/3236], Loss: 2.8328, Perplexity: 16.9933
Epoch [1/3], Step [773/3236], Loss: 2.6572, Perplexity: 14.2564
Epoch [1/3], Step [774/3236], Loss: 2.9323, Perplexity: 18.7699
Epoch [1/3], Step [775/3236], Loss: 2.7528, Perplexity: 15.6858
Epoch [1/3], Step [776/3236], Loss: 2.7461, Perplexity: 15.5814
Epoch [1/3], Step [777/3236], Loss: 2.6572, Perplexity: 14.2561
Epoch [1/3], Step [778/3236], Loss: 2.8415, Perplexity: 17.1409
Epoch [1/3], Step [779/3236], Loss: 2.8680, Perplexity: 17.6009
Epoch [1/3], Step [780/3236], Loss: 2.8375, Perplexity: 17.0733
Epoch [1/3], Step [781/3236], Loss: 2.7944, Perplexity: 16.3533
Epoch [1/3], Step [782/3236], Loss: 2.9802, Perplexity: 19.6908
Epoch [1/3], Step [783/3236], Loss: 3.1503, Perplexity: 23.3428
Epoch [1/3], Step [784/3236], Loss: 2.7380, Perplexity: 15.4558
Epoch [1/3], Step [785/3236], Loss: 2.7434, Perplexity: 15.5394
Epoch [1/3], Step [786/3236], Loss: 3.1417, Perplexity: 23.1435
Epoch [1/3], Step [787/3236], Loss: 2.8118, Perplexity: 16.6395
Epoch [1/3], Step [788/3236], Loss: 2.7906, Perplexity: 16.2908
Epoch [1/3], Step [789/3236], Loss: 2.8635, Perplexity: 17.5225
Epoch [1/3], Step [790/3236], Loss: 2.7696, Perplexity: 15.9522
Epoch [1/3], Step [791/3236], Loss: 2.9950, Perplexity: 19.9862
Epoch [1/3], Step [792/3236], Loss: 2.6250, Perplexity: 13.8040
Epoch [1/3], Step [793/3236], Loss: 2.6860, Perplexity: 14.6724
Epoch [1/3], Step [794/3236], Loss: 3.0018, Perplexity: 20.1218
Epoch [1/3], Step [795/3236], Loss: 2.7928, Perplexity: 16.3274
Epoch [1/3], Step [796/3236], Loss: 2.9474, Perplexity: 19.0566
Epoch [1/3], Step [797/3236], Loss: 2.7798, Perplexity: 16.1164
Epoch [1/3], Step [798/3236], Loss: 2.8181, Perplexity: 16.7454
Epoch [1/3], Step [799/3236], Loss: 2.8064, Perplexity: 16.5500
Epoch [1/3], Step [800/3236], Loss: 2.8156, Perplexity: 16.7032
Epoch [1/3], Step [801/3236], Loss: 3.2878, Perplexity: 26.7837
Epoch [1/3], Step [802/3236], Loss: 3.6875, Perplexity: 39.9441
Epoch [1/3], Step [803/3236], Loss: 2.8224, Perplexity: 16.8172
Epoch [1/3], Step [804/3236], Loss: 2.8845, Perplexity: 17.8948
Epoch [1/3], Step [805/3236], Loss: 2.7631, Perplexity: 15.8492
Epoch [1/3], Step [806/3236], Loss: 3.2129, Perplexity: 24.8510
Epoch [1/3], Step [807/3236], Loss: 2.8634, Perplexity: 17.5202
Epoch [1/3], Step [808/3236], Loss: 2.8368, Perplexity: 17.0605
Epoch [1/3], Step [809/3236], Loss: 2.6505, Perplexity: 14.1605
Epoch [1/3], Step [810/3236], Loss: 2.6350, Perplexity: 13.9430
Epoch [1/3], Step [811/3236], Loss: 3.2339, Perplexity: 25.3792
Epoch [1/3], Step [812/3236], Loss: 3.0842, Perplexity: 21.8493
Epoch [1/3], Step [813/3236], Loss: 2.8599, Perplexity: 17.4600
Epoch [1/3], Step [814/3236], Loss: 2.9224, Perplexity: 18.5851
Epoch [1/3], Step [815/3236], Loss: 2.7785, Perplexity: 16.0948
Epoch [1/3], Step [816/3236], Loss: 2.7671, Perplexity: 15.9131
Epoch [1/3], Step [817/3236], Loss: 2.8843, Perplexity: 17.8916
Epoch [1/3], Step [818/3236], Loss: 2.8437, Perplexity: 17.1795
Epoch [1/3], Step [819/3236], Loss: 3.0005, Perplexity: 20.0950
Epoch [1/3], Step [820/3236], Loss: 2.9661, Perplexity: 19.4165
Epoch [1/3], Step [821/3236], Loss: 2.5831, Perplexity: 13.2382
Epoch [1/3], Step [822/3236], Loss: 2.7703, Perplexity: 15.9630
Epoch [1/3], Step [823/3236], Loss: 2.7691, Perplexity: 15.9437
Epoch [1/3], Step [824/3236], Loss: 2.6717, Perplexity: 14.4649
Epoch [1/3], Step [825/3236], Loss: 2.7571, Perplexity: 15.7543
Epoch [1/3], Step [826/3236], Loss: 2.6732, Perplexity: 14.4863
Epoch [1/3], Step [827/3236], Loss: 2.9635, Perplexity: 19.3665
Epoch [1/3], Step [828/3236], Loss: 2.8694, Perplexity: 17.6265
Epoch [1/3], Step [829/3236], Loss: 2.6933, Perplexity: 14.7810
Epoch [1/3], Step [830/3236], Loss: 2.8377, Perplexity: 17.0772
Epoch [1/3], Step [831/3236], Loss: 2.8913, Perplexity: 18.0175
Epoch [1/3], Step [832/3236], Loss: 2.9714, Perplexity: 19.5191
Epoch [1/3], Step [833/3236], Loss: 3.1124, Perplexity: 22.4754
Epoch [1/3], Step [834/3236], Loss: 2.9420, Perplexity: 18.9538
Epoch [1/3], Step [835/3236], Loss: 2.7078, Perplexity: 14.9969
Epoch [1/3], Step [836/3236], Loss: 2.8843, Perplexity: 17.8915
Epoch [1/3], Step [837/3236], Loss: 3.1328, Perplexity: 22.9388
Epoch [1/3], Step [838/3236], Loss: 2.7230, Perplexity: 15.2260
Epoch [1/3], Step [839/3236], Loss: 2.7894, Perplexity: 16.2709
Epoch [1/3], Step [840/3236], Loss: 2.7570, Perplexity: 15.7531
Epoch [1/3], Step [841/3236], Loss: 2.6709, Perplexity: 14.4525
Epoch [1/3], Step [842/3236], Loss: 2.6137, Perplexity: 13.6501
Epoch [1/3], Step [843/3236], Loss: 2.6932, Perplexity: 14.7792
Epoch [1/3], Step [844/3236], Loss: 2.8414, Perplexity: 17.1398
Epoch [1/3], Step [845/3236], Loss: 2.9753, Perplexity: 19.5946
Epoch [1/3], Step [846/3236], Loss: 2.7779, Perplexity: 16.0849
Epoch [1/3], Step [847/3236], Loss: 2.5746, Perplexity: 13.1258
Epoch [1/3], Step [848/3236], Loss: 2.6154, Perplexity: 13.6728
Epoch [1/3], Step [849/3236], Loss: 2.8934, Perplexity: 18.0550
Epoch [1/3], Step [850/3236], Loss: 2.6975, Perplexity: 14.8432
Epoch [1/3], Step [851/3236], Loss: 2.5917, Perplexity: 13.3527
Epoch [1/3], Step [852/3236], Loss: 2.6607, Perplexity: 14.3064
Epoch [1/3], Step [853/3236], Loss: 3.0781, Perplexity: 21.7162
Epoch [1/3], Step [854/3236], Loss: 2.9825, Perplexity: 19.7381
Epoch [1/3], Step [855/3236], Loss: 2.9402, Perplexity: 18.9188
Epoch [1/3], Step [856/3236], Loss: 2.9044, Perplexity: 18.2540
Epoch [1/3], Step [857/3236], Loss: 2.9254, Perplexity: 18.6421
Epoch [1/3], Step [858/3236], Loss: 2.7106, Perplexity: 15.0385
Epoch [1/3], Step [859/3236], Loss: 2.7631, Perplexity: 15.8486
Epoch [1/3], Step [860/3236], Loss: 2.7101, Perplexity: 15.0311
Epoch [1/3], Step [861/3236], Loss: 2.7946, Perplexity: 16.3555
Epoch [1/3], Step [862/3236], Loss: 2.8495, Perplexity: 17.2784
Epoch [1/3], Step [863/3236], Loss: 2.9684, Perplexity: 19.4605
Epoch [1/3], Step [864/3236], Loss: 2.7381, Perplexity: 15.4574
Epoch [1/3], Step [865/3236], Loss: 2.7907, Perplexity: 16.2918
Epoch [1/3], Step [866/3236], Loss: 2.8286, Perplexity: 16.9223
Epoch [1/3], Step [867/3236], Loss: 2.8061, Perplexity: 16.5445
Epoch [1/3], Step [868/3236], Loss: 2.8515, Perplexity: 17.3139
Epoch [1/3], Step [869/3236], Loss: 2.7597, Perplexity: 15.7949
Epoch [1/3], Step [870/3236], Loss: 2.6273, Perplexity: 13.8357
Epoch [1/3], Step [871/3236], Loss: 3.2070, Perplexity: 24.7038
Epoch [1/3], Step [872/3236], Loss: 2.6955, Perplexity: 14.8127
Epoch [1/3], Step [873/3236], Loss: 2.9698, Perplexity: 19.4887
Epoch [1/3], Step [874/3236], Loss: 2.6288, Perplexity: 13.8569
Epoch [1/3], Step [875/3236], Loss: 2.6533, Perplexity: 14.2011
Epoch [1/3], Step [876/3236], Loss: 2.6540, Perplexity: 14.2108
Epoch [1/3], Step [877/3236], Loss: 2.6584, Perplexity: 14.2732
Epoch [1/3], Step [878/3236], Loss: 2.4531, Perplexity: 11.6247
Epoch [1/3], Step [879/3236], Loss: 2.6979, Perplexity: 14.8481
Epoch [1/3], Step [880/3236], Loss: 2.7863, Perplexity: 16.2202
Epoch [1/3], Step [881/3236], Loss: 2.5734, Perplexity: 13.1101
Epoch [1/3], Step [882/3236], Loss: 2.7092, Perplexity: 15.0167
Epoch [1/3], Step [883/3236], Loss: 2.6029, Perplexity: 13.5028
Epoch [1/3], Step [884/3236], Loss: 2.6000, Perplexity: 13.4634
Epoch [1/3], Step [885/3236], Loss: 2.8409, Perplexity: 17.1314
Epoch [1/3], Step [886/3236], Loss: 2.9529, Perplexity: 19.1611
Epoch [1/3], Step [887/3236], Loss: 2.7024, Perplexity: 14.9148
Epoch [1/3], Step [888/3236], Loss: 2.8010, Perplexity: 16.4609
Epoch [1/3], Step [889/3236], Loss: 2.6451, Perplexity: 14.0847
Epoch [1/3], Step [890/3236], Loss: 2.7387, Perplexity: 15.4666
Epoch [1/3], Step [891/3236], Loss: 3.3031, Perplexity: 27.1970
Epoch [1/3], Step [892/3236], Loss: 2.6290, Perplexity: 13.8605
Epoch [1/3], Step [893/3236], Loss: 2.7524, Perplexity: 15.6806
Epoch [1/3], Step [894/3236], Loss: 2.9566, Perplexity: 19.2327
Epoch [1/3], Step [895/3236], Loss: 3.2397, Perplexity: 25.5255
Epoch [1/3], Step [896/3236], Loss: 2.7255, Perplexity: 15.2635
Epoch [1/3], Step [897/3236], Loss: 2.6741, Perplexity: 14.4987
Epoch [1/3], Step [898/3236], Loss: 2.5642, Perplexity: 12.9907
Epoch [1/3], Step [899/3236], Loss: 3.0958, Perplexity: 22.1051
Epoch [1/3], Step [900/3236], Loss: 2.6572, Perplexity: 14.2567
Epoch [1/3], Step [901/3236], Loss: 2.7184, Perplexity: 15.1567
Epoch [1/3], Step [902/3236], Loss: 2.6420, Perplexity: 14.0417
Epoch [1/3], Step [903/3236], Loss: 2.6878, Perplexity: 14.6997
Epoch [1/3], Step [904/3236], Loss: 2.6702, Perplexity: 14.4421
Epoch [1/3], Step [905/3236], Loss: 3.2185, Perplexity: 24.9904
Epoch [1/3], Step [906/3236], Loss: 2.6352, Perplexity: 13.9466
Epoch [1/3], Step [907/3236], Loss: 3.3296, Perplexity: 27.9267
Epoch [1/3], Step [908/3236], Loss: 2.9335, Perplexity: 18.7932
Epoch [1/3], Step [909/3236], Loss: 2.5777, Perplexity: 13.1673
Epoch [1/3], Step [910/3236], Loss: 2.6741, Perplexity: 14.4988
Epoch [1/3], Step [911/3236], Loss: 2.6046, Perplexity: 13.5253
Epoch [1/3], Step [912/3236], Loss: 2.5967, Perplexity: 13.4191
Epoch [1/3], Step [913/3236], Loss: 2.7080, Perplexity: 14.9989
Epoch [1/3], Step [914/3236], Loss: 2.9805, Perplexity: 19.6985
Epoch [1/3], Step [915/3236], Loss: 2.7044, Perplexity: 14.9458
Epoch [1/3], Step [916/3236], Loss: 2.5817, Perplexity: 13.2192
Epoch [1/3], Step [917/3236], Loss: 2.7095, Perplexity: 15.0225
Epoch [1/3], Step [918/3236], Loss: 2.8169, Perplexity: 16.7252
Epoch [1/3], Step [919/3236], Loss: 2.5971, Perplexity: 13.4241
Epoch [1/3], Step [920/3236], Loss: 2.6459, Perplexity: 14.0959
Epoch [1/3], Step [921/3236], Loss: 2.6957, Perplexity: 14.8164
Epoch [1/3], Step [922/3236], Loss: 2.9020, Perplexity: 18.2114
Epoch [1/3], Step [923/3236], Loss: 2.6174, Perplexity: 13.7006
Epoch [1/3], Step [924/3236], Loss: 2.7032, Perplexity: 14.9269
Epoch [1/3], Step [925/3236], Loss: 2.8733, Perplexity: 17.6960
Epoch [1/3], Step [926/3236], Loss: 3.4810, Perplexity: 32.4919
Epoch [1/3], Step [927/3236], Loss: 2.7540, Perplexity: 15.7057
Epoch [1/3], Step [928/3236], Loss: 2.6390, Perplexity: 13.9992
Epoch [1/3], Step [929/3236], Loss: 2.8638, Perplexity: 17.5286
Epoch [1/3], Step [930/3236], Loss: 3.0232, Perplexity: 20.5577
Epoch [1/3], Step [931/3236], Loss: 2.7660, Perplexity: 15.8952
Epoch [1/3], Step [932/3236], Loss: 2.6902, Perplexity: 14.7347
Epoch [1/3], Step [933/3236], Loss: 2.6655, Perplexity: 14.3754
Epoch [1/3], Step [934/3236], Loss: 2.8514, Perplexity: 17.3118
Epoch [1/3], Step [935/3236], Loss: 2.7418, Perplexity: 15.5148
Epoch [1/3], Step [936/3236], Loss: 2.7380, Perplexity: 15.4559
Epoch [1/3], Step [937/3236], Loss: 2.7687, Perplexity: 15.9384
Epoch [1/3], Step [938/3236], Loss: 2.8523, Perplexity: 17.3268
Epoch [1/3], Step [939/3236], Loss: 2.7330, Perplexity: 15.3792
Epoch [1/3], Step [940/3236], Loss: 2.8536, Perplexity: 17.3496
Epoch [1/3], Step [941/3236], Loss: 2.6241, Perplexity: 13.7925
Epoch [1/3], Step [942/3236], Loss: 3.1209, Perplexity: 22.6661
Epoch [1/3], Step [943/3236], Loss: 2.6868, Perplexity: 14.6852
Epoch [1/3], Step [944/3236], Loss: 2.9353, Perplexity: 18.8264
Epoch [1/3], Step [945/3236], Loss: 2.6618, Perplexity: 14.3223
Epoch [1/3], Step [946/3236], Loss: 2.7601, Perplexity: 15.8013
Epoch [1/3], Step [947/3236], Loss: 2.9495, Perplexity: 19.0963
Epoch [1/3], Step [948/3236], Loss: 2.5520, Perplexity: 12.8332
Epoch [1/3], Step [949/3236], Loss: 2.7134, Perplexity: 15.0804
Epoch [1/3], Step [950/3236], Loss: 2.5248, Perplexity: 12.4886
Epoch [1/3], Step [951/3236], Loss: 3.5425, Perplexity: 34.5518
Epoch [1/3], Step [952/3236], Loss: 2.9086, Perplexity: 18.3306
Epoch [1/3], Step [953/3236], Loss: 2.6444, Perplexity: 14.0753
Epoch [1/3], Step [954/3236], Loss: 2.8382, Perplexity: 17.0842
Epoch [1/3], Step [955/3236], Loss: 2.6928, Perplexity: 14.7728
Epoch [1/3], Step [956/3236], Loss: 2.7721, Perplexity: 15.9928
Epoch [1/3], Step [957/3236], Loss: 2.6196, Perplexity: 13.7301
Epoch [1/3], Step [958/3236], Loss: 2.6689, Perplexity: 14.4235
Epoch [1/3], Step [959/3236], Loss: 2.5855, Perplexity: 13.2693
Epoch [1/3], Step [960/3236], Loss: 2.6714, Perplexity: 14.4602
Epoch [1/3], Step [961/3236], Loss: 2.7620, Perplexity: 15.8313
Epoch [1/3], Step [962/3236], Loss: 2.6917, Perplexity: 14.7567
Epoch [1/3], Step [963/3236], Loss: 3.0252, Perplexity: 20.5988
Epoch [1/3], Step [964/3236], Loss: 2.5187, Perplexity: 12.4120
Epoch [1/3], Step [965/3236], Loss: 2.6060, Perplexity: 13.5443
Epoch [1/3], Step [966/3236], Loss: 2.7349, Perplexity: 15.4083
Epoch [1/3], Step [967/3236], Loss: 2.5171, Perplexity: 12.3927
Epoch [1/3], Step [968/3236], Loss: 2.6594, Perplexity: 14.2884
Epoch [1/3], Step [969/3236], Loss: 2.7742, Perplexity: 16.0263
Epoch [1/3], Step [970/3236], Loss: 2.6515, Perplexity: 14.1753
Epoch [1/3], Step [971/3236], Loss: 3.3626, Perplexity: 28.8646
Epoch [1/3], Step [972/3236], Loss: 2.7313, Perplexity: 15.3526
Epoch [1/3], Step [973/3236], Loss: 2.4602, Perplexity: 11.7067
Epoch [1/3], Step [974/3236], Loss: 2.6174, Perplexity: 13.7001
Epoch [1/3], Step [975/3236], Loss: 2.7306, Perplexity: 15.3423
Epoch [1/3], Step [976/3236], Loss: 2.5080, Perplexity: 12.2798
Epoch [1/3], Step [977/3236], Loss: 2.7625, Perplexity: 15.8386
Epoch [1/3], Step [978/3236], Loss: 2.5860, Perplexity: 13.2771
Epoch [1/3], Step [979/3236], Loss: 3.1189, Perplexity: 22.6223
Epoch [1/3], Step [980/3236], Loss: 2.8770, Perplexity: 17.7617
Epoch [1/3], Step [981/3236], Loss: 2.7923, Perplexity: 16.3189
Epoch [1/3], Step [982/3236], Loss: 2.7234, Perplexity: 15.2324
Epoch [1/3], Step [983/3236], Loss: 2.6386, Perplexity: 13.9930
Epoch [1/3], Step [984/3236], Loss: 2.5416, Perplexity: 12.7003
Epoch [1/3], Step [985/3236], Loss: 2.6243, Perplexity: 13.7951
Epoch [1/3], Step [986/3236], Loss: 2.9789, Perplexity: 19.6667
Epoch [1/3], Step [987/3236], Loss: 2.6273, Perplexity: 13.8365
Epoch [1/3], Step [988/3236], Loss: 2.6751, Perplexity: 14.5145
Epoch [1/3], Step [989/3236], Loss: 2.7175, Perplexity: 15.1417
Epoch [1/3], Step [990/3236], Loss: 2.7133, Perplexity: 15.0793
Epoch [1/3], Step [991/3236], Loss: 2.6838, Perplexity: 14.6413
Epoch [1/3], Step [992/3236], Loss: 2.4967, Perplexity: 12.1418
Epoch [1/3], Step [993/3236], Loss: 2.6407, Perplexity: 14.0236
Epoch [1/3], Step [994/3236], Loss: 2.6578, Perplexity: 14.2642
Epoch [1/3], Step [995/3236], Loss: 2.5741, Perplexity: 13.1190
Epoch [1/3], Step [996/3236], Loss: 2.6019, Perplexity: 13.4887
Epoch [1/3], Step [997/3236], Loss: 2.5087, Perplexity: 12.2895
Epoch [1/3], Step [998/3236], Loss: 2.5850, Perplexity: 13.2628
Epoch [1/3], Step [999/3236], Loss: 2.5752, Perplexity: 13.1345
Epoch [1/3], Step [1000/3236], Loss: 3.0374, Perplexity: 20.8520