float8 training axiswise scaling support with per-gemm-argument confi… #653
| Job | Run time |
|---|---|
| | 5s |
| | 2m 53s |
| | 7m 59s |
| | 7m 13s |
| | 7m 28s |
| | 2m 2s |
| | 7m 39s |
| | 7m 21s |
| | 7m 44s |
| | 2m 50s |
| | 7m 39s |
| | 7m 15s |
| | 7m 44s |
| | 2m 57s |
| | 7m 46s |
| | 7m 19s |
| | 7m 7s |
| | 34s |
| | 36s |
| | 42s |
| | 1m 15s |
| | 32s |
| | 44s |
| | 40s |
| | 41s |
| | 38s |
| | 35s |
| | 41s |
| | 35s |
| | 34s |
| | 34s |
| | 37s |
| | 36s |
| | 1h 51m 35s |