This document lists the most important changes implemented in each experiment.
- Add learning rate decay
- Add dropout
- Add small weight decay
- Fix 'left-right' task to take the unknown class into account
- Other enhancements
- Replace BatchNorm with InstanceNorm (for conv layers) and LayerNorm (for dense layers); see the normalization sketch after this list
- Back to ReLU activations
- Add weight decay (`lambda = 0.001`)
- Add learning rate scheduler (on plateau, with `patience = 2` and `factor = 0.3`); see the optimizer sketch after this list
- Add swish activations
- Add dropout (`p = 0.75`); see the activation and dropout sketch after this list
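
A minimal PyTorch sketch of the normalization change, assuming a simple convolutional + dense architecture; the layer sizes are illustrative, and only the choice of norm layers and the ReLU activations come from the notes above.

```python
import torch.nn as nn

# Illustrative layer sizes; only the norm-layer choice and ReLU reflect the notes.
conv_block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.InstanceNorm2d(32),  # InstanceNorm instead of BatchNorm for conv layers
    nn.ReLU(),
)

dense_block = nn.Sequential(
    nn.Linear(256, 128),
    nn.LayerNorm(128),      # LayerNorm instead of BatchNorm for dense layers
    nn.ReLU(),
)
```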
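A minimal sketch of how the weight decay and plateau scheduler could be wired up in PyTorch; the model, optimizer type, base learning rate, and dummy validation loss are placeholders, and only `weight_decay = 0.001`, `patience = 2`, and `factor = 0.3` come from the notes above.

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.001)

# Reduce the learning rate by factor 0.3 when the monitored metric
# (e.g. validation loss) has not improved for 2 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.3, patience=2
)

for epoch in range(10):
    val_loss = 1.0  # placeholder for the real validation loss
    scheduler.step(val_loss)
```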
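A minimal sketch of the swish activation and dropout setting; in PyTorch, swish is available as `nn.SiLU`. The layer sizes are illustrative, and only the activation choice and `p = 0.75` come from the notes above.

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Linear(256, 128),  # illustrative sizes
    nn.SiLU(),            # swish activation: x * sigmoid(x)
    nn.Dropout(p=0.75),   # dropout probability from the notes
)
```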