A lightweight GAN trainer. Uses Wasserstein loss with gradient penalty (WGAN-GP) to train swappable architectures contained in models, which I hope to grow and experiment with. Still very much a WIP. For a quick review of how to interpret Wasserstein loss, see the WGAN paper and this post:
https://stats.stackexchange.com/questions/475696/how-to-interprete-discriminator-and-generator-loss-in-wgan
Adapted from https://github.com/ChenKaiXuSan/WGAN-GP-PyTorch
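The core of WGAN-GP is the gradient penalty: the critic's gradient norm is pushed toward 1 on random interpolations between real and fake samples. A minimal sketch of that term in PyTorch (the `gradient_penalty` name and signature here are illustrative, not this repo's actual API):

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    """Compute the WGAN-GP term: E[(||grad critic(x_hat)||_2 - 1)^2]."""
    batch_size = real.size(0)
    # Random interpolation coefficient per sample, broadcast over C, H, W
    eps = torch.rand(batch_size, 1, 1, 1, device=device)
    interp = eps * real + (1.0 - eps) * fake
    interp.requires_grad_(True)
    scores = critic(interp)
    # Gradients of critic scores w.r.t. the interpolated inputs;
    # create_graph=True so the penalty itself is differentiable
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]
    grads = grads.view(batch_size, -1)
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```

In a training loop this term is typically scaled by a coefficient (lambda = 10 in the WGAN-GP paper) and added to the critic loss; a falling critic loss with a gradient penalty hovering near zero is the usual sign that training is behaving.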