A fundamental problem in machine learning is to fully represent all possible states of a variable under consideration, i.e., to capture its full distribution.
Generative adversarial networks (GANs) were shown to work well for tasks such
as the inference of super-resolution solutions, where the range of possible
results can be highly ambiguous.
To train a GAN, we have to briefly turn to classification problems.
For these, the learning objective takes a slightly different form than the
regression objective in equation {eq}`learn-l2` of {doc}`overview-equations`:
we now want to maximize the likelihood of a learned representation $f$ that assigns a probability to each sample $\mathbf{x}_i$

$$ \text{arg max}_{\theta} \prod_i f(\mathbf{x}_i;\theta) , $$ (mle-prob)

the classic maximum likelihood estimation (MLE). In practice, it is typically turned into a sum of negative log likelihoods to give the learning objective

$$ \text{arg min}_{\theta} - \sum_i \log f(\mathbf{x}_i;\theta) . $$
There are quite a few equivalent viewpoints for this fundamental expression:
e.g., it can be seen as minimizing the KL-divergence between the empirical distribution
given by our training data set and the learned one. It likewise represents a maximization
of the expectation as defined by the training data, i.e., of equation {eq}`mle-prob`.
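To see why the logarithmic form is used in practice, consider the following minimal PyTorch sketch (a toy illustration; the values in `probs` are hypothetical stand-ins for $f(\mathbf{x}_i;\theta)$): multiplying many probabilities quickly underflows, while summing their logarithms stays numerically stable.

```python
import torch

# hypothetical model outputs f(x_i; theta) for three training samples
probs = torch.tensor([0.9, 0.8, 0.95])

likelihood = probs.prod()  # direct MLE objective; underflows for large datasets
nll = -probs.log().sum()   # equivalent negative-log-likelihood form, stable
```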
MLE is a crucial component for GANs: here we have a generator that is typically
similar to a decoder network, e.g., the second half of an autoencoder from {doc}`others-timeseries`.
For regular GANs, the generator receives a random input vector, denoted with $\mathbf{z}$ below.
However, instead of directly training the generator, we employ a second network
that serves as the loss for the generator. This second network is called the discriminator,
and it has a classification task: to distinguish the generated samples from "real" ones.
The real ones are typically provided in the form of a training data set, samples of
which will be denoted as $\mathbf{y}$ below.
For regular GANs, the classification task of the discriminator is typically formulated as

$$ \text{arg min}_{\theta_d} - \frac{1}{2}\mathbb{E}\, \log D(\mathbf{y}) - \frac{1}{2}\mathbb{E}\, \log (1 - D(G(\mathbf{z}))) , $$

which, as outlined above, is a standard binary cross-entropy training for the two classes of real samples $\mathbf{y}$ and generated samples $G(\mathbf{z})$.
The key for the generator loss is to employ the discriminator, and to produce samples that are classified as real by the discriminator:

$$ \text{arg min}_{\theta_g} - \frac{1}{2}\mathbb{E}\, \log D(G(\mathbf{z})) . $$
Typically, this training is alternated, performing one step for the discriminator, and then one for the generator.
Due to the coupled, alternating training, GAN training has a reputation of being finicky in practice. Instead of a single, non-linear optimization problem, we now have two coupled ones, for which we need to find a fragile balance. (Otherwise we'll get the dreaded mode-collapse problem: once one of the two networks "collapses" to a trivial solution, the coupled training breaks down.)
To alleviate this problem, regularization is often crucial to achieve a stable training. In the simplest case,
we can add an $L^1$ regularizer w.r.t. reference data with a small coefficient for the generator $G$.
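To make the alternating scheme concrete, here is a minimal PyTorch sketch of one training step under simplified assumptions: `G`, `D`, the dimensions, and the learning rates are hypothetical placeholders, and the optional $L^1$ term is shown for a paired setting where a reference sample exists for each generated one.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

# hypothetical toy networks; real applications typically use conv nets
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()  # binary cross-entropy on raw logits

def train_step(real_batch, l1_weight=0.0):
    n = real_batch.shape[0]
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # discriminator step: distinguish real samples y from generated ones G(z)
    fake = G(torch.randn(n, latent_dim)).detach()  # no generator gradients here
    loss_d = 0.5 * (bce(D(real_batch), ones) + bce(D(fake), zeros))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # generator step: produce samples that D classifies as real
    fake = G(torch.randn(n, latent_dim))
    loss_g = bce(D(fake), ones)
    if l1_weight > 0.0:  # optional L1 regularizer w.r.t. reference data
        loss_g = loss_g + l1_weight * (fake - real_batch).abs().mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# one hypothetical training step on a batch of "real" samples
loss_d, loss_g = train_step(torch.randn(8, data_dim), l1_weight=0.1)
```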
For physical problems, regular GANs which generate solutions from the randomized latent space $\mathbf{z}$ alone are often not directly useful: we typically want the solution to depend on given inputs, such as physical parameters or a coarse solution. This is the setting of conditional GANs, where the generator receives the conditioning data as (part of) its input.
A good scenario for conditional GANs is super-resolution: here the network has the task of computing a high-resolution output given a sparse or low-resolution input solution.
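As a rough sketch, a conditional generator for super-resolution can simply receive the low-resolution field as input in place of the latent vector $\mathbf{z}$; the architecture and the 4x upsampling factor below are illustrative assumptions, not a specific published model.

```python
import torch
import torch.nn as nn

# hypothetical conditional generator for 4x super-resolution
G_cond = nn.Sequential(
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),
)

low_res = torch.randn(8, 1, 16, 16)  # batch of low-resolution inputs
high_res = G_cond(low_res)           # predictions of shape [8, 1, 64, 64]
```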
One of the main advantages of GANs is that they can prevent an undesirable averaging for ambiguous data. E.g., consider the case of super-resolution: a low-resolution observation that serves as input typically has an infinite number of possible high-resolution solutions that would fit the low-res input.
If a data set contains multiple such cases, and we employ supervised training,
the network will reliably learn the mean. This averaged solution is usually
clearly undesirable, and unlike any of the individual solutions from which it was
computed. This is the multi-modality problem, i.e., different modes existing as
equally valid solutions to a problem. For fluids, this can, e.g., happen when
we're facing bifurcations, as discussed in {doc}`intro-teaser`.
The following image shows a clear example of how well GANs can circumvent this problem:
---
name: GANs-tempoGAN-fig3
---
A super-resolution example: a) input, b) supervised result, c) GAN result, d) high-resolution reference.
Naturally, the GAN approach is not limited to spatial resolutions. Previous work
has demonstrated that the concept of learned self-supervision extends to space-time
solutions, e.g., in the context of super-resolution for fluid simulations {cite}`xie2018tempoGan`.
The following example compares the time derivatives of different solutions:
---
name: GANs-tempoGAN-fig4
---
From left to right, time derivatives for: a spatial GAN (i.e. not time aware), a temporally supervised learning, a spatio-temporal GAN, and a reference solution.
As can be seen, the GAN trained with spatio-temporal self-supervision (second from right) closely matches the reference solution on the far right. In this case the discriminator receives reference solutions over time (in the form of triplets), such that it can learn to judge whether the temporal evolution of a generated solution matches that of the reference.
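The following minimal sketch shows one way to realize such a temporal discriminator (the architecture and resolutions are hypothetical, not the original tempoGAN network): three consecutive frames are stacked along the channel dimension, so the discriminator judges the temporal evolution rather than single frames.

```python
import torch
import torch.nn as nn

# hypothetical temporal discriminator operating on frame triplets
D_temp = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 1),  # one real/generated logit per 64x64 triplet
)

triplets = torch.randn(8, 3, 64, 64)  # frames [t-1, t, t+1] stacked per sample
logits = D_temp(triplets)
```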
As a last example, GANs were also shown to be able to
accurately capture solution manifolds of PDEs parametrized by physical parameters {cite}`chu2021physgan`.
In this work, Navier-Stokes solutions parametrized by varying buoyancies, vorticity content, boundary conditions,
and obstacle geometries were learned by an NN.
This is a highly challenging solution manifold, and requires an extended "cyclic" GAN approach that pushes the discriminator to take all the physical parameters under consideration into account. Interestingly, the generator learns to produce realistic and accurate solutions despite being trained purely on data, i.e. without explicit help in the form of a differentiable physics solver setup.
---
name: others-GANs-meaningful-fig11
---
A range of example outputs of a physically-parametrized GAN {cite}`chu2021physgan`.
The network can successfully extrapolate to buoyancy settings beyond the
range of values seen at training time.
GANs are a powerful learning tool. Note that the discriminator is effectively a learned loss function: once the generator is fully trained, the discriminator is typically not needed anymore and can be discarded.
However, despite being very powerful tools, it is (given the current state-of-the-art) questionable
whether GANs make sense when we have access to a reasonable PDE model. If we can discretize the model
equations and include them with a differentiable physics (DP) training (cf. {doc}`diffphys`),
this will most likely give better results than trying to approximate the PDE model with a discriminator.
DP training can yield similar benefits to GAN training: it provides a local gradient via the
discretized simulator, and in this way prevents undesirable averaging across samples.
Hence, combinations of DP training and GANs are also bound to not perform better than either of
them in isolation.
That being said, GANs can nonetheless be attractive in situations where DP training is infeasible due to black-box solvers without gradients.
Due to the complexity of the training setup, we only refer to external open source
implementations for practical experiments with physical GANs. E.g.,
the spatio-temporal GAN from {cite}`xie2018tempoGan` is available at
https://github.com/thunil/tempoGAN.
It also includes several extensions for stabilization, such as
$L^1$ regularization, and generator-discriminator balancing.