@article{jin2020manifold,
author = {Jin, Charles and Rinard, Martin},
journal = {arXiv preprint arXiv:2003.04286},
title = {{Manifold regularization for adversarial robustness}},
year = {2020}
}
This paper uses manifold regularization to achieve local stability, which in turn improves adversarial robustness.
The key insight is to learn a function that does not vary much in small neighborhoods of natural inputs, even where the network classifies those inputs incorrectly. (This is a major difference from adversarial training.)
Manifold regularization is based on the assumption that the input data is not drawn uniformly from the input domain, but lies on (or near) a low-dimensional submanifold.
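The intrinsic norm $\|f\|_{I}^{2}$ of a function $f$, which measures how much $f$ varies along the data manifold, is approximated as
$$\|f\|_{I}^{2} \approx \frac{1}{N_{b}^{2}} \sum_{i, j=1}^{N_{b}}\left(f\left(x_{i}\right)-f\left(x_{j}\right)\right)^{2} W_{i, j},$$
where the sum is over the $N_b$ samples in a mini-batch and $W_{i,j}$ is the weight of the edge between $x_i$ and $x_j$ in the similarity graph.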
One benefit of this approach is that the regularization comes at nearly zero overhead when training via stochastic gradient descent.
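A minimal NumPy sketch of the mini-batch estimate, assuming scalar outputs $f(x_i)$ and Gaussian (heat-kernel) weights $W_{i,j}$. The notes do not say how $W$ is built, and `heat_kernel_weights`, `sigma`, and the other function names are hypothetical. The sketch also checks the standard graph-Laplacian identity $\sum_{i,j}(f_i - f_j)^2 W_{i,j} = 2\,f^\top L f$ with $L = D - W$ for symmetric $W$. Since the weights only involve samples already in the mini-batch, the extra cost is one $N_b \times N_b$ pairwise computation per step.

```python
import numpy as np

def heat_kernel_weights(x, sigma=1.0):
    # Assumption: Gaussian (heat-kernel) weights
    # W_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)); the notes do not specify W.
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def intrinsic_norm_estimate(f_vals, w):
    # (1 / N_b^2) * sum_{i,j} (f(x_i) - f(x_j))^2 * W_ij over the mini-batch
    n_b = f_vals.shape[0]
    diffs = f_vals[:, None] - f_vals[None, :]
    return np.sum(diffs ** 2 * w) / n_b ** 2

def intrinsic_norm_laplacian(f_vals, w):
    # Same quantity via the graph Laplacian L = D - W (requires symmetric W):
    # sum_{i,j} (f_i - f_j)^2 W_ij = 2 * f^T L f
    n_b = f_vals.shape[0]
    lap = np.diag(w.sum(axis=1)) - w
    return 2.0 * (f_vals @ lap @ f_vals) / n_b ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 5))                # hypothetical mini-batch, N_b = 8
    f_vals = np.tanh(x @ rng.normal(size=5))   # stand-in scalar outputs f(x_i)
    w = heat_kernel_weights(x)
    print(intrinsic_norm_estimate(f_vals, w))
    print(intrinsic_norm_laplacian(f_vals, w))  # agrees up to rounding
```

A constant function gets an estimate of exactly zero, matching the intuition that the norm penalizes variation between similar inputs.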
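One obvious way to extend the previous approach is to associate each input $x_i$ with a pair of points $(x_i^{+}, x_i^{-})$ and keep only the graph weights $W_{i^{+}, i^{-}}$ within each pair, so the double sum collapses to a single sum over $i$:
$$\|H(\cdot\,; \theta)\|_{I}^{2} \approx \frac{1}{N^{2}} \sum_{i=1}^{N} H\left(x_{i}^{+}, x_{i}^{-} ; \theta\right)^{2} W_{i^{+}, i^{-}}$$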
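A minimal sketch of the pairwise estimate above. The notes do not define $H$ or how the pairs are drawn, so this hypothetically assumes that $x_i^{+}$ and $x_i^{-}$ are sampled uniformly from an $\ell_\infty$ ball of radius `eps` around $x_i$, that $H$ is the $\ell_2$ distance between the model's outputs on the two points, and that every $W_{i^{+},i^{-}}$ equals 1 unless weights are passed in. `sample_pair`, `pairwise_regularizer`, and the linear `model` are illustrative names, not the paper's code. The only extra work is two forward passes per sample, with no inner optimization loop searching for a worst-case perturbation.

```python
import numpy as np

def sample_pair(x, eps, rng):
    # Assumption: x_i^+ and x_i^- are drawn independently and uniformly
    # from the l_inf ball of radius eps around each x_i.
    plus = x + rng.uniform(-eps, eps, size=x.shape)
    minus = x + rng.uniform(-eps, eps, size=x.shape)
    return plus, minus

def pairwise_regularizer(model, x, eps, rng, weights=None):
    """(1 / N^2) * sum_i H(x_i^+, x_i^-; theta)^2 * W_{i+, i-}.

    Hypothetical choice: H is the l2 distance between model outputs
    on the two points of each pair.
    """
    n = x.shape[0]
    plus, minus = sample_pair(x, eps, rng)
    # Two plain forward passes; no inner optimization loop.
    h = np.linalg.norm(model(plus) - model(minus), axis=1)
    if weights is None:
        weights = np.ones(n)  # assumption: unit weight for every pair
    return np.sum(h ** 2 * weights) / n ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(4, 3))   # hypothetical linear model parameters
    model = lambda z: z @ theta       # outputs have shape (N, 3)
    x = rng.normal(size=(10, 4))      # hypothetical batch of N = 10 inputs
    print(pairwise_regularizer(model, x, eps=0.1, rng=rng))
```

Swapping in a different $H$ only changes the line that computes `h`; the normalization and weighting follow the formula as written.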
- Higher adversarial and natural robustness.
- This suggests that encouraging the network to be locally stable on the intrinsic geometry of the input submanifold leads to fundamentally different optima than training with adversarial examples.
- No inner optimization loop to find strong perturbations at training time.
- Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, 2006.
- Taehoon Lee, Minsuk Choi, and Sungroh Yoon. Manifold regularized deep neural networks using adversarial examples, 2015.