- 07/2022: wrote two blog posts for adversarial patches and certifiably robust image classification post1 post2
- 03/2022: released the leaderboard for certified defenses for image classification against adversarial patches!!
- 11/2021: added explanations of different defense terminologies. added a few more recent papers.
- 08/2021: released the paper list!
Different from classic adversarial examples that are configured to have a small L_p norm distance to the normal examples, a localized adversarial patch attacker can arbitrarily modify the pixel values within a small region.
The attack algorithm is similar to those for the classic L_p adversarial example attack. You define a loss function and then optimize your perturbation to attain the attack objective. The only difference is that now 1) you can only optimize over pixels within a small region, 2) but within that region, the pixel values can be arbitrary as long as they are valid pixels.
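A minimal sketch of that recipe, assuming a toy linear score function in place of a real network. The names `score`, `patch_attack`, and `patch_indices`, as well as the image and weights, are made up for illustration. The attacker takes gradient steps only on the pixels inside the patch and clips them to the valid range [0, 1]; there is no L_p budget.

```python
def score(image, weights):
    """Score of a toy linear model (hypothetical stand-in for a real network)."""
    return sum(w * x for w, x in zip(weights, image))


def patch_attack(image, weights, patch_indices, steps=50, step_size=0.1):
    """Lower the model score by editing only the pixels listed in patch_indices."""
    adv = list(image)
    for _ in range(steps):
        for i in patch_indices:
            # For a linear score, d(score)/d(x_i) = w_i, so step against w_i.
            updated = adv[i] - step_size * weights[i]
            # Only constraint inside the patch: stay a valid pixel in [0, 1].
            adv[i] = min(1.0, max(0.0, updated))
    return adv


image = [0.5] * 16                 # flattened 4x4 grayscale image (toy data)
weights = [1.0, -1.0] * 8          # toy model parameters
patch_indices = [0, 1, 4, 5]       # 2x2 region in the top-left corner
adv = patch_attack(image, weights, patch_indices)
print(score(image, weights), score(adv, weights))
```

Pixels outside `patch_indices` are never touched, which is the "localized" part; a real attack would backpropagate through the victim model instead of reading the gradient off a weight vector.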
Example of localized adversarial patch attack (image from Brown et al.):
It can be realized in the physical world!
Since all perturbations are within a small region, we can print the patch and attach it to objects in the physical world. This type of attack poses a real-world threat to ML systems!
Note: not all existing physically-realizable attacks are in the category of patch attacks, but the localized patch attack is (one of) the simplest and the most popular physical attacks.
- Test-time attacks/defenses (do not consider localized backdoor triggers)
- 2D computer vision tasks (e.g., image classification, object detection, image segmentation)
- Localized attacks (do not consider other physical attacks that are more "global", e.g., some stop sign attacks which require changing the entire stop sign background)
- I first categorize the papers based on the task: image classification vs object detection (and semantic segmentation, and other tasks)
- I next group papers for attacks vs defenses.
- I tried to organize each group of papers chronologically. I consider two timestamps: the time when the preprint is available (e.g., arXiv) and the time when a (published) paper was submitted for peer-review.
I am actively developing this paper list (I haven't added notes for all papers). If you want to contribute to the paper list, add your paper, correct any of my comments, or share any of your suggestions, feel free to reach out :)
There are two categories of defenses that have different robustness guarantees.
To evaluate the robustness of an empirical defense, we use concrete attack algorithms to attack the defense. The evaluated robustness does not come with a formal security guarantee: it might be compromised by smarter attackers in the future.
To evaluate a certified defense, we need to develop a robustness certification procedure to determine whether the defense has certifiable/provable robustness for a given input against a given threat model. We need to formally prove that the certification results will hold for any attack within the threat model, including ones that have full knowledge of the defense algorithm, setup, etc.
Notes:
- We use a certification procedure to evaluate the robustness of certified defenses; the certification procedure is agnostic to attack algorithms (i.e., it holds for any attack algorithm). As a bonus, this agnostic property lifts the burden of designing sophisticated adaptive attack algorithms for robustness evaluation. This is also why certified defenses usually do not provide "attack code" or "adversarial images" in their source code.
- Strictly speaking, it is unfair to directly compare the performance of empirical defenses and certified defenses due to their different robustness notions.
- Some empirical defenses might have (seemingly) high empirical robustness. However, those empirical defenses have zero certified robust accuracy, and their empirical robust accuracy might drop greatly given a smarter attacker.
- On the other hand, the evaluated certified robustness is a provable lower bound on model performance against any empirical adaptive attack within the threat model.
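To make "certification procedure" concrete, here is a minimal sketch for a generic voting-based defense. It assumes the defense classifies many local views of the image and that a patch can change at most `max_corrupted_votes` of those votes; the function name and that bound are illustrative assumptions, not taken from any specific paper. If the top label leads the runner-up by more than twice that bound, no attacker within the threat model can flip the majority.

```python
from collections import Counter


def certify_majority_vote(votes, max_corrupted_votes):
    """Return (predicted_label, certified) for a list of per-view votes.

    Each corrupted vote can remove one vote from the winner and add one to a
    rival, so the margin shrinks by at most 2 * max_corrupted_votes.
    """
    counts = Counter(votes).most_common()
    top_label, top_count = counts[0]
    runner_up_count = counts[1][1] if len(counts) > 1 else 0
    certified = top_count - runner_up_count > 2 * max_corrupted_votes
    return top_label, certified


votes = ["cat"] * 9 + ["dog"] * 2   # toy votes from 11 local views
print(certify_majority_vote(votes, max_corrupted_votes=3))
print(certify_majority_vote(votes, max_corrupted_votes=4))
```

Note that the check never runs an attack: it only reasons about the worst case, which is why the result holds for any attack algorithm within the threat model.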
There are also two different defense objectives (robustness notions).
Robust prediction (or prediction recovery) aims to always make correct decisions (e.g., correct classification label, correct bounding box detection), even in the presence of an attacker. It is also often called recovery-based defense.
Attack detection, on the other hand, only aims to detect an attack. If the defense detects an attack, it issues an alert and abstains from making predictions; if no attack is detected, it performs normal predictions. We can think of this type of defense as adding a special token "ALERT" to the model output space.
Notes:
- Clearly, robust prediction is a harder objective than attack detection. The detect-and-alert setup might be problematic when human fallback is unavailable.
- Nevertheless, we are still interested in studying attack detection defense. There is an interesting paper discussing the connection between these two types of defense.
- How do we evaluate the robustness of an attack detection defense?
  - for clean images: perform the defense, and only count a prediction as correct when the output matches the ground-truth (alerts in the clean setting are errors!!)
  - for adversarial images: perform the defense, and count a robust prediction if the output is ALERT or matches the ground-truth
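The two counting rules above can be sketched as follows. The function names and the toy outputs are made up for illustration; `ALERT` stands for the special token added to the output space.

```python
ALERT = "ALERT"


def clean_accuracy(outputs, labels):
    """Clean setting: only an exact label match counts; an alert is an error."""
    return sum(o == y for o, y in zip(outputs, labels)) / len(labels)


def robust_accuracy(outputs, labels):
    """Adversarial setting: an alert or the correct label counts as robust."""
    return sum(o == ALERT or o == y for o, y in zip(outputs, labels)) / len(labels)


labels = ["cat", "dog", "cat", "bird"]
clean_outputs = ["cat", ALERT, "cat", "dog"]   # one false alert, one wrong label
adv_outputs = [ALERT, "dog", "bird", ALERT]    # one undetected misclassification
print(clean_accuracy(clean_outputs, labels), robust_accuracy(adv_outputs, labels))
```

This counts outcomes for a fixed set of adversarial images, so it measures empirical robustness; a certified number would additionally require a proof that the outcome holds for every attack within the threat model.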
CCS 2016
- uses "adversarial glasses" to fool face recognition models
arXiv 1708; ICCV Workshop 2017
- adversarial example attack against robot vision
- discusses the concept of localized perturbations, which could mimic attaching a sticker in the real world
arXiv 1712; NeurIPS workshop 2017
- The first paper that explicitly introduces the concept of adversarial patch attacks
- Demonstrate a universal physical world attack
arXiv 1801; ICML 2018
- Seems to be concurrent work (?) with "Adversarial Patch"
- Digital domain attack
AAAI 2019
- generate imperceptible patches.
AAAI 2020
arXiv 2004; ECCV 2020
- a black-box attack via reinforcement learning
[Jacks of All Trades, Masters Of None: Addressing Distributional Shift and Obtrusiveness via Transparent Patch Attacks]
arXiv 2005
- transparent patch
arXiv 2011; Pattern Recognition
Springer 2021
- data independent attack; attack via increasing the magnitude of feature values
arXiv 2102; Neurocomputing
- use 3D modeling to enhance physical-world patch attack
arXiv 2104
- add stickers to face to fool face recognition system
arXiv 2106; CVPR 2021
- focus on transferability
arXiv 2106; an old version is available at arXiv 2009; IEEE Internet of Things Journal
- generate small (inconspicuous) and localized perturbations
arXiv 2108; ICCVW 2021
- consider physical-world patch attack in the 3-D space (images are taken from different angles)
Robust Adversarial Attack Against Explainable Deep Classification Models Based on Adversarial Images With Different Patch Sizes and Perturbation Ratios
IEEE Access
- use patch to attack classification models and explanation models
arXiv 2110; CVPR Workshop 2022
- An analysis of perturbing a subset of the tokens of ViT
One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features
arXiv 2110
BMVC 2021
- learn patch content and location (instead of a fixed/random patch location)
arXiv 2111; ECML/PKDD 2022
- physical world attack via wearing a weird mask
arXiv 2111; TIFS
- natural-looking patch attacks
ICLR 2022
- not exactly a patch attack. use pixel patches to attack ViT
CVPR 2022
- not exactly an attack. use patch for good purposes...
IWSPA 2022
- empirically analyze the robustness of face authentication against physical-world attacks
Journal of Imaging
- combining QR code with adversarial patch
TPAMI
arXiv 2206
- a survey; no experimental evaluation
- Most (if not all) papers are covered in this paper.
arXiv 2209
CVPR 2022
- an interesting paper that proposes a loss targeted at attention modules
- studied both transformer-based image classification and object detection
ICML 2022 Workshop
- show that ViT is more robust to natural patch corruption, but more vulnerable to adversarial patch perturbations.
ECCV 2022
- optimize the patch shape as well
arXiv 2212
- attacking the feature space. applicable to image classification, object detection, semantic segmentation
- discuss the different behaviors of patch attacks and global perturbation attacks
TPAMI 2022
- surrogate model + reinforcement learning. the patch position is from RL model.
Adversarial Patch Attacks on Deep-Learning-Based Face Recognition Systems Using Generative Adversarial Networks
CVPR workshop 2023
USENIX Security 2023
(go back to table of contents)
Check out this leaderboard for certified robustness against adversarial patches!
The leaderboard provides a summary of all papers in this section! (todo: add ViP)
ICLR 2020
The first certified defense.
- Show that previous two empirical defenses (DW and LGS) are broken against an adaptive attacker
- Adapt IBP (Interval Bound Propagation) for certified defense
- Evaluate robustness against different shapes
- Very expensive; only works for CIFAR-10 and small models
IEEE S&P Workshop on Deep Learning Security 2020
- Certified defense; clip BagNet features
- Efficient
arXiv 2002, NeurIPS 2020
- Certified defense; adapt ideas of randomized smoothing for $L_0$ adversary
- Majority voting on predictions made from cropped pixel patches
- Scale to ImageNet but expensive
arXiv 2004; ACNS workshop 2020
- Certified defense for detecting an attack
- Apply masks to the different locations of the input image and check inconsistency in masked predictions
- Too expensive to scale to ImageNet (?)
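
A toy 1-D sketch of the mask-and-check-consistency idea described in the notes above; it is not the paper's implementation. The classifier `toy_classify`, the fill value, and the images are hypothetical. The key assumption is that the mask is at least as large as the patch and slides with stride 1, so at least one mask fully covers any patch.

```python
ALERT = "ALERT"


def masked_predictions(image, classify, mask_size, fill=0.5):
    """Classify the image once per mask location (sliding window, stride 1)."""
    preds = []
    for start in range(len(image) - mask_size + 1):
        masked = list(image)
        masked[start:start + mask_size] = [fill] * mask_size
        preds.append(classify(masked))
    return preds


def detect_or_predict(image, classify, mask_size):
    """Return the agreed label, or ALERT if the masked predictions disagree."""
    preds = masked_predictions(image, classify, mask_size)
    return preds[0] if len(set(preds)) == 1 else ALERT


def certify_detection(image, label, classify, mask_size):
    """Certified if every masked prediction on the clean image is correct."""
    return all(p == label for p in masked_predictions(image, classify, mask_size))


def toy_classify(pixels):
    # Deliberately fragile toy model: a single near-zero pixel flips the label.
    return "dark" if min(pixels) < 0.1 else "bright"


clean = [0.9] * 8
attacked = list(clean)
attacked[3:5] = [0.0, 0.0]   # a 2-pixel "patch"
print(detect_or_predict(clean, toy_classify, mask_size=2))
print(detect_or_predict(attacked, toy_classify, mask_size=2))
```

The certification argument: whatever the patch contains, the mask that covers it reproduces a masked clean image, which is classified correctly; so the output is either the correct label (all masks agree) or ALERT (some mask disagrees).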
PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking
arXiv 2005; USENIX Security 2021
- Certified defense framework with two general principles: small receptive field to bound the number of corrupted features and secure aggregation for final robust prediction
- BagNet for small receptive fields; robust masking for secure aggregation, which detects and masks malicious feature values
- Subsumes several existing and follow-up papers
Available on ICLR open review in 10/2020; ICLR 2021
- Certified defense
- BagNet to bound the number of corrupted features; Heaviside step function & majority voting for secure aggregation
- Efficient, evaluate on different patch shapes
Available on ICLR open review in 10/2020
- Certified defense
- Randomized image cropping + majority voting
- only probabilistic certified robustness
arXiv 2104; ICLR workshop 2021
- Certified defense for detecting an attack
- A hybrid of PatchGuard and Minority Reports
NeurIPS 2021
- certified defense for attack detection. a fun paper using ideas from both minority reports and PatchGuard++
- The basic idea is to apply (pixel) masks and check prediction consistency
- it further uses superficial important neurons (the neurons that contribute significantly to the shallow feature map values) to prune unimportant regions so that the number of masks is reduced.
arXiv 2108; USENIX Security 2022
- Certified defense that is compatible with any state-of-the-art image classifier
- huge improvements in clean accuracy and certified robust accuracy (its clean accuracy is close to SOTA image classifier)
arXiv 2110; CVPR 2022
- Certified defense. ViT + De-randomized Smoothing
- Drop tokens that correspond to pixel masks to greatly improve efficiency.
CVPR 2022
- Certified defense. ViT + De-randomized Smoothing
- A progressive training technique for smoothed ViT
- Isolated (instead of global) self-attention to improve defense efficiency
arXiv 2111
- certified defense for attack detection.
- The idea is basically Minority Reports with Vision Transformer
- The evaluation of clean accuracy seems problematic... (feel free to correct me if I am wrong)
- For a clean image, the authors consider the model prediction to be correct even when the defense issues an attack alert
ECCV 2022
- certified defense for both robust prediction and attack detection.
- The robust-prediction (recovery-based) defense is similar to smoothed ViT (but with more different "masking" strategies); the detection-based defense is similar to Minority Reports.
- They proposed a generalized window mask to handle the two-patch problem.
- The clean performance evaluation of the detection-based defense does not count false alerts as errors (the authors say they are working on a revision to fix this issue; I will add this defense to the leaderboard soon)
Revisiting Image Classifier Training for Improved Certified Robust Defense against Adversarial Patches
TMLR
- better training technique for PatchCleanser
http://scis.scichina.com/en/2022/170306.pdf https://ieeexplore.ieee.org/abstract/document/9897387
Check out this leaderboard for certified robustness against adversarial patches!
(go back to table of contents)
CVPR workshop 2018
- The first empirical defense. Use saliency map to detect and mask adversarial patches.
arXiv 1807; WACV 2019
- An empirical defense. Use pixel gradient to detect patch and smooth in the suspected regions.
Journal of Big Data
- An empirical defense, make prediction on pixel patches, and do majority voting
- This paper is somehow missed by almost all relevant papers in this field (probably due to its venue); it only has one self-citation. However, its idea is quite similar to some certified defenses published in 2019-2020
arXiv 1907 BMVC 2019
- briefly discuss localized patch attacks
arXiv 1909, ICLR 2020
- Empirical defense via adversarial training
- Interestingly show that adversarial training for patch attack does not hurt model clean accuracy
- Only works on small images
arXiv 1812; IEEE S&P Workshop on Deep Learning Security 2020
- Empirical defense that leverages the universality of the attack (inapplicable to non-universal attacks)
arXiv 2002
- empirical defense
arXiv 2005, ECCV workshop 2020
- empirical defense via adversarial training (in which the patch location is being optimized)
- move the patch toward the direction where loss increases
arXiv 2009; ACCV 2020
- use conditional GAN to generate patch for adversarial training
Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks
arXiv 2012
- empirical defense; directly use CompNet to defend against black-box patch attack (evaluated with PatchAttack)
CISS 2021
- An empirical defense against black-box patch attacks
- A direct application of CompNet
arXiv 2102; INFOCOM 2021
- empirical defense for attack detection
A Novel Lightweight Defense Method Against Adversarial Patches-Based Attacks on Automated Vehicle Make and Model Recognition Systems
Journal of Network and Systems Management
- empirical defense. requires the assumption that the image is horizontally symmetric. only applicable to certain scenarios.
arXiv 2105
- An empirical defense that uses the magnitude and variance of the feature map values to detect an attack
- focus more on the universal attack (both localized patch and global perturbations)
ICML UDL workshop
- empirical defense. detect and remove outliers in ViT
arXiv 2108; AsiaCCS 2023
- empirical defense; use universality
- similar to SentiNet without some improvements: detect critical region and overlay it to a different image
ICCV 2021
- empirical defense via clipping feature norm.
- Oddly, this paper does not cite Clipped BagNet
ICCV workshop 2021
MM workshop 2021
- make predictions on small image region. train a classifier on predictions on different regions.
- discuss "provable robustness"
arXiv 2203; ICML workshop 2022
- A dataset for adversarial patches
- Clarification from the authors: the main purpose of the ImageNet-Patch dataset is to provide a fast benchmark of models against patch attacks, but not strictly related to defenses against adversarial patches, which is why they did not cite any adversarial patch defense papers.
- (I ran some experiments; the transferability to architectures like ViT and ResMLP seemed low)
ICML 2023
(go back to table of contents)
arXiv 1806; AAAI workshop 2019
- The first (?) patch attack against object detector
arXiv 1904; CVPR workshop 2019
- using a rigid board printed with adversarial perturbations to evade the detection of a person
arXiv 1906
- interestingly show that a physical-world patch in the background (far away from the victim objects) can have malicious effect
arXiv 1909; CVPR 2020
CCS 2019
arXiv 1910; ECCV 2020
- use a non-rigid T-shirt to evade person detection
arXiv 1910; ECCV 2020
- wear an ugly T-shirt to evade person detection
arXiv 1912; ECCV 2020
- a dataset with annotated patch locations.
IEEE IoT-J 2020
arXiv 2008
arXiv 2010; IJCNN 2020
arXiv 2010; CIKM workshop
- not really a conventional patch attack. more like a structured L0 attack
- briefly discuss that two-stage detectors like Faster-RCNN are harder to attack than one-stage detectors like YOLO
arXiv 2010
- add multiple small patches
- use heatmap to find sensitive location to place the patches
arXiv 2010
- dynamic -- the patch location/position changes as the camera moves around
CoRL
- attacking 3D and 2D perception at the same time
arXiv 2012; CVPR 2021
- not really a patch attack. put a translucent patch in front of the camera
arXiv 2103; ICME 2021
- achieve high attack performance using a small number of pixels (e.g., 0.32% pixels)
- the adversarial pixels are scattered across the entire image and different objects, more like an L0 attack
arXiv 2108
arXiv 2109
- attack against object detectors that can also evade attack-detection models.
ICCV 2021
- an improved attack from adversarial T-shirt. The patch looks more natural (e.g., a dog)
MM 2021
- an improved attack from adversarial T-shirt. The patch looks more natural (e.g., an Ivysaur!)
Developing Imperceptible Adversarial Patches to Camouflage Military Assets From Computer Vision Enabled Technologies
CVPR 2022
- consider cameras from different angles
RA-L 2022
- use adversarial patches to attack object detectors, in a robotic arm setting
arXiv 2207; IJCAI Workshop 2022
arXiv 2208
IJCAI Workshop 2022
ICECCME 2022
Poster: On the System-Level Effectiveness of Physical Object-Hiding Adversarial Attack in Autonomous Driving
CCS 2022 Poster
- demonstrate that existing physical patch attacks might not be that effective when considering other components of an autonomous car (e.g., object tracking)
arXiv 2210
USENIX Security 2023
- physical adversarial patch triggered by acoustic signals
arXiv 2211
AAAI 2023
arXiv 2303
TransPatch: A Transformer-based Generator for Accelerating Transferable Patch Generation in Adversarial Attacks Against Object Detection Models
ECCV 2023
CVPR 2023
CVPR 2023
CVPR 2023 (not really an attack)
(go back to table of contents)
arXiv 2102; CCS 2021
- The first certified defense for patch hiding attack
- Adapt robust image classifiers for robust object detection
- Provable robustness at a negligible cost of clean performance
ObjectSeeker: Certifiably Robust Object Detection against Patch Hiding Attacks via Patch-agnostic Masking
arXiv 2202
- use pixel masks to remove the adversarial patch in a certifiably robust manner.
- a significant improvement in certified robustness
- also discuss different robustness notions
(go back to table of contents)
arXiv 1910; CVPR workshop 2020
- The first empirical defense, adding a regularization loss to constrain the use of spatial information
- only experiment on YOLOv2 and small datasets like PASCAL VOC
arXiv 2101; ICML 2021 workshop
arXiv 2103
- Empirical defense via adding adversarial patches and a "patch" class during the training
arXiv 2106
- Two empirical defenses for patch hiding attack
- Feed a small image region to the detector; grow the region with some heuristics; detect an attack when YOLO detects objects in a smaller region but misses objects in a larger expanded region.
MM 2021
- Empirical defense. Adversarially train a "MaskNet" to detect and mask the patch
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection
CVPR 2022
arXiv 2203
- use z-score in each layer to detect patches.
TIP
- use a GAN-like training strategy to train a "white frame". The frame is attached to the image to invalidate patch attacks.
arXiv 2207
- detect adversarial patches and remove them based on their different textures. consider different tasks like image classification, object detection, and video analysis
arXiv 2208
MM 2022
- train a network to detect and mask the patch
CVPR 2023
Fight Fire with Fire: Combating Adversarial Patch Attacks using Pattern-randomized Defensive Patches
canary patch
(go back to table of contents)
ECCV 2020
- indirect -- the malicious pixels do not overlap with the victim pixels/objects
- interestingly shows that more advanced models are more vulnerable to "indirect" patch attacks, since they leverage more "context" information.
arXiv 2105
- remote -- the patch is away from the victim pixels
Evaluating the Robustness of Semantic Segmentation for Autonomous Driving against Real-World Adversarial Patch Attacks
arXiv 2108; WACV 2022
On the Feasibility and Generality of Patch-based Adversarial Attacks on Semantic Segmentation Problems
arXiv 2205; ICPRAI 2022
arXiv 2212
- attacking the feature space. applicable to image classification, object detection, semantic segmentation
- discuss the different behaviors of patch attacks and global perturbation attacks
arXiv 2209
- The first certified defense for image segmentation
- Use different masking strategies for attack detection or robust prediction
- Perform inpainting after masking to improve performance
- Unfortunately, this paper does not count false alerts of detection-based defense as errors in the clean setting
Defense against Adversarial Patch Attacks for Aerial Image Semantic Segmentation by Robust Feature Extraction
arXiv 2010
- attack depth estimation
- patch attacks against the task of image retrieval
MM 2021
Dirty Road Can Attack: Security of Deep Learning based Automated Lane Centering under Physical-World Attack
USENIX 2021
- use a patch on the road to attack lane detection.
CCS 2022
- Not actually a patch attack; the paper is about audio
arXiv 2207; ECCV 2022 workshop
- use adversarial patches to attack ML-based Visual Odometry
arXiv 2301
- add a patch (a set of nodes) to a graph to attack GNNs
ADV-POST: Physically Realistic Adversarial Poster for Attacking Semantic Segmentation Models in Autonomous Driving
LanCeX: A Versatile and Lightweight Defense Method against Condensed Adversarial Attacks in Image and Audio Recognition
- consider defenses for both image and audio
ICIP 2022
- apply certifiably robust defenses for test-time attack to backdoored models
ICLR 2023