
A Paper List for Localized Adversarial Patch Research

Changelog

  • 07/2022: wrote two blog posts for adversarial patches and certifiably robust image classification post1 post2

  • 03/2022: released the leaderboard for certified defenses for image classification against adversarial patches!!

  • 11/2021: added explanations of different defense terminologies. added a few more recent papers.

  • 08/2021: released the paper list!

What is the localized adversarial patch attack?

Unlike classic adversarial examples, which are constrained to have a small L_p norm distance to the normal examples, a localized adversarial patch attacker can arbitrarily modify the pixel values within a small region.

The attack algorithm is similar to those for the classic L_p adversarial example attack: you define a loss function and then optimize your perturbation to attain the attack objective. There are two differences: 1) you can only optimize over pixels within a small region; 2) within that region, the pixel values can be arbitrary as long as they are valid pixels.
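The optimization described above can be sketched in a few lines. This is a minimal illustration, not the algorithm of any specific paper: the function `patch_attack`, the toy linear loss, and all parameter values (`steps`, `lr`, the 6x6 image) are assumptions made for the example. It shows the two constraints in action: only pixels inside the patch region are updated, and they are clipped to the valid range [0, 1] rather than to a small L_p ball.

```python
def patch_attack(image, grad_fn, top, left, size, steps=50, lr=0.1):
    """Gradient ascent on the attack loss, restricted to a size x size patch."""
    adv = [row[:] for row in image]  # pixels outside the patch are never touched
    for _ in range(steps):
        grad = grad_fn(adv)  # d(loss)/d(pixel), same shape as the image
        for r in range(top, top + size):
            for c in range(left, left + size):
                # Step uphill, then clip to the valid pixel range [0, 1].
                adv[r][c] = min(1.0, max(0.0, adv[r][c] + lr * grad[r][c]))
    return adv


# Hypothetical toy model: loss = sum(w * x), so the gradient w.r.t. x is w.
SIZE = 6
WEIGHTS = [[(-1.0) ** (r + c) for c in range(SIZE)] for r in range(SIZE)]


def toy_loss(img):
    return sum(WEIGHTS[r][c] * img[r][c] for r in range(SIZE) for c in range(SIZE))


def toy_grad(img):
    return WEIGHTS


clean = [[0.5] * SIZE for _ in range(SIZE)]
adv = patch_attack(clean, toy_grad, top=1, left=2, size=3)
```

With a real model, `grad_fn` would come from automatic differentiation of the attack loss; the masking and clipping steps stay the same.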

Example of localized adversarial patch attack (image from Brown et al.):

patch image example

What makes this attack interesting?

It can be realized in the physical world!

Since all perturbations are within a small region, we can print the patch and attach it to objects in the physical world. This type of attack poses a real-world threat to ML systems!

Note: not all existing physically-realizable attacks are in the category of patch attacks, but the localized patch attack is (one of) the simplest and the most popular physical attacks.

About this paper list

Focus

  1. Test-time attacks/defenses (do not consider localized backdoor triggers)
  2. 2D computer vision tasks (e.g., image classification, object detection, image segmentation)
  3. Localized attacks (do not consider other physical attacks that are more "global", e.g., some stop sign attacks which require changing the entire stop sign background)

Organization

  1. I first categorize the papers based on the task: image classification vs object detection (and semantic segmentation, and other tasks)
  2. I next group papers for attacks vs defenses.
  3. I tried to organize each group of papers chronologically. I consider two timestamps: the time when the preprint became available (e.g., arXiv) and the time when a (published) paper was submitted for peer review.

I am actively developing this paper list (I haven't added notes for all papers). If you want to contribute to the paper list, add your paper, correct any of my comments, or share any of your suggestions, feel free to reach out :)

Table of Contents

Defense Terminology

Empirically Robust Defenses vs Provably/certifiably Robust Defenses

There are two categories of defenses that have different robustness guarantees.

To evaluate the robustness of an empirical defense, we use concrete attack algorithms to attack the defense. The evaluated robustness does not come with a formal security guarantee: it might be compromised by smarter attackers in the future.

To evaluate a certified defense, we need to develop a robustness certification procedure to determine whether the defense has certifiable/provable robustness for a given input against a given threat model. We need to formally prove that the certification results will hold for any attack within the threat model, including ones that have full knowledge of the defense algorithm, setup, etc.

Notes:

  1. We use a certification procedure to evaluate the robustness of certified defenses; the certification procedure is agnostic to attack algorithms (i.e., it holds for any attack algorithm). As a bonus, this agnostic property lifts the burden of designing sophisticated adaptive attack algorithms for robustness evaluation. This is also why certified defenses usually do not provide "attack code" or "adversarial images" in their source code.

  2. Strictly speaking, it is unfair to directly compare the performance of empirical defenses and certified defenses due to their different robustness notions.

    1. Some empirical defenses might have (seemingly) high empirical robustness. However, those empirical defenses have zero certified robust accuracy, and their empirical robust accuracy might drop greatly given a smarter attacker.

    2. On the other hand, the evaluated certified robustness is a provable lower bound on model performance against any empirical adaptive attack within the threat model.

Robust Prediction vs Attack Detection

There are also two different defense objectives (robustness notions).

Robust prediction (or prediction recovery) aims to always make correct decisions (e.g., correct classification label, correct bounding box detection), even in the presence of an attacker. It is also often called recovery-based defense.

Attack detection, on the other hand, only aims to detect an attack. If the defense detects an attack, it issues an alert and abstains from making predictions; if no attack is detected, it performs normal predictions. We can think of this type of defense as adding a special token "ALERT" to the model output space.

Notes:

  1. Clearly, robust prediction is a harder objective than attack detection. The detect-and-alert setup might be problematic when human fallback is unavailable.
  2. Nevertheless, we are still interested in studying attack detection defense. There is an interesting paper discussing the connection between these two types of defense.
  3. How do we evaluate the robustness of an attack detection defense?
    1. for clean images: run the defense, and count a prediction as correct only when the output matches the ground truth (alerts in the clean setting are errors!!)
    2. for adversarial images: run the defense, and count a prediction as robust if the output is ALERT or matches the ground truth
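The two evaluation rules above translate directly into code. The sketch below is only an illustration: the function names and the example labels are made up, and it assumes that no real class is literally named "ALERT".

```python
ALERT = "ALERT"


def clean_accuracy(outputs, labels):
    """An alert on a clean image is a false alarm, so it counts as an error."""
    return sum(o == y for o, y in zip(outputs, labels)) / len(labels)


def robust_accuracy(outputs, labels):
    """On adversarial images, an alert and a correct label both count as robust."""
    return sum(o == ALERT or o == y for o, y in zip(outputs, labels)) / len(labels)


labels = ["cat", "dog", "cat", "bird"]
clean_outputs = ["cat", ALERT, "cat", "dog"]  # one false alert, one wrong label
adv_outputs = [ALERT, "dog", "bird", ALERT]   # two alerts, one correct, one wrong

print(clean_accuracy(clean_outputs, labels))  # 0.5
print(robust_accuracy(adv_outputs, labels))   # 0.75
```

Counting clean-image alerts as correct would inflate `clean_accuracy`, since a defense that always alerts would then score 100% in both settings.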

Image Classification

Attacks

CCS 2016

  1. uses "adversarial glasses" to fool face recognition models

arXiv 1708; ICCV Workshop 2017

  1. adversarial example attack against robot vision
  2. discusses the concept of localized perturbations, which could mimic attaching a sticker in the real world

arXiv 1712; NeurIPS workshop 2017

  1. The first paper that explicitly introduces the concept of adversarial patch attacks
  2. Demonstrates a universal physical-world attack

arXiv 1801; ICML 2018

  1. Seems to be concurrent work (?) with "Adversarial Patch"
  2. Digital domain attack

AAAI 2019

  1. generate imperceptible patches.

AAAI 2020

arXiv 2004; ECCV 2020

  1. a black-box attack via reinforcement learning

[Jacks of All Trades, Masters Of None: Addressing Distributional Shift and Obtrusiveness via Transparent Patch Attacks]

arXiv 2005

  1. transparent patch

arXiv 2011; Pattern Recognition

Springer 2021

  1. data independent attack; attack via increasing the magnitude of feature values

arXiv 2102; Neurocomputing

  1. use 3D modeling to enhance physical-world patch attack

arXiv 2104

  1. add stickers to face to fool face recognition system

arXiv 2106; CVPR 2021

  1. focus on transferability

arXiv 2106; an old version is available at arXiv 2009; IEEE Internet of Things Journal

  1. generate small (inconspicuous) and localized perturbations

arXiv 2108; ICCVW 2021

  1. consider physical-world patch attack in the 3-D space (images are taken from different angles)

IEEE Access

  1. use patch to attack classification models and explanation models

arXiv 2110; CVPR Workshop 2022

  1. An analysis of perturbing part of tokens of ViT

arXiv 2110

BMVC 2021

  1. learn patch content and location (instead of a fixed/random patch location)

arXiv 2111; ECML/PKDD 2022

  1. physical world attack via wearing a weird mask

arXiv 2111; TIFS

  1. natural-looking patch attacks

ICLR 2022

  1. not exactly a patch attack. use pixel patches to attack ViT

CVPR 2022

  1. not exactly an attack. use patch for good purposes...

IWSPA 2022

  1. empirically analyze the robustness of face authentication against physical-world attacks

Journal of Imaging

  1. combining QR code with adversarial patch

TPAMI

arXiv 2206

  1. a survey; no experimental evaluation
  2. Most (if not all) papers are covered in this paper.

arXiv 2209

CVPR 2022

  1. an interesting paper that proposes a loss targeted at attention modules
  2. studied both transformer-based image classification and object detection

ICML 2022 Workshop

  1. show that ViT is more robust to natural patch corruption, but more vulnerable to adversarial patch perturbations.

ECCV 2022

  1. optimize the patch shape as well

arXiv 2212

  1. attacking the feature space. applicable to image classification, object detection, semantic segmentation
  2. discuss the different behaviors of patch attacks and global perturbation attacks

TPAMI 2022

  1. surrogate model + reinforcement learning. the patch position is from RL model.

CVPR workshop 2023

USENIX Security 2023

(go back to table of contents)

Certified Defenses

Check out this leaderboard for certified robustness against adversarial patches!

The leaderboard provides a summary of all papers in this section! (todo: add ViP)

ICLR 2020

The first certified defense.

  1. Shows that two previous empirical defenses (DW and LGS) are broken against an adaptive attacker
  2. Adapt IBP (Interval Bound Propagation) for certified defense
  3. Evaluate robustness against different shapes
  4. Very expensive; only works for CIFAR-10 and small models

IEEE S&P Workshop on Deep Learning Security 2020

  1. Certified defense; clip BagNet features
  2. Efficient

arXiv 2002, NeurIPS 2020

  1. Certified defense; adapt ideas of randomized smoothing for $L_0$ adversary
  2. Majority voting on predictions made from cropped pixel patches
  3. Scale to ImageNet but expensive
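A rough sketch of the voting-and-certification logic, under my own simplifying assumptions: one column band of width `band_width` per column position (with wrap-around), one base-classifier prediction per band, and a deliberately conservative margin test. The function names are hypothetical, and this is not the paper's exact certification condition. The reasoning is that a patch of width p can overlap at most p + b - 1 bands, so the attacker can flip at most that many votes, and each flipped vote shrinks the margin between the top two classes by at most 2.

```python
from collections import Counter


def affected_bands(patch_width, band_width):
    # Bands starting anywhere from (patch_start - b + 1) to (patch_start + p - 1)
    # overlap the patch, which gives p + b - 1 bands at most.
    return patch_width + band_width - 1


def vote_and_certify(band_predictions, patch_width, band_width):
    """Majority vote over per-band predictions plus a conservative certificate."""
    votes = Counter(band_predictions)
    # Sort by vote count (descending), break ties by label for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    top_label, top_votes = ranked[0]
    runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
    delta = affected_bands(patch_width, band_width)
    # Each corrupted band can remove one vote from the winner and add one to a rival.
    certified = top_votes - runner_up_votes > 2 * delta
    return top_label, certified


# Illustrative numbers: 32 bands of width 4, patch of width 5, so delta = 8.
strong = ["cat"] * 27 + ["dog"] * 5   # margin 22 > 16
weak = ["cat"] * 22 + ["dog"] * 10    # margin 12 <= 16
print(vote_and_certify(strong, patch_width=5, band_width=4))  # ('cat', True)
print(vote_and_certify(weak, patch_width=5, band_width=4))    # ('cat', False)
```

The cost noted above comes from running the base classifier once per band for every image.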

arXiv 2004; ACNS workshop 2020

  1. Certified defense for detecting an attack
  2. Apply masks to the different locations of the input image and check inconsistency in masked predictions
  3. Too expensive to scale to ImageNet (?)
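The mask-and-check idea can be sketched as follows. This is a simplified illustration rather than the paper's exact procedure: the helper names, the zero-fill mask, and the toy brightness classifier are all assumptions. For such a check to support a certificate, the mask must be large enough that at least one mask position fully covers any possible patch location.

```python
def mask_positions(image_size, mask_size, stride):
    """Top-left corners of square masks that slide over the whole image."""
    last = image_size - mask_size
    coords = list(range(0, last + 1, stride))
    if coords[-1] != last:
        coords.append(last)  # make sure the image border is covered
    return [(r, c) for r in coords for c in coords]


def apply_mask(image, top, left, mask_size, fill=0.0):
    masked = [row[:] for row in image]
    for r in range(top, top + mask_size):
        for c in range(left, left + mask_size):
            masked[r][c] = fill
    return masked


def detect(image, classify, mask_size, stride=1):
    """Return the label if all masked predictions agree, otherwise "ALERT"."""
    preds = {classify(apply_mask(image, r, c, mask_size))
             for r, c in mask_positions(len(image), mask_size, stride)}
    return preds.pop() if len(preds) == 1 else "ALERT"


# Hypothetical toy classifier: thresholds the total brightness of the image.
def toy_classify(image):
    return "bright" if sum(map(sum, image)) > 8 else "dark"


clean = [[0.2] * 6 for _ in range(6)]
patched = [row[:] for row in clean]
for r in (1, 2):
    for c in (1, 2):
        patched[r][c] = 1.0  # a 2x2 "patch" that flips the unmasked prediction

print(detect(clean, toy_classify, mask_size=3))    # dark
print(detect(patched, toy_classify, mask_size=3))  # ALERT
```

The number of model evaluations grows with the number of mask positions, which is where the scaling cost comes from.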

arXiv 2005; USENIX Security 2021

  1. Certified defense framework with two general principles: small receptive field to bound the number of corrupted features and secure aggregation for final robust prediction
  2. BagNet for small receptive fields; robust masking for secure aggregation, which detects and masks malicious feature values
  3. Subsumes several existing and follow-up papers
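To see why a small receptive field bounds the number of corrupted features, consider one spatial axis. A feature at stride position i*s sees pixels [i*s, i*s + r - 1], so a patch of size p can only touch features whose start falls in an interval of length p + r - 1, which contains at most ceil((p + r - 1) / s) multiples of s. The sketch below checks this bound by brute force; the function names and the concrete numbers (224-pixel image, receptive field 17, stride 8, patch 32) are illustrative assumptions.

```python
import math


def corrupted_feature_bound(patch_size, receptive_field, stride):
    """Upper bound on corrupted features along one axis."""
    return math.ceil((patch_size + receptive_field - 1) / stride)


def max_corrupted_brute_force(image_size, patch_size, receptive_field, stride):
    """Try every patch position and count features whose receptive field overlaps it."""
    num_features = (image_size - receptive_field) // stride + 1
    worst = 0
    for start in range(image_size - patch_size + 1):
        end = start + patch_size - 1
        count = sum(
            1 for i in range(num_features)
            if i * stride <= end and i * stride + receptive_field - 1 >= start
        )
        worst = max(worst, count)
    return worst


print(corrupted_feature_bound(32, 17, 8))         # 6
print(max_corrupted_brute_force(224, 32, 17, 8))  # 6
```

A larger receptive field (for example, a standard CNN that sees most of the image from every feature) makes the bound useless, because a single patch can then corrupt nearly all features.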

Available on ICLR open review in 10/2020; ICLR 2021

  1. Certified defense
  2. BagNet to bound the number of corrupted features; Heaviside step function & majority voting for secure aggregation
  3. Efficient, evaluate on different patch shapes

Available on ICLR open review in 10/2020

  1. Certified defense
  2. Randomized image cropping + majority voting
  3. only probabilistic certified robustness

arXiv 2104; ICLR workshop 2021

  1. Certified defense for detecting an attack
  2. A hybrid of PatchGuard and Minority Reports

NeurIPS 2021

  1. certified defense for attack detection. a fun paper using ideas from both minority reports and PatchGuard++
  2. The basic idea is to apply (pixel) masks and check prediction consistency
  3. it further uses superficial important neurons (the neurons that contribute significantly to the shallow feature map values) to prune unimportant regions so that the number of masks is reduced.

arXiv 2108; USENIX Security 2022

  1. Certified defense that is compatible with any state-of-the-art image classifier
  2. huge improvements in clean accuracy and certified robust accuracy (its clean accuracy is close to SOTA image classifier)

arXiv 2110; CVPR 2022

  1. Certified defense. ViT + De-randomized Smoothing
  2. Drop tokens that correspond to pixel masks to greatly improve efficiency.

CVPR 2022

  1. Certified defense. ViT + De-randomized Smoothing
  2. A progressive training technique for smoothed ViT
  3. Isolated (instead of global) self-attention to improve defense efficiency

arXiv 2111

  1. certified defense for attack detection.
  2. The idea is basically Minority Reports with Vision Transformer
  3. The evaluation of clean accuracy seems problematic... (feel free to correct me if I am wrong)
    1. For a clean image, the authors consider the model prediction to be correct even when the defense issues an attack alert

ECCV 2022

  1. certified defense for both robust prediction and attack detection.
  2. The robust-prediction (recovery-based) defense is similar to smoothed ViT (but with more different "masking" strategies); the detection-based defense is similar to Minority Reports.
  3. They proposed a generalized window mask to handle the two-patch problem.
  4. The clean performance evaluation of the detection-based defense does not count false alerts as errors (the authors say they are working on a revision to fix this issue; I will add this defense to the leaderboard soon)

TMLR

  1. better training technique for PatchCleanser

http://scis.scichina.com/en/2022/170306.pdf https://ieeexplore.ieee.org/abstract/document/9897387

Check out this leaderboard for certified robustness against adversarial patches!

(go back to table of contents)

Empirical Defenses

CVPR workshop 2018

  1. The first empirical defense. Use saliency map to detect and mask adversarial patches.

arXiv 1807; WACV 2019

  1. An empirical defense. Use pixel gradient to detect patch and smooth in the suspected regions.

Journal of Big Data

  1. An empirical defense, make prediction on pixel patches, and do majority voting
  2. This paper has been overlooked by almost all relevant papers in this field (probably due to its venue); it only has one self-citation. However, its idea is quite similar to some certified defenses that were published in 2019-2020

arXiv 1907 BMVC 2019

  1. briefly discuss localized patch attacks

arXiv 1909, ICLR 2020

  1. Empirical defense via adversarial training
  2. Interestingly, shows that adversarial training for patch attacks does not hurt clean accuracy
  3. Only works on small images

arXiv 1812; IEEE S&P Workshop on Deep Learning Security 2020

  1. Empirical defense that leverages the universality of the attack (inapplicable to non-universal attacks)

arXiv 2002

  1. empirical defense

arXiv 2005, ECCV workshop 2020

  1. empirical defense via adversarial training (in which the patch location is being optimized)
  2. move the patch toward the direction where loss increases

arXiv 2009; ACCV 2020

  1. use conditional GAN to generate patch for adversarial training

arXiv 2012

  1. empirical defense; directly use CompNet to defend against black-box patch attack (evaluated with PatchAttack)

CISS 2021

  1. An empirical defense against black-box patch attacks
  2. A direct application of CompNet

arXiv 2102; INFOCOM 2021

  1. empirical defense for attack detection

Journal of Network and Systems Management

  1. empirical defense. require the assumption of horizontal symmetry of the image. only applicable to a certain scenario.

arXiv 2105

  1. An empirical defense that uses the magnitude and variance of the feature map values to detect an attack
  2. focus more on the universal attack (both localized patch and global perturbations)

ICML UDL workshop

  1. empirical defense. detect and remove outliers in ViT

old title: Turning Your Strength against You: Detecting and Mitigating Robust and Universal Adversarial Patch Attack

arXiv 2108; AsiaCCS 2023

  1. empirical defense; use universality
  2. similar to SentiNet without some improvements: detect critical region and overlay it to a different image

ICCV 2021

  1. empirical defense via clipping feature norm.
  2. Oddly, this paper does not cite Clipped BagNet

ICCV workshop 2021

MM workshop 2021

  1. make predictions on small image region. train a classifier on predictions on different regions.
  2. discuss "provable robustness"

arXiv 2203; ICML workshop 2022

  1. A dataset for adversarial patches
  2. Clarification from the authors: the main purpose of the ImageNet-Patch dataset is to provide a fast benchmark of models against patch attacks, but not strictly related to defenses against adversarial patches, which is why they did not cite any adversarial patch defense papers.
  3. (I ran some experiments; the transferability to architectures like ViT and ResMLP seemed low)

ICML 2023

(go back to table of contents)

Object Detection

Attacks

arXiv 1806; AAAI workshop 2019

  1. The first (?) patch attack against object detector

arXiv 1904; CVPR workshop 2019

  1. using a rigid board printed with adversarial perturbations to evade the detection of a person

arXiv 1906

  1. interestingly shows that a physical-world patch in the background (far away from the victim objects) can have a malicious effect

arXiv 1909; CVPR 2020

CCS 2019

arXiv 1910; ECCV 2020

  1. use a non-rigid T-shirt to evade person detection

arXiv 1910; ECCV 2020

  1. wear an ugly T-shirt to evade person detection

arXiv 1912; ECCV 2020

  1. a dataset with annotated patch locations.

IEEE IoT-J 2020

arXiv 2008

arXiv 2010; IJCNN 2020

arXiv 2010; CIKM workshop

  1. not really a conventional patch attack. more like a structured L0 attack
  2. briefly discusses that two-stage detectors like Faster R-CNN are harder to attack than one-stage detectors like YOLO

arXiv 2010

  1. add multiple small patches
  2. use a heatmap to find sensitive locations to place the patches

arXiv 2010

  1. dynamic -- the patch location/position changes as the camera moves around

CoRL

  1. attacking 3D and 2D perception at the same time

arXiv 2012; CVPR 2021

  1. not really a patch attack. put a translucent patch in front of the camera

arXiv 2103; ICME 2021

  1. achieve high attack performance using a small number of pixels (e.g., 0.32% of pixels)
  2. the adversarial pixels are scattered across the entire image and different objects, more like an L0 attack

arXiv 2108

arXiv 2109

  1. attack against object detectors that can also evade attack-detection models.

ICCV 2021

  1. an improved attack from adversarial T-shirt. The patch looks more natural (e.g., a dog)

MM 2021

  1. an improved attack from adversarial T-shirt. The patch looks more natural (e.g., an Ivysaur!)

CVPR 2022

  1. consider cameras from different angles

RA-L 2022

  1. use adversarial patches to attack object detectors, in a robotic arm setting

arXiv 2207; IJCAI Workshop 2022

arXiv 2208

IJCAI Workshop 2022

ICECCME 2022

CCS 2022 Poster

  1. demonstrate that existing physical patch attacks might not be that effective when considering other components of an autonomous car (e.g., object tracking)

arXiv 2210

USENIX Security 2023

  1. physical adversarial patch triggered by acoustic signals

arXiv 2211

AAAI 2023

arXiv 2303

ECCV 2023

CVPR 2023

CVPR 2023

CVPR 2023 (not really an attack)

(go back to table of contents)

Certified Defenses

arXiv 2102; CCS 2021

  1. The first certified defense for patch hiding attack
  2. Adapt robust image classifiers for robust object detection
  3. Provable robustness at a negligible cost of clean performance

arXiv 2202

  1. use pixel masks to remove the adversarial patch in a certifiably robust manner.
  2. a significant improvement in certified robustness
  3. also discuss different robustness notions

(go back to table of contents)

Empirical Defenses

arXiv 1910; CVPR workshop 2020

  1. The first empirical defense, adding a regularization loss to constrain the use of spatial information
  2. only experiment on YOLOv2 and small datasets like PASCAL VOC

arXiv 2101; ICML 2021 workshop

arXiv 2103

  1. Empirical defense via adding adversarial patches and a "patch" class during the training

arXiv 2106

  1. Two empirical defenses for patch hiding attack
  2. Feed a small image region to the detector; grow the region with some heuristics; detect an attack when YOLO detects objects in a smaller region but misses objects in a larger expanded region.

MM 2021

  1. Empirical defense. Adversarially train a "MaskNet" to detect and mask the patch

CVPR 2022

arXiv 2203

  1. use z-score in each layer to detect patches.

TIP

  1. use a GAN-like training strategy to train a "white frame". The frame is attached to the image to invalidate patch attacks.

arXiv 2207

  1. detect adversarial patches and remove them based on their different textures. consider different tasks like image classification, object detection, and video analysis

arXiv 2208

MM 2022

  1. train a network to detect and mask the patch

CVPR 2023

canary patch

(go back to table of contents)

Semantic Segmentation

Attacks

ECCV 2020

  1. indirect -- the malicious pixels do not overlap with the victim pixels/objects
  2. interestingly shows that more advanced models are more vulnerable to "indirect" patch attacks, since they leverage more "context" information.

arXiv 2105

  1. remote -- the patch is away from the victim pixels

arXiv 2108; WACV 2022

arXiv 2205; ICPRAI 2022

arXiv 2212

  1. attacking the feature space. applicable to image classification, object detection, semantic segmentation
  2. discuss the different behaviors of patch attacks and global perturbation attacks

Defenses

arXiv 2209

  1. The first certified defense for image segmentation
  2. Use different masking strategies for attack detection or robust prediction
  3. Perform inpainting after masking to improve performance
  4. Unfortunately, this paper does not count false alerts of detection-based defense as errors in the clean setting

Other tasks

Attacks

arXiv 2010

  1. attack depth estimation
  1. patch attacks against the task of image retrieval

MM 2021

USENIX 2021

  1. use a patch on the road to attack lane detection.

CCS 2022

  1. Not actually a patch attack; the paper is about audio

arXiv 2207; ECCV 2022 workshop

  1. use adversarial patches to attack ML-based Visual Odometry

arXiv 2301

  1. add a patch(set) of nodes to graph to attack GNN

Defenses

  1. consider defenses for both image and audio

ICIP 2022

  1. apply certifiably robust defenses for test-time attack to backdoored models

ICLR 2023

(go back to table of contents)