Adversarial Attack
597 papers with code • 2 benchmarks • 9 datasets
An Adversarial Attack is a technique to find a perturbation that changes the prediction of a machine learning model. The perturbation can be very small and imperceptible to human eyes.
Source: Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks
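To make the idea concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the simplest such attacks; the PyTorch model, loss function, epsilon value, and the assumption that inputs lie in [0, 1] are all illustrative.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps=0.03):
    """One-step fast gradient sign method: perturb x by eps in the
    direction that increases the loss, then clip to the valid range."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Move each input coordinate by eps in the sign of the gradient.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```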
Libraries
Use these libraries to find Adversarial Attack models and implementations
Datasets
Subtasks
Most implemented papers
Towards Deep Learning Models Resistant to Adversarial Attacks
Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
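A hedged sketch of the multi-step projected gradient descent (PGD) attack this paper builds its min-max training on; the step size, iteration count, and [0, 1] input range below are illustrative defaults, not the paper's exact settings.

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    """Multi-step L-infinity PGD: take signed gradient steps and project
    back onto the eps-ball around the clean input after each step."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project onto the eps-ball and the valid input range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Adversarial training then minimizes the loss on these worst-case inputs:
# loss = loss_fn(model(pgd_attack(model, loss_fn, x, y)), y)
```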
Towards Evaluating the Robustness of Neural Networks
Defensive distillation is a recently proposed approach that can take an arbitrary neural network and increase its robustness, reducing the success rate of current attacks at finding adversarial examples from $95\%$ to $0.5\%$.
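A heavily simplified sketch of an optimization-based L2 attack in the spirit of this paper; the constant c, step count, and confidence margin kappa are illustrative, and the full method additionally binary-searches over c.

```python
import torch

def cw_l2_attack(model, x, y, c=1.0, steps=100, lr=0.01, kappa=0.0):
    """Simplified Carlini-Wagner-style L2 attack: optimize a perturbation
    that trades off L2 distance against a margin loss pushing the true
    class below the best other class. A tanh change of variables keeps
    the adversarial input inside [0, 1]."""
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        other_logit = logits.scatter(1, y.unsqueeze(1), float("-inf")).max(1).values
        margin = torch.clamp(true_logit - other_logit + kappa, min=0)
        loss = ((x_adv - x) ** 2).flatten(1).sum(1) + c * margin
        opt.zero_grad()
        loss.sum().backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```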
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library
An adversarial example library for constructing attacks, building defenses, and benchmarking both
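A usage sketch assuming the TensorFlow 1 era CleverHans v2.x tutorial API (FastGradientMethod, generate_np); module paths and signatures may differ in other releases, and keras_model / x_test are placeholders for your own trained classifier and test inputs.

```python
import tensorflow as tf
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

sess = tf.Session()
wrapped = KerasModelWrapper(keras_model)           # keras_model: your trained classifier
fgsm = FastGradientMethod(wrapped, sess=sess)
# Generate adversarial examples for a batch of NumPy inputs.
x_adv = fgsm.generate_np(x_test, eps=0.3, clip_min=0.0, clip_max=1.0)
```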
Universal and Transferable Adversarial Attacks on Aligned Language Models
Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer).
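A minimal sketch of the loss such an attack optimizes, assuming a Hugging Face style causal language model; suffix_ids is the current candidate suffix, and the full method additionally uses gradients through one-hot token embeddings plus greedy coordinate swaps aggregated over many prompts and models.

```python
import torch

def suffix_loss(model, tokenizer, query, suffix_ids, target):
    """Negative log-likelihood of an affirmative target response given
    query + adversarial suffix; the attack searches for suffix tokens
    that minimize this loss (hypothetical helper, HF-style interface)."""
    query_ids = tokenizer(query, return_tensors="pt").input_ids[0]
    target_ids = tokenizer(target, return_tensors="pt").input_ids[0]
    input_ids = torch.cat([query_ids, suffix_ids, target_ids]).unsqueeze(0)
    labels = input_ids.clone()
    labels[0, : len(query_ids) + len(suffix_ids)] = -100  # score only the target span
    return model(input_ids, labels=labels).loss
```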
The Limitations of Deep Learning in Adversarial Settings
In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.
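A sketch of the Jacobian-based saliency quantity underlying this attack, assuming a single input (batch size 1); the full attack greedily perturbs the highest-saliency features until the target class is reached.

```python
import torch

def jacobian_saliency(model, x, target):
    """Saliency of each input feature toward a target class, computed from
    the Jacobian of the logits with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x).squeeze(0)                       # assumes batch size 1
    jac = torch.stack([
        torch.autograd.grad(logits[c], x, retain_graph=True)[0].squeeze(0)
        for c in range(logits.numel())
    ])
    target_grad = jac[target]
    other_grad = jac.sum(0) - target_grad
    # Keep features that push the target class up and all other classes down.
    mask = (target_grad > 0) & (other_grad < 0)
    return torch.where(mask, target_grad * other_grad.abs(), torch.zeros_like(target_grad))
```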
Deep Variational Information Bottleneck
We present a variational approximation to the information bottleneck of Tishby et al. (1999).
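A sketch of the variational information bottleneck training objective, assuming an encoder that outputs a Gaussian code (mu, logvar) and a classifier applied to a reparameterized sample; the beta value is illustrative.

```python
import torch
import torch.nn.functional as F

def vib_loss(mu, logvar, logits, y, beta=1e-3):
    """Variational information bottleneck objective: classification loss on a
    stochastic code z ~ N(mu, sigma) plus beta times the KL term that
    compresses z toward the standard-normal prior."""
    ce = F.cross_entropy(logits, y)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1).mean()
    return ce + beta * kl

# Reparameterized sample used to produce `logits` (illustrative):
# z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu); logits = classifier(z)
```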
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks
Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks, and the reason why the network makes specific decisions.
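A simplified sketch of the Score-CAM weighting scheme, assuming a [1, C, h, w] activation map captured from the target layer with a forward hook; the original formulation additionally measures each channel's score as an increase relative to a baseline input.

```python
import torch
import torch.nn.functional as F

def score_cam(model, x, activations, target_class):
    """Score-CAM style map: weight each activation channel by the target-class
    score obtained when the upsampled, normalized channel masks the input."""
    _, C, _, _ = activations.shape
    maps = F.interpolate(activations, size=x.shape[-2:], mode="bilinear", align_corners=False)
    weights = []
    with torch.no_grad():
        for c in range(C):
            m = maps[:, c:c + 1]
            m = (m - m.min()) / (m.max() - m.min() + 1e-8)   # normalize to [0, 1]
            score = F.softmax(model(x * m), dim=1)[0, target_class]
            weights.append(score)
    cam = (torch.stack(weights).view(1, C, 1, 1) * maps).sum(1)
    return F.relu(cam)
```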
Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data.
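The paper's certificate comes from a dual of the convex outer adversarial polytope; the sketch below instead uses plain interval arithmetic, a looser relaxation, only to illustrate what a certified norm-bounded guarantee computes for a small fully connected ReLU network.

```python
import torch

def interval_bounds(layers, x, eps):
    """Certified output bounds under an L-infinity perturbation of size eps,
    propagated with interval arithmetic (a simpler relaxation than the
    paper's convex outer polytope)."""
    lo, hi = x - eps, x + eps
    for layer in layers:
        if isinstance(layer, torch.nn.Linear):
            W, b = layer.weight, layer.bias
            center, radius = (lo + hi) / 2, (hi - lo) / 2
            mid = center @ W.t() + b
            rad = radius @ W.abs().t()
            lo, hi = mid - rad, mid + rad
        else:  # ReLU layer
            lo, hi = lo.clamp(min=0), hi.clamp(min=0)
    # The prediction is certified robust if the true class's lower bound
    # exceeds every other class's upper bound.
    return lo, hi
```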
Theoretically Principled Trade-off between Robustness and Accuracy
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.
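A sketch of the TRADES-style objective that encodes this trade-off, assuming x_adv has already been generated (in the paper, by maximizing the same KL term within the perturbation ball); beta is the knob that trades robustness against natural accuracy.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, x_adv, beta=6.0):
    """Natural cross-entropy plus beta times the KL divergence between
    predictions on clean and adversarial inputs."""
    natural = F.cross_entropy(model(x), y)
    robust = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                      F.softmax(model(x), dim=1), reduction="batchmean")
    return natural + beta * robust
```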
Boosting Adversarial Attacks with Momentum
To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that the adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks.
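A sketch of the momentum iterative FGSM update, assuming 4-D image batches with values in [0, 1]; eps, the step count, and the decay factor mu are illustrative.

```python
import torch

def mi_fgsm(model, loss_fn, x, y, eps=16/255, steps=10, mu=1.0):
    """Momentum iterative FGSM: accumulate L1-normalized gradients across
    iterations so the update direction is stabilized, which improves
    transferability to black-box models."""
    alpha = eps / steps
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Accumulate the gradient direction with momentum.
        g = mu * g + grad / (grad.abs().sum(dim=(1, 2, 3), keepdim=True) + 1e-12)
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```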