Adversarial Text
33 papers with code • 0 benchmarks • 2 datasets
Adversarial Text refers to a specialized text sequence that is designed specifically to influence the prediction of a language model. Adversarial text attacks are commonly carried out against Large Language Models (LLMs). Research on understanding different adversarial approaches can help us build effective defense mechanisms to detect malicious text input and build robust language models.
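To make the idea concrete, here is a minimal toy sketch (not any paper's method): a keyword-based "sentiment classifier" stands in for a real language model, and a single character swap stands in for an adversarial perturbation that a human reader barely notices but the model cannot handle.

```python
# Illustrative toy only: all names and rules here are assumptions for the sketch.
def classify(text: str) -> str:
    """Toy classifier: flags text containing known 'negative' keywords."""
    negative_words = {"terrible", "awful", "bad"}
    tokens = text.lower().split()
    return "negative" if any(t in negative_words for t in tokens) else "positive"

def perturb(text: str) -> str:
    """Adversarial perturbation: swap one character so the keyword no longer
    matches, while the text stays readable ('terrible' -> 'terrib1e')."""
    return text.replace("terrible", "terrib1e")

original = "the movie was terrible"
adversarial = perturb(original)

print(classify(original))     # -> negative
print(classify(adversarial))  # -> positive: the toy model is fooled
```

Real attacks target learned models rather than keyword rules, but the structure is the same: a small, meaning-preserving edit that flips the prediction.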
Benchmarks
These leaderboards are used to track progress in Adversarial Text
Libraries
Use these libraries to find Adversarial Text models and implementations
Most implemented papers
Generative Adversarial Text to Image Synthesis
Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal.
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models.
T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack
In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation.
Generating Natural Language Attacks in a Hard Label Black Box Setting
Our proposed attack strategy leverages population-based optimization algorithm to craft plausible and semantically similar adversarial examples by observing only the top label predicted by the target model.
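The hard-label black-box setting above can be sketched with a toy population-based search: the attacker observes only the top label returned by a black-box model and mutates word substitutions until that label flips. The model, synonym table, and all names below are illustrative assumptions, not the paper's actual algorithm.

```python
import random

# Toy black-box model: the attacker can only observe the top predicted label.
def black_box_top_label(tokens):
    return "negative" if "bad" in tokens or "awful" in tokens else "positive"

# Illustrative synonym table (a real attack would use embeddings or a thesaurus).
SYNONYMS = {"bad": ["poor", "subpar"], "awful": ["mediocre"], "movie": ["film"]}

def mutate(tokens):
    """Randomly substitute one word with a synonym to create a new candidate."""
    out = list(tokens)
    positions = [i for i, t in enumerate(out) if t in SYNONYMS]
    if positions:
        i = random.choice(positions)
        out[i] = random.choice(SYNONYMS[out[i]])
    return out

def attack(tokens, target="positive", pop_size=8, generations=20, seed=0):
    """Population-based search: evolve candidates until the top label flips."""
    random.seed(seed)
    population = [mutate(tokens) for _ in range(pop_size)]
    for _ in range(generations):
        for cand in population:
            if black_box_top_label(cand) == target:
                return cand  # successful hard-label adversarial example
        population = [mutate(random.choice(population)) for _ in range(pop_size)]
    return None

adv = attack("the movie was bad".split())
print(adv)  # a perturbed token list whose top label is now "positive"
```

The key constraint the sketch preserves is that no scores or gradients are available: only the top label guides the search.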
Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios.
BAE: BERT-based Adversarial Examples for Text Classification
Modern text classification models are susceptible to adversarial examples, perturbed versions of the original text indiscernible by humans which get misclassified by the model.
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
TextAttack also includes data augmentation and adversarial training modules for using components of adversarial attacks to improve model accuracy and robustness.
End-to-End Adversarial Text-to-Speech
Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest.
Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples
We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks.
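One of the simplest black-box search strategies benchmarked in this line of work is greedy word substitution: at each step, try every candidate replacement and keep the single one that most reduces the model's score for the original label. The scoring function and substitution lists below are toy assumptions for illustration.

```python
# Toy black-box score: a "negative" score in [0, 1] for a token list.
NEGATIVE_WEIGHT = {"bad": 0.6, "boring": 0.4, "dull": 0.35, "poor": 0.2}

def negative_score(tokens):
    return min(1.0, sum(NEGATIVE_WEIGHT.get(t, 0.0) for t in tokens))

# Candidate substitutions (illustrative; real attacks derive these from
# word embeddings or a masked language model).
SUBS = {"bad": ["poor"], "boring": ["slow"], "dull": ["quiet"]}

def greedy_attack(tokens, threshold=0.5):
    """Greedy search: repeatedly apply the single substitution that lowers
    the negative score the most, until it drops below the threshold."""
    tokens = list(tokens)
    while negative_score(tokens) >= threshold:
        best, best_score = None, negative_score(tokens)
        for i, t in enumerate(tokens):
            for sub in SUBS.get(t, []):
                cand = tokens[:i] + [sub] + tokens[i + 1:]
                s = negative_score(cand)
                if s < best_score:
                    best, best_score = cand, s
        if best is None:      # no substitution helps: attack fails
            return None
        tokens = best
    return tokens

print(greedy_attack("a bad and boring movie".split()))
# -> ['a', 'poor', 'and', 'slow', 'movie']
```

Greedy search queries the model heavily but needs no gradients, which is why such benchmarks compare it against beam search and population-based alternatives on query budget and attack success rate.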
Semantic-Preserving Adversarial Text Attacks
In this paper, we propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.