1 code implementation • ACL 2022 • Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Lambert Mathias, Marzieh Saeidi, Veselin Stoyanov, Majid Yazdani
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score.
1 code implementation • EMNLP 2021 • Ari Holtzman, Peter West, Vered Shwartz, Yejin Choi, Luke Zettlemoyer
Large language models have shown promising results in zero-shot settings.
1 code implementation • 24 Apr 2024 • Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-tau Yih, Hu Xu
The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data.
1 code implementation • 12 Apr 2024 • Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou
The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.
no code implementations • 15 Mar 2024 • Tomasz Limisiewicz, Terra Blevins, Hila Gonen, Orevaoghene Ahia, Luke Zettlemoyer
A major consideration in multilingual language modeling is how to best represent languages with diverse vocabularies and scripts.
no code implementations • 5 Mar 2024 • Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi, Wen-tau Yih
Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability.
no code implementations • 16 Feb 2024 • Haoqiang Kang, Terra Blevins, Luke Zettlemoyer
While many automatic hallucination detection techniques have been proposed for English texts, their effectiveness in multilingual contexts remains unexplored.
1 code implementation • 12 Feb 2024 • Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.
2 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
1 code implementation • 31 Jan 2024 • Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
Language models have become a critical technology for tackling a wide range of natural language processing tasks, yet many details about how the best-performing language models were developed are not reported.
no code implementations • 30 Jan 2024 • Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi
Second, existing $n$-gram LMs use small $n$, which hinders their performance; we instead allow $n$ to be arbitrarily large by introducing a new $\infty$-gram LM with backoff.
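To make the backoff idea concrete, here is a minimal Python sketch, not the paper's implementation (which uses suffix arrays over trillions of tokens): it backs off to the longest context suffix that occurs in a toy corpus and estimates the next-token probability from raw continuation counts. The function name `infty_gram_prob` and the brute-force scan are illustrative assumptions.

```python
from collections import defaultdict

def infty_gram_prob(corpus_tokens, context, next_token):
    """Sketch of an unbounded-n ("infinity-gram") LM with backoff: use the
    longest suffix of the context that occurs in the corpus, then estimate
    P(next_token | suffix) from raw continuation counts."""
    n = len(corpus_tokens)
    for start in range(len(context)):            # try the longest suffix first
        suffix = context[start:]
        cont_counts, total = defaultdict(int), 0
        for i in range(n - len(suffix)):
            if corpus_tokens[i:i + len(suffix)] == suffix:
                cont_counts[corpus_tokens[i + len(suffix)]] += 1
                total += 1
        if total > 0:                            # effective n = len(suffix) + 1
            return cont_counts[next_token] / total
    return 0.0                                   # context never seen at all

corpus = "the cat sat on the mat the cat sat on the rug".split()
print(infty_gram_prob(corpus, "the cat sat on the".split(), "mat"))  # 0.5
```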
no code implementations • 19 Jan 2024 • Terra Blevins, Tomasz Limisiewicz, Suchin Gururangan, Margaret Li, Hila Gonen, Noah A. Smith, Luke Zettlemoyer
Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters.
no code implementations • 8 Dec 2023 • Olga Golovneva, Sean O'Brien, Ramakanth Pasunuru, Tianlu Wang, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
Using constrained reasoning, PathFinder integrates novel quality constraints, pruning, and exploration methods to enhance the efficiency and the quality of generation.
no code implementations • 25 Oct 2023 • Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
Min-K% Prob can be applied without any knowledge about the pretraining corpus or any additional training, departing from previous detection methods that require training a reference model on data that is similar to the pretraining data.
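A hedged sketch of the scoring rule, assuming per-token log-probabilities have already been computed with some causal LM; the function name `min_k_percent_prob`, the default k, and the toy numbers are illustrative, and real use requires calibrating a per-model detection threshold.

```python
def min_k_percent_prob(token_logprobs, k=0.2):
    """Sketch of the Min-K% Prob membership score: average the log-probabilities
    of the k% least likely tokens in the candidate text. A higher score suggests
    the text is more likely to have appeared in pretraining data."""
    lowest = sorted(token_logprobs)[: max(1, int(len(token_logprobs) * k))]
    return sum(lowest) / len(lowest)

# Toy per-token log-probs, as any causal LM might assign when scoring the text.
seen_text   = [-0.3, -0.5, -0.2, -0.8, -0.4, -0.6]
unseen_text = [-0.3, -4.1, -0.2, -5.7, -0.4, -3.9]
print(min_k_percent_prob(seen_text))    # -0.8  (high -> possible member)
print(min_k_percent_prob(unseen_text))  # -5.7  (low  -> likely non-member)
```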
1 code implementation • 17 Oct 2023 • Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu
In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
no code implementations • 16 Oct 2023 • Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis
Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion.
no code implementations • 2 Oct 2023 • Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build.
2 code implementations • 28 Sep 2023 • Hu Xu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective.
1 code implementation • 5 Sep 2023 • Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan
It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs.
Ranked #2 on Text-to-Image Generation on MS COCO
1 code implementation • 31 Aug 2023 • Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs).
1 code implementation • 31 Aug 2023 • Benjamin Muller, Belen Alastruey, Prangthip Hansanti, Elahe Kalbassi, Christophe Ropers, Eric Michael Smith, Adina Williams, Luke Zettlemoyer, Pierre Andrews, Marta R. Costa-jussà
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
2 code implementations • 11 Aug 2023 • Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis
We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions.
1 code implementation • 8 Aug 2023 • Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O'Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs.
1 code implementation • 8 Aug 2023 • Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text, and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference.
no code implementations • 31 Jul 2023 • Ari Holtzman, Peter West, Luke Zettlemoyer
Coaxing out desired behavior from pretrained models, while avoiding undesirable ones, has redefined NLP and is reshaping how we interact with computers.
no code implementations • 24 May 2023 • Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih
Language models (LMs) often struggle to pay enough attention to the input context, and generate texts that are unfaithful or contain hallucinations.
no code implementations • 24 May 2023 • Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, Jordan Boyd-Graber
Beyond generalizability, the interpretable design of MoRE improves selective question answering results compared to baselines without incorporating inter-expert agreement.
12 code implementations • NeurIPS 2023 • Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer
Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.
no code implementations • 23 May 2023 • Mikel Artetxe, Vedanuj Goswami, Shruti Bhosale, Angela Fan, Luke Zettlemoyer
Machine Translation (MT) has been widely used for cross-lingual classification, either by translating the test set into English and running inference with a monolingual model (translate-test), or translating the training set into the target languages and finetuning a multilingual model (translate-train).
5 code implementations • 23 May 2023 • Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi
Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly.
5 code implementations • NeurIPS 2023 • Chunting Zhou, PengFei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy
Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences.
no code implementations • NeurIPS 2023 • Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis
Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books.
no code implementations • 26 Apr 2023 • Haoqiang Kang, Terra Blevins, Luke Zettlemoyer
To better understand this contrast, we present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT), an extension of word-level translation that prompts the model to translate a given word in context.
1 code implementation • NeurIPS 2023 • Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, Ludwig Schmidt
We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models.
1 code implementation • 24 Mar 2023 • Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
Large language models are typically trained densely: all parameters are updated with respect to all inputs.
2 code implementations • 16 Mar 2023 • Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, Marco Tulio Ribeiro
We introduce Automatic Reasoning and Tool-use (ART), a framework that uses frozen LLMs to automatically generate intermediate reasoning steps as a program.
no code implementations • 15 Feb 2023 • Marjan Ghazvininejad, Hila Gonen, Luke Zettlemoyer
Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting, even though they were not explicitly trained for this task.
no code implementations • NeurIPS 2023 • Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale.
1 code implementation • 4 Feb 2023 • Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer
In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing $\texttt{[MASK]}$ tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without $\texttt{[MASK]}$ tokens.
1 code implementation • 30 Jan 2023 • Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model.
Ranked #9 on Question Answering on Natural Questions
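A minimal sketch of the black-box ensembling step, assuming retrieval and the per-document LM calls have already happened; `replug_ensemble`, the softmax weighting of retriever scores, and the toy arrays are illustrative simplifications of the framework described above.

```python
import numpy as np

def replug_ensemble(probs_per_doc, retrieval_scores):
    """Sketch of REPLUG-style ensembling: run the frozen LM once per retrieved
    document (document prepended to the query), then average the next-token
    distributions weighted by softmax-normalized retrieval scores.
    probs_per_doc is a (num_docs, vocab) array of LM outputs (assumed given)."""
    w = np.exp(retrieval_scores - np.max(retrieval_scores))
    w /= w.sum()
    return (w[:, None] * probs_per_doc).sum(axis=0)

# Toy example with 2 retrieved documents and a 4-token vocabulary.
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1]])
scores = np.array([2.0, 1.0])            # retriever similarity scores
print(replug_ensemble(probs, scores))    # ensembled next-token distribution
```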
2 code implementations • 25 Jan 2023 • Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa
Large multilingual language models typically rely on a single vocabulary shared across 100+ languages.
no code implementations • 10 Jan 2023 • Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer
To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion parameters, trained on 5-100 billion tokens.
1 code implementation • ICCV 2023 • Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
Large vision-language models are generally applicable to many downstream tasks, but come at an exorbitant training cost that only large institutions can afford.
1 code implementation • 22 Dec 2022 • Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Ves Stoyanov
To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks.
Ranked #26 on Natural Language Inference on RTE
no code implementations • 20 Dec 2022 • Weijia Shi, Xiaochuang Han, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, Luke Zettlemoyer
Large language models can perform new tasks in a zero-shot fashion, given natural language prompts that specify the desired behavior.
2 code implementations • 20 Dec 2022 • Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs).
1 code implementation • 19 Dec 2022 • Mengzhou Xia, Mikel Artetxe, Chunting Zhou, Xi Victoria Lin, Ramakanth Pasunuru, Danqi Chen, Luke Zettlemoyer, Ves Stoyanov
Why do larger language models demonstrate more desirable behaviors?
2 code implementations • 19 Dec 2022 • Xinxi Lyu, Sewon Min, Iz Beltagy, Luke Zettlemoyer, Hannaneh Hajishirzi
Although large language models can be prompted for both zero- and few-shot learning, performance drops significantly when no demonstrations are available.
3 code implementations • 19 Dec 2022 • Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu
Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.
1 code implementation • 19 Dec 2022 • Tim Dettmers, Luke Zettlemoyer
Quantization methods reduce the number of bits required to represent each parameter in a model, trading accuracy for smaller memory footprints and inference latencies.
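As a concrete (and much simplified) example of the accuracy-for-memory trade described above, here is a round-to-nearest absmax quantizer in NumPy; the function name and the integer format are assumptions for illustration, not the specific quantile or float data types studied in the paper.

```python
import numpy as np

def absmax_quantize(x, bits=4):
    """Sketch of round-to-nearest (absmax) quantization: scale values into the
    signed integer range for the given bit width, round, and dequantize.
    Fewer bits -> smaller memory footprint, larger reconstruction error."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.abs(x).max() / qmax
    q = np.round(x / scale).astype(np.int8)  # what would actually be stored
    return q * scale                         # dequantized approximation

w = np.random.randn(1024).astype(np.float32)
for b in (8, 4, 2):
    err = np.abs(w - absmax_quantize(w, bits=b)).mean()
    print(f"{b}-bit mean abs error: {err:.4f}")
```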
1 code implementation • 15 Dec 2022 • Olga Golovneva, Moya Chen, Spencer Poff, Martin Corredor, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
Large language models show improved downstream task performance when prompted to generate step-by-step reasoning to justify their final answers.
no code implementations • 8 Dec 2022 • Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, Luke Zettlemoyer
Language models can be prompted to perform a wide variety of zero- and few-shot learning problems.
1 code implementation • 5 Dec 2022 • Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, Marjan Ghazvininejad
Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model.
1 code implementation • 2 Dec 2022 • Bhargavi Paranjape, Pradeep Dasigi, Vivek Srikumar, Luke Zettlemoyer, Hannaneh Hajishirzi
We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization -- an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them.
1 code implementation • 2 Dec 2022 • Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases.
1 code implementation • 30 Nov 2022 • Xinyan Velocity Yu, Sewon Min, Luke Zettlemoyer, Hannaneh Hajishirzi
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
no code implementations • 22 Nov 2022 • Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web).
Ranked #7 on Image Captioning on MS COCO
1 code implementation • 18 Nov 2022 • Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu
We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas.
no code implementations • 15 Nov 2022 • Terra Blevins, Hila Gonen, Luke Zettlemoyer
Although pretrained language models (PLMs) can be prompted to perform a wide range of language tasks, it remains an open question how much this ability comes from generalizable linguistic understanding versus surface-level lexical patterns.
2 code implementations • 27 Oct 2022 • Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis
We propose Contrastive Decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint.
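A minimal NumPy sketch of a single CD step under this formulation: restrict attention to tokens the expert model finds plausible, then pick the token that maximizes the expert-amateur log-probability gap. The toy distributions, the alpha value, and the function name are illustrative assumptions rather than the released implementation.

```python
import numpy as np

def contrastive_decode_step(p_expert, p_amateur, alpha=0.1):
    """Sketch of one contrastive decoding step: among tokens the expert deems
    plausible (prob >= alpha * max expert prob), choose the token maximizing
    log p_expert - log p_amateur. Toy next-token distributions, not real LMs."""
    plausible = p_expert >= alpha * p_expert.max()
    scores = np.where(plausible,
                      np.log(p_expert) - np.log(p_amateur),
                      -np.inf)
    return int(scores.argmax())

p_big   = np.array([0.50, 0.30, 0.15, 0.05])  # expert (large LM) probabilities
p_small = np.array([0.60, 0.10, 0.25, 0.05])  # amateur (small LM) probabilities
print(contrastive_decode_step(p_big, p_small))  # -> 1: plausible and most expert-preferred
```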
1 code implementation • 25 Oct 2022 • Victor Zhong, Weijia Shi, Wen-tau Yih, Luke Zettlemoyer
Moreover, existing models are not robust to variations in question constraints, but can be made more robust by tuning on clusters of related questions.
1 code implementation • 13 Oct 2022 • Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
We present M2D2, a fine-grained, massively multi-domain corpus for studying domain adaptation in language models (LMs).
1 code implementation • 10 Oct 2022 • Tanay Dixit, Bhargavi Paranjape, Hannaneh Hajishirzi, Luke Zettlemoyer
We present COunterfactual Generation via Retrieval and Editing (CORE), a retrieval-augmented generation framework for creating diverse counterfactual perturbations for CDA.
1 code implementation • 6 Oct 2022 • Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations.
Ranked #4 on Table-based Fact Verification on TabFact
1 code implementation • 30 Sep 2022 • Victor Zhong, Jesse Mu, Luke Zettlemoyer, Edward Grefenstette, Tim Rocktäschel
Recent work has shown that augmenting environments with language descriptions improves policy learning.
5 code implementations • 21 Sep 2022 • Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer
The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.
Ranked #1 on Long-range modeling on LRA
1 code implementation • 5 Sep 2022 • Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time.
3 code implementations • 15 Aug 2022 • Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer
We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full precision performance.
Ranked #2 on Language Modelling on C4
2 code implementations • 5 Aug 2022 • Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
New ELMs are learned by branching from (mixtures of) ELMs in the current set, further training the parameters on data for the new domain, and then merging the resulting model back into the set for future use.
1 code implementation • 29 Jun 2022 • Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed
Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
1 code implementation • 21 Jun 2022 • Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, Manzil Zaheer
We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
no code implementations • 7 Jun 2022 • Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed
We describe LegoNN, a procedure for building encoder-decoder architectures whose parts can be applied to other tasks without the need for any fine-tuning.
1 code implementation • 27 May 2022 • Weijia Shi, Julian Michael, Suchin Gururangan, Luke Zettlemoyer
Retrieval-augmented language models (LMs) use non-parametric memory to substantially outperform their non-retrieval counterparts on perplexity-based evaluations, but it is an open question whether they achieve similar gains in few- and zero-shot end-task accuracy.
no code implementations • 25 May 2022 • Suzanna Sia, Anton Belyy, Amjad Almahairi, Madian Khabsa, Luke Zettlemoyer, Lambert Mathias
Evaluating an explanation's faithfulness is desirable for many reasons, such as trust, interpretability, and diagnosing the sources of a model's errors.
no code implementations • 24 May 2022 • Terra Blevins, Hila Gonen, Luke Zettlemoyer
The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior.
no code implementations • 24 May 2022 • Mikel Artetxe, Jingfei Du, Naman Goyal, Luke Zettlemoyer, Ves Stoyanov
Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult.
no code implementations • 22 May 2022 • Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan
Despite their wide adoption, the underlying training and memorization dynamics of very large language models are not well understood.
no code implementations • 9 May 2022 • Mandar Joshi, Terra Blevins, Mike Lewis, Daniel S. Weld, Luke Zettlemoyer
Creating labeled natural language training data is expensive and requires significant human effort.
7 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs
1 code implementation • 25 Apr 2022 • Freda Shi, Daniel Fried, Marjan Ghazvininejad, Luke Zettlemoyer, Sida I. Wang
In this work, we introduce execution result-based minimum Bayes risk decoding (MBR-EXEC) for program selection and show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
Ranked #37 on Code Generation on MBPP
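A hedged sketch of the selection rule, with sampled programs stood in for by plain Python callables; `mbr_exec_select` and the majority-agreement risk are illustrative simplifications (the paper compares several execution-result matching losses).

```python
from collections import Counter

def mbr_exec_select(programs, test_inputs):
    """Sketch of execution-result-based MBR selection: run every sampled program
    on the same inputs and keep the one whose outputs agree most often with the
    other samples' outputs (a simple matching-based risk)."""
    results = [tuple(p(x) for x in test_inputs) for p in programs]
    counts = Counter(results)
    best = max(range(len(programs)), key=lambda i: counts[results[i]])
    return programs[best]

# Toy example: three sampled "programs" for squaring a number; two agree.
candidates = [lambda x: x * x, lambda x: x ** 2, lambda x: x + x]
chosen = mbr_exec_select(candidates, test_inputs=[2, 3])
print(chosen(5))  # 25 -- the consistent (majority) behavior wins
```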
no code implementations • 17 Apr 2022 • Terra Blevins, Luke Zettlemoyer
English pretrained language models, which make up the backbone of many modern NLP systems, require huge amounts of unlabeled training data.
1 code implementation • 15 Apr 2022 • Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering.
3 code implementations • 12 Apr 2022 • Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis
Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming.
Ranked #85 on Code Generation on MBPP
1 code implementation • 3 Apr 2022 • Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Marzieh Saeidi, Lambert Mathias, Veselin Stoyanov, Majid Yazdani
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score.
1 code implementation • 25 Feb 2022 • Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
no code implementations • 25 Jan 2022 • Suchin Gururangan, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith
Language models increasingly rely on massive web dumps for diverse text data.
no code implementations • 19 Jan 2022 • Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer
We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens.
1 code implementation • 16 Jan 2022 • Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases.
Ranked #1 on Task-Oriented Dialogue Systems on KVRET
2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li
Large-scale generative language models such as GPT-3 are competitive few-shot learners.
no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.
no code implementations • 7 Dec 2021 • Darsh J Shah, Sinong Wang, Han Fang, Hao Ma, Luke Zettlemoyer
The ubiquity of offensive and hateful content on online fora necessitates automatic solutions that competently detect such content across target groups.
2 code implementations • NAACL 2022 • Belinda Z. Li, Jane Yu, Madian Khabsa, Luke Zettlemoyer, Alon Halevy, Jacob Andreas
When a neural language model (LM) is adapted to perform a new task, what aspects of the task predict the eventual performance of the model?
no code implementations • NeurIPS 2021 • Victor Zhong, Austin Hanjie, Sida Wang, Karthik Narasimhan, Luke Zettlemoyer
We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.
no code implementations • Findings (NAACL) 2022 • Eleftheria Briakou, Sida I. Wang, Luke Zettlemoyer, Marjan Ghazvininejad
Mined bitexts can contain imperfect translations that yield unreliable training signals for Neural Machine Translation (NMT).
2 code implementations • NAACL 2022 • Sewon Min, Mike Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks.
1 code implementation • 20 Oct 2021 • Victor Zhong, Austin W. Hanjie, Sida I. Wang, Karthik Narasimhan, Luke Zettlemoyer
We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.
2 code implementations • ICLR 2022 • Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
To maintain stability and performance, we combine block-wise quantization with two additional changes: (1) dynamic quantization, a form of non-linear optimization that is precise for both large and small magnitude values, and (2) a stable embedding layer to reduce gradient variance that comes from the highly non-uniform distribution of input tokens in language models.
2 code implementations • EMNLP 2021 • Hu Xu, Gargi Ghosh, Po-Yao Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer
We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.
Ranked #1 on Temporal Action Localization on CrossTask (using extra training data)
2 code implementations • NAACL 2022 • Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text.
1 code implementation • ACL 2022 • Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
We introduce a noisy channel approach for language model prompting in few-shot text classification.
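A minimal sketch of the channel scoring rule: rather than asking the LM for P(label | input), score each label by how likely the LM finds the input when conditioned on a verbalized label, plus a label prior. The function name, the uniform prior, and the toy scores are assumptions for illustration.

```python
import math

def channel_classify(log_p_input_given_label, log_prior=None):
    """Sketch of noisy channel classification: pick the label maximizing
    log P(input | label) + log P(label), where the first term is the LM's
    log-likelihood of the input conditioned on a verbalized label prompt."""
    labels = list(log_p_input_given_label)
    if log_prior is None:
        log_prior = {y: -math.log(len(labels)) for y in labels}  # uniform prior
    return max(labels, key=lambda y: log_p_input_given_label[y] + log_prior[y])

# Toy: LM log-prob of a review given "It was great." vs. "It was terrible."
scores = {"positive": -42.3, "negative": -57.9}
print(channel_classify(scores))  # -> "positive"
```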
1 code implementation • ACL 2021 • Weijia Shi, Mandar Joshi, Luke Zettlemoyer
Short textual descriptions of entities provide summaries of their key attributes and have been shown to be useful sources of background knowledge for tasks such as entity linking and question answering.
2 code implementations • 26 Jul 2021 • Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer
We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, it is still the case that these image-based models are weaker at understanding the 3D nature of objects -- properties which play a key role in manipulation.
no code implementations • ICLR 2022 • Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer
We introduce HTLM, a hyper-text language model trained on a large-scale web crawl.
Ranked #1 on Table-to-Text Generation on DART
2 code implementations • ACL 2022 • Jungsoo Park, Sewon Min, Jaewoo Kang, Luke Zettlemoyer, Hannaneh Hajishirzi
Claims in FAVIQ are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification.
1 code implementation • Findings (ACL) 2022 • Robin Jia, Mike Lewis, Luke Zettlemoyer
We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations, motivated by the intuition that the representation of a phrase in a passage should encode all questions that the phrase can answer in context.
no code implementations • Findings (ACL) 2021 • Bhargavi Paranjape, Julian Michael, Marjan Ghazvininejad, Luke Zettlemoyer, Hannaneh Hajishirzi
Many commonsense reasoning NLP tasks involve choosing between one or more possible answers to a question or prompt based on knowledge that is often implicit.
1 code implementation • 9 Jun 2021 • Weijia Shi, Mandar Joshi, Luke Zettlemoyer
Short textual descriptions of entities provide summaries of their key attributes and have been shown to be useful sources of background knowledge for tasks such as entity linking and question answering.
2 code implementations • NeurIPS 2021 • Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer
Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length.
1 code implementation • Findings (ACL) 2021 • Hu Xu, Gargi Ghosh, Po-Yao Huang, Prahal Arora, Masoumeh Aminzadeh, Christoph Feichtenhofer, Florian Metze, Luke Zettlemoyer
We present a simplified, task-agnostic multi-modal pre-training approach that can accept either video or text input, or both for a variety of end tasks.
Ranked #2 on Temporal Action Localization on CrossTask (using extra training data)
2 code implementations • 16 Apr 2021 • Ari Holtzman, Peter West, Vered Shwartz, Yejin Choi, Luke Zettlemoyer
Large language models have shown promising results in zero-shot settings (Brown et al., 2020; Radford et al., 2019).
1 code implementation • 30 Mar 2021 • Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer
Sparse layers can dramatically improve the efficiency of training and inference by routing each token to specialized expert modules that contain only a small fraction of the model parameters.
1 code implementation • 23 Mar 2021 • Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni
Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time.
Ranked #2 on Entity Disambiguation on Mewsli-9 (using extra training data)
no code implementations • EACL 2021 • Terra Blevins, Mandar Joshi, Luke Zettlemoyer
Current models for Word Sense Disambiguation (WSD) struggle to disambiguate rare senses, despite reaching human performance on global WSD metrics.
2 code implementations • EMNLP 2021 • Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta
We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning.
Ranked #3 on Text Summarization on GigaWord (using extra training data)
no code implementations • ICLR 2021 • Asish Ghoshal, Xilun Chen, Sonal Gupta, Luke Zettlemoyer, Yashar Mehdad
Training with soft targets instead of hard targets has been shown to improve performance and calibration of deep neural networks.
no code implementations • ACL 2021 • Haoyue Shi, Luke Zettlemoyer, Sida I. Wang
Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear projections to align monolingual word embedding spaces.
2 code implementations • ACL 2021 • Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta
Although pretrained language models can be fine-tuned to produce state-of-the-art results for a very wide range of language understanding tasks, the dynamics of this process are not well understood, especially in the low data regime.
Ranked #1 on Transfer Learning on Amazon Review Polarity (Structure Aware Intrinsic Dimension metric)
1 code implementation • COLING 2020 • Ayal Klein, Jonathan Mamou, Valentina Pyatkin, Daniela Stepanov, Hangfeng He, Dan Roth, Luke Zettlemoyer, Ido Dagan
We propose a new semantic scheme for capturing predicate-argument relations for nominalizations, termed QANom.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Christopher Clark, Mark Yatskar, Luke Zettlemoyer
We evaluate performance on synthetic datasets, and four datasets built to penalize models that exploit known biases on textual entailment, visual question answering, and image recognition tasks.
2 code implementations • Findings (ACL) 2021 • Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, Marjan Ghazvininejad
Neural sequence models can generate highly fluent sentences, but recent studies have also shown that they are prone to hallucinate additional content not supported by the input.
no code implementations • EMNLP 2020 • Xilun Chen, Asish Ghoshal, Yashar Mehdad, Luke Zettlemoyer, Sonal Gupta
Task-oriented semantic parsing is a critical component of virtual assistants, which is responsible for understanding the user's intents (set reminder, play music, etc.).
5 code implementations • ICLR 2021 • Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
We introduce $k$-nearest-neighbor machine translation ($k$NN-MT), which predicts tokens with a nearest neighbor classifier over a large datastore of cached examples, using representations from a neural translation model for similarity search.
1 code implementation • EMNLP 2020 • Victor Zhong, Mike Lewis, Sida I. Wang, Luke Zettlemoyer
We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments (e.g., new database schemas).
Ranked #6 on Text-To-SQL on SParC
3 code implementations • ICLR 2021 • Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta
Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods.
2 code implementations • ICLR 2021 • Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi
We introduce a deep and light-weight transformer, DeLighT, that delivers similar or better performance than standard transformer-based models with significantly fewer parameters.
Ranked #1 on Machine Translation on WMT2016 English-French
no code implementations • ACL 2020 • Nabil Hossain, Marjan Ghazvininejad, Luke Zettlemoyer
Retrieve-and-edit seq2seq methods typically retrieve an output from the training set and learn a model to edit it to produce the final output.
no code implementations • ACL 2020 • Terra Blevins, Luke Zettlemoyer
A major obstacle in Word Sense Disambiguation (WSD) is that word senses are not uniformly distributed, causing existing models to generally perform poorly on senses that are either rare or unseen during training.
Ranked #9 on Word Sense Disambiguation on Supervised:
2 code implementations • NeurIPS 2020 • Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer
The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.
1 code implementation • 6 May 2020 • Terra Blevins, Luke Zettlemoyer
A major obstacle in Word Sense Disambiguation (WSD) is that word senses are not uniformly distributed, causing existing models to generally perform poorly on senses that are either rare or unseen during training.
2 code implementations • EMNLP 2020 • Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi, Luke Zettlemoyer
Decisions of complex language understanding models can be rationalized by limiting their inputs to a relevant subsequence of the original text.
1 code implementation • ACL 2020 • Belinda Z. Li, Gabriel Stanovsky, Luke Zettlemoyer
We improve upon pairwise annotation for active learning in coreference resolution, by asking annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
2 code implementations • EMNLP 2020 • Sewon Min, Julian Michael, Hannaneh Hajishirzi, Luke Zettlemoyer
Ambiguity is inherent to open-domain question answering; especially when exploring new topics, it can be difficult to ask questions that have a single, unambiguous answer.
1 code implementation • ICML 2020 • Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy
This difficulty is compounded during training with cross entropy loss, which can highly penalize small shifts in word order.
no code implementations • 23 Jan 2020 • Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer
The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach.
5 code implementations • 22 Jan 2020 • Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks.
7 code implementations • CVPR 2020 • Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.
3 code implementations • EMNLP 2020 • Ledell Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, Luke Zettlemoyer
This paper introduces a conceptually simple, scalable, and highly effective BERT-based entity linking model, along with an extensive evaluation of its accuracy-speed trade-off.
7 code implementations • 10 Nov 2019 • Sewon Min, Danqi Chen, Luke Zettlemoyer, Hannaneh Hajishirzi
We introduce an approach for open-domain question answering (QA) that retrieves and reads a passage graph, where vertices are passages of text and edges represent relationships that are derived from an external knowledge base or co-occurrence in the same article.
no code implementations • 9 Nov 2019 • Siddharth Dalmia, Abdel-rahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer
Inspired by modular software design principles of independence, interchangeability, and clarity of interface, we introduce a method for enforcing encoder-decoder modularity in seq2seq models without sacrificing the overall model quality or its full differentiability.
1 code implementation • ACL 2020 • Paul Roit, Ayal Klein, Daniela Stepanov, Jonathan Mamou, Julian Michael, Gabriel Stanovsky, Luke Zettlemoyer, Ido Dagan
Question-answer driven Semantic Role Labeling (QA-SRL) was proposed as an attractive open and natural flavour of SRL, potentially attainable from laymen.
27 code implementations • ACL 2020 • Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.
no code implementations • ACL 2020 • Shijie Wu, Alexis Conneau, Haoran Li, Luke Zettlemoyer, Veselin Stoyanov
We study the problem of multilingual masked language modeling, i.e., the training of a single model on concatenated text from multiple languages, and present a detailed study of several factors that influence why these models are so effective for cross-lingual transfer.
no code implementations • IJCNLP 2019 • Panupong Pasupat, Sonal Gupta, Karishma Mandyam, Rushin Shah, Mike Lewis, Luke Zettlemoyer
We propose a semantic parser for parsing compositional utterances into Task Oriented Parse (TOP), a tree representation that has intents and slots as labels of nesting tree nodes.
5 code implementations • ICLR 2020 • Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
Applying this augmentation to a strong Wikitext-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79, a 2.9 point improvement with no additional training.
Ranked #10 on Language Modelling on WikiText-103
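A minimal sketch of the interpolation at the heart of the $k$NN-LM, assuming the k nearest (context, next-token) neighbors and their distances have already been retrieved from a datastore; `knn_lm_interpolate`, the softmax-over-negative-distances weighting, and lambda = 0.25 are illustrative choices.

```python
import numpy as np

def knn_lm_interpolate(p_lm, neighbor_tokens, neighbor_dists, vocab_size, lam=0.25):
    """Sketch of kNN-LM: build a next-token distribution from retrieved
    (context, next-token) neighbors via a softmax over negative distances,
    then mix it with the base LM distribution with weight lambda."""
    w = np.exp(-np.asarray(neighbor_dists, dtype=float))
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    for tok, wi in zip(neighbor_tokens, w):
        p_knn[tok] += wi                      # aggregate weight by target token
    return lam * p_knn + (1 - lam) * p_lm

p_lm = np.array([0.1, 0.6, 0.2, 0.1])         # base LM next-token probabilities
print(knn_lm_interpolate(p_lm, neighbor_tokens=[2, 2, 1],
                         neighbor_dists=[0.5, 0.7, 2.0], vocab_size=4))
```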
43 code implementations • ACL 2020 • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
Ranked #3 on Open-Domain Question Answering on ELI5
2 code implementations • IJCNLP 2019 • Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer
Interactive programming with interleaved code snippet cells and natural language markdown is recently gaining popularity in the form of Jupyter notebooks, which accelerate prototyping and collaboration.
1 code implementation • IJCNLP 2019 • Sewon Min, Danqi Chen, Hannaneh Hajishirzi, Luke Zettlemoyer
Many question answering (QA) tasks only provide weak supervision for how the answer should be computed.
Ranked #2 on Question Answering on NarrativeQA
3 code implementations • IJCNLP 2019 • Christopher Clark, Mark Yatskar, Luke Zettlemoyer
Our method has two stages: we (1) train a naive model that makes predictions exclusively based on dataset biases, and (2) train a robust model as part of an ensemble with the naive one in order to encourage it to focus on other patterns in the data that are more likely to generalize.
Ranked #5 on Visual Question Answering (VQA) on VQA-CP
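A hedged NumPy sketch of the two-stage idea, focusing on the ensemble training loss: combining the robust model's logits with a frozen bias-only model's log-probabilities (a product of experts) shrinks the loss, and hence the gradient, on examples the naive model already gets right. The names and toy numbers are illustrative; the paper explores several ensembling variants.

```python
import numpy as np

def product_of_experts_logits(robust_logits, bias_log_probs):
    """Sketch of the bias-ensemble training signal: softmax of the sum equals
    the normalized product of the two models' distributions, so the robust
    model earns little credit for examples the bias-only model already solves.
    At test time the robust model is used on its own."""
    return robust_logits + bias_log_probs

def cross_entropy(logits, label):
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

robust = np.array([1.0, 0.5, -0.5])
bias   = np.log(np.array([0.8, 0.1, 0.1]))   # naive model is confident (and right)
print(cross_entropy(product_of_experts_logits(robust, bias), label=0))  # small loss
print(cross_entropy(robust, label=0))                                   # larger loss
```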
2 code implementations • IJCNLP 2019 • Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes (+3.9 F1) and GAP (+11.5 F1) benchmarks.
Ranked #10 on Coreference Resolution on CoNLL 2012 (using extra training data)
59 code implementations • 26 Jul 2019 • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.
Ranked #1 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (Wasserstein Distance (WD) metric, using extra training data)
6 code implementations • TACL 2020 • Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text.
Ranked #1 on Question Answering on NewsQA (F1 metric)
2 code implementations • 10 Jul 2019 • Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer
To train agents that search an environment for a goal location, we define the Navigation from Dialog History task.
2 code implementations • ICLR 2020 • Tim Dettmers, Luke Zettlemoyer
We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels.
Ranked #68 on Image Classification on MNIST
1 code implementation • ACL 2019 • Victor Zhong, Luke Zettlemoyer
Conversational machine reading systems help users answer high-level questions (e.g., determine if they qualify for particular government benefits) when they do not know the exact rules by which the determination is made (e.g., whether they need certain income levels or veteran status).
2 code implementations • ACL 2019 • Sewon Min, Victor Zhong, Luke Zettlemoyer, Hannaneh Hajishirzi
Multi-hop Reading Comprehension (RC) requires reasoning and aggregation across several paragraphs.
Ranked #65 on Question Answering on HotpotQA
1 code implementation • ACL 2019 • Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer
Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs.
no code implementations • ACL 2019 • Terra Blevins, Luke Zettlemoyer
We incorporate morphological supervision into character language models (CLMs) via multitasking and show that this addition improves bits-per-character (BPC) performance across 24 languages, even when the morphology data and language modeling data are disjoint.
1 code implementation • ACL 2019 • Gabriel Stanovsky, Noah A. Smith, Luke Zettlemoyer
We present the first challenge set and evaluation protocol for the analysis of gender bias in machine translation (MT).
no code implementations • NAACL 2019 • Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, Eduard Hovy
Training semantic parsers from question-answer pairs typically involves searching over an exponentially large space of logical forms, and an unguided search can easily be misled by spurious logical forms that coincidentally evaluate to the correct answer.
4 code implementations • 26 Apr 2019 • Abdelrahman Mohamed, Dmytro Okhonko, Luke Zettlemoyer
The recent success of transformer networks for neural machine translation and other NLP tasks has led to a surge in research work trying to apply it for speech recognition.
2 code implementations • IJCNLP 2019 • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer
Most machine translation systems generate text autoregressively from left to right.
no code implementations • IJCNLP 2019 • Srinivasan Iyer, Alvin Cheung, Luke Zettlemoyer
Programmers typically organize executable source code using high-level coding patterns or idiomatic structures such as nested loops, exception handlers and recursive blocks, rather than as individual code tokens.
no code implementations • IJCNLP 2019 • Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli
We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems.
Ranked #10 on Constituency Parsing on Penn Treebank
no code implementations • 15 Feb 2019 • Arash Einolghozati, Panupong Pasupat, Sonal Gupta, Rushin Shah, Mrinal Mohit, Mike Lewis, Luke Zettlemoyer
Semantic parsing using hierarchical representations has recently been proposed for task oriented dialog with promising results [Gupta et al., 2018].
1 code implementation • ACL 2019 • Fei Liu, Luke Zettlemoyer, Jacob Eisenstein
We present a new architecture for storing and accessing entity mentions during online text processing.
3 code implementations • NAACL 2019 • Mandar Joshi, Eunsol Choi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
Reasoning about implied relationships (e.g., paraphrastic, common sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems.
no code implementations • EMNLP 2018 • Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer
We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total).
1 code implementation • EMNLP 2018 • Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, Noah A. Smith
We introduce the syntactic scaffold, an approach to incorporating syntactic information into semantic tasks.
1 code implementation • EMNLP 2018 • Ge Gao, Eunsol Choi, Yejin Choi, Luke Zettlemoyer
We present end-to-end neural models for detecting metaphorical word use in context.
1 code implementation • EMNLP 2018 • Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer
To study this phenomenon, we introduce the task of generating class member functions given English documentation and the programmatic context provided by the rest of the class.
no code implementations • EMNLP 2018 • Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih
Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks.
1 code implementation • ACL 2018 • Eunsol Choi, Omer Levy, Yejin Choi, Luke Zettlemoyer
We introduce a new entity typing task: given a sentence with an entity mention, the goal is to predict a set of free-form phrases (e.g., skyscraper, songwriter, or criminal) that describe appropriate types for the target entity.
Ranked #4 on Entity Typing on Ontonotes v5 (English)
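One way to read this task is as multi-label classification over a large, open vocabulary of type phrases. The sketch below shows that formulation under my own simplifying assumptions (a toy type list and a crude bag-of-words encoder); it is not the paper's model.

```python
# Hedged sketch: entity typing as multi-label prediction over free-form type phrases.
# The type vocabulary and encoder are toy stand-ins for illustration only.
import torch
import torch.nn as nn

TYPE_VOCAB = ["person", "musician", "songwriter", "building", "skyscraper", "criminal"]

class MentionTyper(nn.Module):
    def __init__(self, vocab_size, n_types=len(TYPE_VOCAB), dim=128):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, dim)  # mean-pools token embeddings
        self.classifier = nn.Linear(dim, n_types)        # one score per candidate type

    def forward(self, token_ids):
        return self.classifier(self.encoder(token_ids))  # (batch, n_types) logits

model = MentionTyper(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (2, 12))               # toy sentence + mention tokens
gold = torch.zeros(2, len(TYPE_VOCAB))
gold[0, TYPE_VOCAB.index("person")] = 1.0                # an entity can have several types
gold[0, TYPE_VOCAB.index("songwriter")] = 1.0
loss = nn.functional.binary_cross_entropy_with_logits(model(tokens), gold)
loss.backward()
```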
no code implementations • ACL 2018 • Matt Gardner, Pradeep Dasigi, Srinivasan Iyer, Alane Suhr, Luke Zettlemoyer
Semantic parsing, the study of translating natural language utterances into machine-executable programs, is a well-established research area and has applications in question answering, instruction following, voice assistants, and code generation.
no code implementations • NAACL 2018 • Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, Ido Dagan
We present data and methods that enable a supervised learning approach to Open Information Extraction (Open IE).
3 code implementations • ACL 2018 • Nicholas FitzGerald, Julian Michael, Luheng He, Luke Zettlemoyer
We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser.
1 code implementation • ACL 2018 • Luheng He, Kenton Lee, Omer Levy, Luke Zettlemoyer
Recent BIO-tagging-based neural semantic role labeling models are very high performing, but assume gold predicates as part of the input and cannot incorporate span-level features.
no code implementations • ACL 2018 • Terra Blevins, Omer Levy, Luke Zettlemoyer
We present a set of experiments to demonstrate that deep recurrent neural networks (RNNs) learn internal representations that capture soft hierarchical notions of syntax from highly varied supervision.
no code implementations • ACL 2018 • Omer Levy, Kenton Lee, Nicholas FitzGerald, Luke Zettlemoyer
LSTMs were introduced to combat vanishing gradients in simple RNNs by augmenting them with gated additive recurrent connections.
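For reference, the standard LSTM cell makes that gated additive recurrence explicit: the cell state is updated by an element-wise gated sum rather than a repeated matrix multiplication through a nonlinearity, which is what helps gradients survive over long spans. The notation below is the common textbook formulation, not something taken from this paper.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```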
no code implementations • EMNLP 2018 • Michael Petrochuk, Luke Zettlemoyer
In this paper, we present new evidence that the SimpleQuestions benchmark can be nearly solved by standard methods.
2 code implementations • NAACL 2018 • Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer
We propose syntactically controlled paraphrase networks (SCPNs) and use them to generate adversarial examples.
5 code implementations • NAACL 2018 • Kenton Lee, Luheng He, Luke Zettlemoyer
We introduce a fully differentiable approximation to higher-order inference for coreference resolution.
Ranked #14 on Coreference Resolution on CoNLL 2012
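As I recall the coarse-to-fine formulation, the higher-order step repeatedly refines each span representation with an attention-weighted average of its candidate antecedents' representations, combined through a learned element-wise gate, so the whole refinement remains differentiable. The equations below are a sketch from memory and may differ from the paper in detail.

```latex
\begin{aligned}
a_i^{(n)} &= \sum_{y \in \mathcal{Y}(i)} P^{(n)}(y)\, g_y^{(n)} \\
f_i^{(n)} &= \sigma\!\left(W_f \left[\, g_i^{(n)} ;\, a_i^{(n)} \right]\right) \\
g_i^{(n+1)} &= f_i^{(n)} \odot g_i^{(n)} + \left(1 - f_i^{(n)}\right) \odot a_i^{(n)}
\end{aligned}
```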
1 code implementation • WS 2018 • Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, Luke Zettlemoyer
This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding.
3 code implementations • LREC 2018 • Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, Michael D. Ernst
We present new data and semantic parsing methods for the problem of mapping English sentences to Bash commands (NL2Bash).
46 code implementations • NAACL 2018 • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).
Ranked #3 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (Wasserstein Distance (WD) metric, using extra training data)
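Downstream models typically consume these representations as a task-specific weighted combination of the bidirectional LM's layers. The formula below reflects my understanding of that layer-weighting scheme, where the $s_j^{task}$ are softmax-normalized layer weights and $\gamma^{task}$ is a learned scalar.

```latex
\mathrm{ELMo}_k^{task} \;=\; \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, h_{k,j}^{LM}
```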
1 code implementation • NAACL 2018 • Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, Luke Zettlemoyer
We introduce Question-Answer Meaning Representations (QAMRs), which represent the predicate-argument structure of a sentence as a set of question-answer pairs.
4 code implementations • EMNLP 2017 • Kenton Lee, Luheng He, Mike Lewis, Luke Zettlemoyer
We introduce the first end-to-end coreference resolution model and show that it significantly outperforms all previous work without using a syntactic parser or hand-engineered mention detector.
Ranked #15 on Coreference Resolution on CoNLL 2012
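The span-ranking view behind end-to-end coreference models of this kind scores every candidate span both as a mention and as a possible antecedent of each later span, then normalizes over the antecedent choices (including a dummy $\varepsilon$ for "no antecedent"). The decomposition below uses my own notation and is a sketch rather than the paper's exact objective.

```latex
\begin{aligned}
P(y_i \mid D) &= \frac{\exp\big(s(i, y_i)\big)}{\sum_{y' \in \mathcal{Y}(i)} \exp\big(s(i, y')\big)},
\qquad \mathcal{Y}(i) = \{\varepsilon, 1, \ldots, i-1\} \\
s(i, j) &= s_m(i) + s_m(j) + s_a(i, j), \qquad s(i, \varepsilon) = 0
\end{aligned}
```

Here $s_m$ scores how likely a span is to be a mention and $s_a$ scores the pairwise antecedent link.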
1 code implementation • ACL 2017 • Luheng He, Kenton Lee, Mike Lewis, Luke Zettlemoyer
We introduce a new deep learning model for semantic role labeling (SRL) that significantly improves the state of the art, along with detailed analyses to reveal its strengths and limitations.
Ranked #2 on Predicate Detection on CoNLL 2005
2 code implementations • CONLL 2017 • Omer Levy, Minjoon Seo, Eunsol Choi, Luke Zettlemoyer
We show that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot.
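The reduction is straightforward to sketch: associate each relation slot with one or more question templates, fill in the entity, and let an extractive QA model find an answer span or abstain. The templates and the use of an off-the-shelf Hugging Face question-answering pipeline below are my illustrative assumptions; as I recall, the paper trains its own reading-comprehension model on crowd-authored questions rather than using a generic pipeline.

```python
# Hedged sketch: relation extraction as reading comprehension.
# Templates and the QA pipeline choice are illustrative assumptions.
from transformers import pipeline

TEMPLATES = {  # one or more natural-language questions per relation slot
    "educated_at": "Where did {entity} study?",
    "occupation": "What is {entity}'s job?",
}

qa = pipeline("question-answering")  # generic extractive QA model

def extract(relation, entity, context, threshold=0.3):
    """Ask the slot-filling question; abstain when the model is unsure."""
    question = TEMPLATES[relation].format(entity=entity)
    result = qa(question=question, context=context)
    return result["answer"] if result["score"] >= threshold else None

context = "Luke Zettlemoyer is a professor at the University of Washington."
print(extract("occupation", "Luke Zettlemoyer", context))
```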
2 code implementations • 21 May 2017 • Kenton Lee, Omer Levy, Luke Zettlemoyer
We introduce recurrent additive networks (RANs), a new gated RNN which is distinguished by the use of purely additive latent state updates.
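The "purely additive" update can be written out as below; I am reproducing the parameterization from memory, so details such as the output nonlinearity $g$ may differ from the paper. The key property is that the candidate content $\tilde{c}_t$ is a linear function of the input alone, so the latent state is an element-wise gated sum of (transformed) inputs, with no nonlinearity inside the recurrent accumulation.

```latex
\begin{aligned}
\tilde{c}_t &= W_{cx}\, x_t \\
i_t &= \sigma(W_{ih} h_{t-1} + W_{ix} x_t + b_i) \\
f_t &= \sigma(W_{fh} h_{t-1} + W_{fx} x_t + b_f) \\
c_t &= i_t \odot \tilde{c}_t + f_t \odot c_{t-1} \\
h_t &= g(c_t)
\end{aligned}
```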
3 code implementations • ACL 2017 • Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer
We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples.
no code implementations • ACL 2017 • Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, Luke Zettlemoyer
We present an approach for rapidly and easily building natural language interfaces to databases for new domains; it requires minimal intervention, and its performance improves over time based on user feedback.
Ranked #1 on SQL Parsing on Restaurants
5 code implementations • ACL 2017 • Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, Luke Zettlemoyer
Sequence-to-sequence models have shown strong performance across a broad range of applications.
Ranked #6 on AMR Parsing on LDC2015E86
2 code implementations • CVPR 2017 • Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi
Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set.
Ranked #11 on Situation Recognition on imSitu
no code implementations • EMNLP 2016 • Rik Koncel-Kedziorski, Ioannis Konstas, Luke Zettlemoyer, Hannaneh Hajishirzi
Texts present coherent stories that have a particular theme or overall setting, for example, science fiction or westerns.