1 code implementation • EMNLP (ACL) 2021 • Yash Kumar Lal, Reetu Singh, Harsh Trivedi, Qingqing Cao, Aruna Balasubramanian, Niranjan Balasubramanian
IrEne is an interpretable energy prediction system that accurately predicts the inference energy consumption of a wide range of Transformer-based NLP models.
1 code implementation • 22 Apr 2024 • Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari
To this end, we release OpenELM, a state-of-the-art open language model.
no code implementations • 22 Jan 2024 • Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao
Compared to baselines, our experiments show that APT maintains up to 98% of task performance when pruning RoBERTa and T5 models to 40% of their parameters, and retains 86.4% of LLaMA models' performance with 70% of parameters remaining.
1 code implementation • 2 Oct 2023 • Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi
Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks.
no code implementations • 19 Jul 2023 • Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi
In response, we introduce Pentathlon, a benchmark for holistic and realistic evaluation of model efficiency.
1 code implementation • NeurIPS 2023 • Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi
Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations.
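A minimal NumPy sketch of the general idea of adaptive search over matryoshka representations, where a short prefix of each embedding is itself a usable coarse representation: shortlist candidates with a cheap low-dimensional prefix, then re-rank with a longer prefix. This is an illustrative assumption about the setup, not the AdANNS implementation; names such as search_adaptive are hypothetical.

```python
import numpy as np

# Hypothetical matryoshka embeddings: the first d dimensions of each vector
# form a usable coarser representation of the same item.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 256)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def search_adaptive(query, shortlist_dim=64, rerank_dim=256, shortlist=200, k=10):
    """Two-stage search: shortlist with a cheap low-dimensional prefix,
    then re-rank the shortlist with a higher-dimensional prefix."""
    q = query / np.linalg.norm(query)
    coarse = corpus[:, :shortlist_dim] @ q[:shortlist_dim]   # cheap pass
    cand = np.argpartition(-coarse, shortlist)[:shortlist]   # candidate set
    fine = corpus[cand, :rerank_dim] @ q[:rerank_dim]        # accurate pass
    return cand[np.argsort(-fine)[:k]]

query = rng.standard_normal(256).astype(np.float32)
print(search_adaptive(query, shortlist_dim=32))  # smaller prefix => less compute
```

Lowering shortlist_dim at query time trades accuracy for compute without rebuilding the index, which is the kind of inference-time adaptivity the sentence above refers to.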
1 code implementation • 27 May 2023 • Qingqing Cao, Bhargavi Paranjape, Hannaneh Hajishirzi
Large-scale vision language (VL) models use Transformers to perform cross-modal interactions between the input text and image.
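A minimal PyTorch sketch of the cross-modal interaction pattern described above: text tokens attend over image patch embeddings inside a Transformer block. Dimensions, layer counts, and the class name are illustrative assumptions, not the architecture of any specific VL model.

```python
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    """Text queries attend over image patch keys/values (illustrative only)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, text, image):
        # Queries come from text tokens; keys and values from image patches.
        attended, _ = self.attn(query=text, key=image, value=image)
        x = self.norm(text + attended)
        return x + self.ffn(x)

text = torch.randn(2, 32, 256)    # batch, text tokens, hidden dim
image = torch.randn(2, 196, 256)  # batch, image patches, hidden dim
print(CrossModalBlock()(text, image).shape)  # torch.Size([2, 32, 256])
```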
no code implementations • 15 Nov 2022 • Qin Zhang, Shangsi Chen, Dongkuan Xu, Qingqing Cao, Xiaojun Chen, Trevor Cohn, Meng Fang
Thus, a trade-off among accuracy, memory consumption, and processing speed is pursued.
no code implementations • 31 Aug 2022 • Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows.
1 code implementation • ACL 2021 • Qingqing Cao, Yash Kumar Lal, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian
We present IrEne, an interpretable and extensible energy prediction system that accurately predicts the inference energy consumption of a wide range of Transformer-based NLP models.
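A minimal sketch in the spirit of the description above: represent the model as a tree of components, estimate leaf-node energy from resource features, and aggregate child energies so every node's contribution stays visible. The feature names and the linear predictor are illustrative assumptions, not IrEne's actual regressors.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    features: dict = field(default_factory=dict)   # e.g. flops, memory bytes
    children: list = field(default_factory=list)

def predict_energy(node, weights):
    """Leaf: regress energy from resource features; internal: sum children."""
    if not node.children:
        return sum(weights.get(k, 0.0) * v for k, v in node.features.items())
    return sum(predict_energy(child, weights) for child in node.children)

layer = Node("encoder_layer", children=[
    Node("self_attention", {"flops": 2.4e9, "mem_bytes": 3.1e7}),
    Node("feed_forward", {"flops": 4.8e9, "mem_bytes": 6.2e7}),
])
weights = {"flops": 1e-10, "mem_bytes": 5e-9}  # toy calibration coefficients
print(predict_energy(layer, weights), "J (toy estimate)")
```

Because each node carries its own estimate, the per-module breakdown is available alongside the model-level total, which is what makes this style of prediction interpretable.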
no code implementations • 10 Dec 2020 • Qingqing Cao, Oriana Riva, Aruna Balasubramanian, Niranjan Balasubramanian
We present a practical approach, called BewQA, that can answer Bew queries by mining a template of the business-related webpages and using the template to guide the search.
1 code implementation • EMNLP (sustainlp) 2020 • Qingqing Cao, Aruna Balasubramanian, Niranjan Balasubramanian
In this work, we show that existing software-based energy measurements are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption.
1 code implementation • ACL 2020 • Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian
It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers.
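A minimal PyTorch sketch of that idea, under the assumption of a question-answering setup: run the lower layers on the question and the passage separately (no input-wide self-attention), then concatenate and run the upper layers over the full input. Layer counts and dimensions are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

make_layer = lambda: nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
lower = nn.ModuleList([make_layer() for _ in range(3)])
upper = nn.ModuleList([make_layer() for _ in range(3)])

def decomposed_forward(question, passage):
    for layer in lower:                 # independent lower-layer encoding;
        question = layer(question)      # passage representations could be
        passage = layer(passage)        # precomputed and cached offline
    x = torch.cat([question, passage], dim=1)
    for layer in upper:                 # input-wide self-attention only on top
        x = layer(x)
    return x

q = torch.randn(1, 16, 256)
p = torch.randn(1, 128, 256)
print(decomposed_forward(q, p).shape)  # torch.Size([1, 144, 256])
```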
1 code implementation • 3 Jun 2017 • Qingqing Cao, Niranjan Balasubramanian, Aruna Balasubramanian
In this paper, we explore optimizations to run Recurrent Neural Network (RNN) models locally on mobile devices.