Search Results for author: Tianle Cai

Found 28 papers, 19 papers with code

SnapKV: LLM Knows What You are Looking for Before Generation

1 code implementation • 22 Apr 2024 • Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen

Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to baseline when processing inputs of 16K tokens.

16k
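The mechanism implied by the title (picking out the important KV-cache entries before generation starts) can be sketched roughly as below; the observation-window scoring, the pooling step, and the function names are assumptions for illustration, not SnapKV's actual implementation.

```python
import torch

def compress_kv(keys, values, obs_queries, keep=256, pool=7):
    """Hypothetical KV-cache compression: keep only the prefix positions that
    the last few prompt queries (an "observation window") attend to most."""
    # attention of observation-window queries over all prefix keys
    scores = torch.softmax(
        obs_queries @ keys.transpose(-1, -2) / keys.shape[-1] ** 0.5, dim=-1
    )
    votes = scores.sum(dim=0)                    # aggregate over the window queries
    votes = torch.nn.functional.avg_pool1d(      # local smoothing so neighbours survive together
        votes[None, None, :], kernel_size=pool, stride=1, padding=pool // 2
    )[0, 0]
    idx = votes.topk(min(keep, votes.numel())).indices.sort().values
    return keys[idx], values[idx]                # compressed cache for generation
```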

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

1 code implementation • 11 Apr 2024 • Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin

Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence.

Accelerating Greedy Coordinate Gradient via Probe Sampling

1 code implementation • 2 Mar 2024 • Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh

Safety of Large Language Models (LLMs) has become a central issue given their rapid progress and wide applications.

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

1 code implementation • 29 Feb 2024 • Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

To overcome this dilemma, we observe the high similarity between the input from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step.
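A rough, single-process sketch of the displaced-patch idea described above: each patch is denoised against context assembled from the other patches' feature maps cached at the previous timestep, so no within-step synchronisation is needed. The `denoiser.encode` / `denoiser.decode` helpers are hypothetical stand-ins for the real diffusion model, not the paper's API.

```python
import torch

def displaced_patch_step(denoiser, patches, cached_feats, t):
    """One hypothetical step of displaced patch parallelism: each worker keeps
    its own patch fresh, while the surrounding context comes from feature maps
    cached at the previous diffusion step."""
    new_patches, new_feats = [], []
    for i, patch in enumerate(patches):
        fresh = denoiser.encode(patch, t)           # this worker's up-to-date features
        context = [fresh if j == i else stale       # everyone else: previous-step cache
                   for j, stale in enumerate(cached_feats)]
        new_patches.append(denoiser.decode(torch.cat(context, dim=-1), t))
        new_feats.append(fresh.detach())
    return new_patches, new_feats                   # feed new_feats into the next step
```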

BitDelta: Your Fine-Tune May Only Be Worth One Bit

1 code implementation • 15 Feb 2024 • James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks.
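Reading the title together with this two-phase view, the delta between fine-tuned and base weights can be compressed to a sign matrix plus one scale per weight matrix. The sketch below uses the mean absolute delta as that scale, which is an assumption; a real implementation would likely also calibrate the scales, which this sketch omits.

```python
import torch

def compress_delta(w_base: torch.Tensor, w_fine: torch.Tensor):
    """Hypothetical 1-bit delta compression: store only the sign of the
    fine-tuning delta plus a single scale per weight matrix."""
    delta = w_fine - w_base
    sign = torch.sign(delta)
    scale = delta.abs().mean()                   # assumption: one scalar per matrix
    return sign.to(torch.int8), scale

def reconstruct(w_base, sign, scale):
    """Approximate the fine-tuned weights from the base model plus the 1-bit delta."""
    return w_base + scale * sign.to(w_base.dtype)
```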

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

1 code implementation • 19 Jan 2024 • Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration.
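A minimal sketch of what "multiple decoding heads" could look like: lightweight heads on top of the backbone's last hidden state, each predicting the token a fixed number of positions ahead (Medusa-1 trains such heads with the backbone frozen). The head architecture below is an illustrative assumption, not Medusa's exact design, and the tree-based verification step is omitted.

```python
import torch
import torch.nn as nn

class MedusaStyleHeads(nn.Module):
    """Hypothetical extra decoding heads: small MLPs over the last hidden state,
    each predicting the token k steps ahead of the backbone's own next token."""
    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.SiLU(),
                          nn.Linear(hidden_size, vocab_size))
            for _ in range(num_heads)
        )

    def forward(self, last_hidden: torch.Tensor):
        # one logit tensor per lookahead position (t+1, t+2, ...)
        return [head(last_hidden) for head in self.heads]
```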

REST: Retrieval-Based Speculative Decoding

1 code implementation • 14 Nov 2023 • Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He

We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.

Language Modelling • Retrieval • +1
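The core retrieval step can be sketched as suffix matching against a token datastore, with the retrieved continuation used as draft tokens for the target model to verify. The linear scan below is for clarity only; a practical datastore would use a suffix index, and the function name is hypothetical.

```python
def retrieve_draft(corpus_tokens, context, max_suffix=16, draft_len=8):
    """Hypothetical retrieval step: find the longest suffix of the current
    context inside a token datastore and return its continuation as draft
    tokens for speculative decoding."""
    for n in range(min(max_suffix, len(context)), 0, -1):
        suffix = context[-n:]
        for i in range(len(corpus_tokens) - n):
            if corpus_tokens[i:i + n] == suffix:
                return corpus_tokens[i + n:i + n + draft_len]
    return []  # no match: fall back to ordinary decoding
```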

Scaling In-Context Demonstrations with Structured Attention

no code implementations • 5 Jul 2023 • Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang

However, their capabilities of in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations.

In-Context Learning • Sentence

Reward Collapse in Aligning Large Language Models

1 code implementation • 28 May 2023 • Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su

This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic regime.

Large Language Models as Tool Makers

1 code implementation • 26 May 2023 • Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou

Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks.
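A loose sketch of such a two-phase pipeline, assuming a hypothetical `generate` interface on both models and a trusted sandbox for executing the generated tool; the prompts are illustrative, not the paper's.

```python
def make_and_use_tool(tool_maker_llm, tool_user_llm, demo_tasks, new_task):
    """Hypothetical two-phase pipeline: a stronger model writes a reusable
    Python tool from a few example tasks (tool making), then a lighter model
    calls that tool on new instances (tool using)."""
    # Phase 1: tool making -- ask the strong model for a self-contained solver
    tool_code = tool_maker_llm.generate(
        "Write a Python function solve(task: str) -> str for tasks like:\n"
        + "\n".join(demo_tasks)
    )
    namespace = {}
    exec(tool_code, namespace)  # assumption: executed in a trusted sandbox
    # Phase 2: tool using -- the lighter model only has to format the call
    task_str = tool_user_llm.generate(
        f"Rewrite this task as the input string for solve():\n{new_task}"
    )
    return namespace["solve"](task_str)
```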

What Makes Convolutional Models Great on Long Sequence Modeling?

1 code implementation • 17 Oct 2022 • Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey

We focus on the structure of the convolution kernel and identify two critical but intuitive principles enjoyed by S4 that are sufficient to make up an effective global convolutional model: 1) The parameterization of the convolutional kernel needs to be efficient in the sense that the number of parameters should scale sub-linearly with sequence length.

Long-range modeling
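To make the first principle concrete, here is a hedged sketch of a global convolution whose kernel is built from a handful of short sub-kernels, upsampled to exponentially larger spans and weighted by a decaying coefficient, so the parameter count grows with the number of scales rather than with the sequence length. The construction and decay schedule are assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn.functional as F

def global_conv_kernel(sub_kernels, seq_len, decay=0.5):
    """Assemble a length-seq_len kernel from a few short learnable sub-kernels,
    each stretched to cover an exponentially larger span with decaying weight."""
    pieces = []
    for s, k in enumerate(sub_kernels):            # k: 1-D tensor of a few parameters
        span = k.numel() * (2 ** s)
        up = F.interpolate(k[None, None, :], size=span, mode="linear",
                           align_corners=False)[0, 0]
        pieces.append((decay ** s) * up)
    kernel = torch.cat(pieces)[:seq_len]
    return F.pad(kernel, (0, max(0, seq_len - kernel.numel())))

def long_conv(x, kernel):
    """Global convolution via FFT in O(L log L)."""
    L = x.shape[-1]
    n = 2 * L
    y = torch.fft.irfft(torch.fft.rfft(x, n=n) * torch.fft.rfft(kernel, n=n), n=n)
    return y[..., :L]
```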

Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond

no code implementations • 19 Jul 2022 • Yuzheng Hu, Tianle Cai, Jinyong Shan, Shange Tang, Chaochao Cai, Ethan Song, Bo Li, Dawn Song

We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks, where the protocols might differ between one another, yet a procedure of obtaining local gradients is implicitly shared.

Philosophy • Privacy Preserving • +2

Do Transformers Really Perform Badly for Graph Representation?

no code implementations • NeurIPS 2021 • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.

Graph Representation Learning
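One concrete way to encode structure, in the spirit of this paper, is to bias the attention logits with a learnable term indexed by the shortest-path distance between node pairs. The single-head sketch below is illustrative and omits other structural encodings (e.g. degree/centrality terms).

```python
import torch

def attention_with_structural_bias(q, k, v, spd, bias_table):
    """Self-attention over graph nodes with a learnable bias indexed by
    shortest-path distance, so attention scores see graph structure.
    q, k, v: (num_nodes, d); spd: integer (n, n) shortest-path distances;
    bias_table: learnable 1-D tensor of distance biases."""
    d = q.shape[-1]
    logits = q @ k.transpose(-1, -2) / d ** 0.5            # (n, n) node-pair scores
    logits = logits + bias_table[spd.clamp(max=bias_table.numel() - 1)]
    return torch.softmax(logits, dim=-1) @ v
```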

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

no code implementations • NeurIPS 2021 • Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu

Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing.

First Place Solution of KDD Cup 2021 & OGB Large-Scale Challenge Graph Prediction Track

4 code implementations • 15 Jun 2021 • Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He

In this technical report, we present our solution of KDD Cup 2021 OGB Large-Scale Challenge - PCQM4M-LSC Track.

Do Transformers Really Perform Bad for Graph Representation?

4 code implementations • 9 Jun 2021 • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.

Graph Classification • Graph Property Prediction • +2

Towards a Theoretical Framework of Out-of-Distribution Generalization

no code implementations • NeurIPS 2021 • Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, LiWei Wang

We also introduce a new concept of expansion function, which characterizes to what extent the variance is amplified in the test domains over the training domains, and therefore give a quantitative meaning of invariant features.

Domain Generalization • Model Selection • +1

A Theory of Label Propagation for Subpopulation Shift

no code implementations • 22 Feb 2021 • Tianle Cai, Ruiqi Gao, Jason D. Lee, Qi Lei

In this work, we propose a provably effective framework for domain adaptation based on label propagation.

Domain Adaptation • Generalization Bounds
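The framework builds on label propagation; as background, here is a minimal sketch of the classic algorithm on a row-normalized affinity graph. This is the generic primitive, not the paper's subpopulation-shift framework.

```python
import torch

def label_propagation(W, y_onehot, labeled_mask, alpha=0.9, iters=50):
    """Classic label propagation: labels diffuse along a row-stochastic
    affinity matrix W while labeled nodes stay clamped to their true labels."""
    y_onehot = y_onehot.float()
    P = W / W.sum(dim=1, keepdim=True)            # row-stochastic transition matrix
    Y = y_onehot.clone()
    for _ in range(iters):
        Y = alpha * (P @ Y) + (1 - alpha) * y_onehot
        Y[labeled_mask] = y_onehot[labeled_mask]  # keep known labels fixed
    return Y.argmax(dim=1)
```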

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

2 code implementations • 10 Feb 2021 • Bohang Zhang, Tianle Cai, Zhou Lu, Di He, LiWei Wang

This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs.
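The margin-based guarantee rests on the network being 1-Lipschitz with respect to the l-infinity norm (which l-inf-dist neurons provide): no perturbation smaller than half the gap between the top two logits can change the prediction. A minimal sketch of that certificate, not the paper's training procedure:

```python
import torch

def certified_radius(logits: torch.Tensor) -> torch.Tensor:
    """For a network that is 1-Lipschitz w.r.t. the l-infinity norm, each logit
    moves by at most eps under a perturbation of size eps, so the top-1 class
    is certified for all perturbations smaller than half the top-two margin."""
    top2 = logits.topk(2, dim=-1).values
    margin = top2[..., 0] - top2[..., 1]
    return margin / 2.0
```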

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot

1 code implementation • NeurIPS 2020 • Jingtong Su, Yihang Chen, Tianle Cai, Tianhao Wu, Ruiqi Gao, Li-Wei Wang, Jason D. Lee

In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods and surprisingly find that: (1) A set of methods which aims to find good subnetworks of the randomly-initialized network (which we call "initial tickets"), hardly exploits any information from the training data; (2) For the pruned networks obtained by these methods, randomly changing the preserved weights in each layer, while keeping the total number of preserved weights unchanged per layer, does not affect the final performance.

Network Pruning
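Finding (2) corresponds to a simple sanity check: rearrange which weights are preserved in each layer while keeping the per-layer count fixed. A hedged sketch of that check on a single layer's pruning mask (names are illustrative):

```python
import torch

def rearrange_preserved_weights(mask: torch.Tensor) -> torch.Tensor:
    """Keep the number of preserved weights in this layer unchanged, but pick
    which positions are preserved uniformly at random."""
    kept = int(mask.sum().item())
    new_mask = torch.zeros_like(mask).flatten()
    idx = torch.randperm(new_mask.numel())[:kept]
    new_mask[idx] = 1
    return new_mask.view_as(mask)
```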

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

1 code implementation • 7 Sep 2020 • Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang

We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.

Graph Classification • Graph Representation Learning
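A rough sketch of the idea as described: normalize node features per graph, but subtract only a learnable fraction of the graph-wise mean, softening InstanceNorm's full mean subtraction. The parameter names and single-graph interface are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class GraphNormSketch(nn.Module):
    """Per-graph normalization with a learnable shift coefficient on the mean;
    batching over multiple graphs is omitted for clarity."""
    def __init__(self, dim: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))   # how much mean to remove
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:   # h: (num_nodes, dim)
        mean = h.mean(dim=0, keepdim=True)
        shifted = h - self.alpha * mean
        std = shifted.std(dim=0, keepdim=True) + 1e-6
        return self.gamma * shifted / std + self.beta
```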

Defective Convolutional Networks

1 code implementation • 19 Nov 2019 • Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Li-Wei Wang

Robustness of convolutional neural networks (CNNs) has gained in importance on account of adversarial examples, i.e., inputs with well-designed perturbations that are imperceptible to humans but can cause the model to predict incorrectly.
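As a generic illustration of the threat model this entry refers to (not the paper's defective-layer defence), a standard FGSM-style perturbation:

```python
import torch

def fgsm_example(model, x, y, eps=8 / 255):
    """Generic fast-gradient-sign adversarial example: an epsilon-bounded,
    near-imperceptible perturbation in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```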

Defective Convolutional Layers Learn Robust CNNs

no code implementations • 25 Sep 2019 • Tiange Luo, Tianle Cai, Xiaomeng Zhang, Siyu Chen, Di He, LiWei Wang

We first show that predictions made by the defective CNN are less dependent on textural information, but more on shape information, and further find that adversarial examples generated by the defective CNN appear to have semantic shapes.

Convergence of Adversarial Training in Overparametrized Neural Networks

no code implementations • NeurIPS 2019 • Ruiqi Gao, Tianle Cai, Haochuan Li, Li-Wei Wang, Cho-Jui Hsieh, Jason D. Lee

Neural networks are vulnerable to adversarial examples, i.e., inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network.

Adversarially Robust Generalization Just Requires More Unlabeled Data

1 code implementation • 3 Jun 2019 • Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Li-Wei Wang

Neural network robustness has recently been highlighted by the existence of adversarial examples.

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

no code implementations • 28 May 2019 • Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Li-Wei Wang

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks.

regression • Second-order methods
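A hedged sketch of the kind of second-order alternative the title points to: a Gauss-Newton step for least-squares regression that only inverts the n x n Gram matrix J J^T of the Jacobian, which is cheap when the batch size n is much smaller than the number of parameters. This is a generic construction under those assumptions, not necessarily the paper's exact algorithm.

```python
import torch

def gram_gauss_newton_step(f, theta, y, lam=1e-3):
    """One damped Gauss-Newton step for minimizing 0.5 * ||f(theta) - y||^2:
    theta <- theta - J^T (J J^T + lam I)^{-1} (f(theta) - y),
    where f maps parameters (p,) to predictions (n,)."""
    residual = f(theta) - y                               # (n,)
    J = torch.autograd.functional.jacobian(f, theta)      # (n, p)
    G = J @ J.T + lam * torch.eye(J.shape[0])             # (n, n) Gram matrix
    step = J.T @ torch.linalg.solve(G, residual)
    return theta - step
```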
