1 code implementation • 22 Apr 2024 • Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen
Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to the baseline when processing inputs of 16K tokens.
1 code implementation • 11 Apr 2024 • Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin
Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence.
1 code implementation • 2 Mar 2024 • Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh
Safety of Large Language Models (LLMs) has become a central issue given their rapid progress and wide applications.
1 code implementation • 29 Feb 2024 • Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han
To overcome this dilemma, we observe the high similarity between the inputs of adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step.
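A minimal single-process sketch of the displaced-patch idea: each worker refreshes only its own patch per timestep and reuses every other patch's activations cached from the previous step. All names are illustrative; the real system shards patches across GPUs and overlaps communication with computation.

```python
import numpy as np

def denoise_layer(own_patch, context_patches):
    """Stand-in for one network layer: mixes a patch with its context."""
    context = np.mean(context_patches, axis=0)
    return 0.9 * own_patch + 0.1 * context

def step_with_displaced_patches(patches, cache):
    """Each 'device' refreshes only its own patch, reusing the cached
    activations of all other patches from the previous timestep."""
    new_patches = []
    for i, patch in enumerate(patches):
        stale_context = [cache[j] for j in range(len(patches)) if j != i]
        new_patches.append(denoise_layer(patch, stale_context))
    return new_patches, list(new_patches)  # new cache for the next step

patches = [np.random.randn(4, 4) for _ in range(3)]   # 3 image patches
cache = [p.copy() for p in patches]                   # warm-up: fresh activations
for _ in range(5):                                    # diffusion timesteps
    patches, cache = step_with_displaced_patches(patches, cache)
```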
1 code implementation • 15 Feb 2024 • James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai
Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks.
1 code implementation • 19 Jan 2024 • Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration. Medusa-2: Medusa is fine-tuned together with the backbone LLM, enabling higher prediction accuracy of the Medusa heads and further speedup, but requiring a special training recipe that preserves the backbone model's capabilities.
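A minimal sketch of the Medusa-1 setup, assuming a frozen backbone that exposes its final hidden state; the head shapes and variable names below are illustrative, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab, num_heads = 64, 100, 3   # 3 Medusa heads -> 3 extra draft tokens

backbone_hidden = rng.standard_normal(hidden)         # frozen LLM's last hidden state
medusa_heads = [rng.standard_normal((vocab, hidden))  # only these are trained
                for _ in range(num_heads)]

# Each head k proposes a candidate for position t+k+1 from the same hidden state.
draft = [int(np.argmax(W @ backbone_hidden)) for W in medusa_heads]
# The backbone then verifies the drafted tokens in a single forward pass
# (tree attention in the paper), keeping the longest accepted prefix.
print("drafted token ids:", draft)
```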
1 code implementation • 14 Nov 2023 • Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He
We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.
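A toy illustration of retrieval-based drafting, assuming the datastore is a plain token list; REST itself builds a suffix-indexed datastore and verifies the drafted tokens with the LLM in a single pass, which this sketch omits.

```python
def retrieve_draft(datastore, context, span=4, max_draft=3):
    """Find the longest recent suffix of `context` in the datastore and
    return the tokens that followed it as draft tokens."""
    for n in range(span, 0, -1):                 # try longer suffixes first
        suffix = context[-n:]
        for i in range(len(datastore) - n):
            if datastore[i:i + n] == suffix:
                return datastore[i + n:i + n + max_draft]
    return []

datastore = "the cat sat on the mat and the cat slept".split()
context = "a dog and the cat".split()
print(retrieve_draft(datastore, context))  # ['slept']
```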
no code implementations • 5 Jul 2023 • Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
However, their capabilities of in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations.
1 code implementation • 28 May 2023 • Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su
This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic regime.
1 code implementation • 26 May 2023 • Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou
Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks; 2) tool using: another LLM acts as the tool user, which applies the tool built by the tool maker to solve new requests.
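A hedged sketch of the two-phase split; `strong_llm` is a hypothetical stand-in for an API call to a capable model, and the generated tool is hard-coded here so the example runs.

```python
def strong_llm(prompt: str) -> str:
    # Phase 1 (tool making): a capable model writes a reusable Python tool.
    return "def schedule_overlap(a, b):\n    return max(a[0], b[0]) < min(a[1], b[1])"

def make_tool(task_description: str):
    source = strong_llm(f"Write a Python function that solves: {task_description}")
    namespace = {}
    exec(source, namespace)          # in practice: validate on unit tests first
    return namespace["schedule_overlap"]

# Phase 2 (tool using): a cheaper model only needs to call the finished tool.
tool = make_tool("decide whether two meetings overlap")
print(tool((9, 11), (10, 12)))   # True
```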
1 code implementation • 17 Oct 2022 • Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey
We focus on the structure of the convolution kernel and identify two critical but intuitive principles enjoyed by S4 that are sufficient to make up an effective global convolutional model: 1) The parameterization of the convolutional kernel needs to be efficient in the sense that the number of parameters should scale sub-linearly with sequence length. 2) The kernel needs a decaying structure in which the weights for convolving with closer neighbors are larger than those for more distant ones.
Ranked #6 on Long-range modeling on LRA
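A toy version of the two principles above: build a long kernel from a few small sub-kernels, each upsampled to wider support and scaled down, so the parameter count grows roughly logarithmically in kernel length and distant positions get smaller weights. The decay constant and names are illustrative, not the paper's exact recipe.

```python
import numpy as np

def global_kernel(sub_kernels, decay=0.5):
    """Concatenate sub-kernels, widening each one's support by repetition and
    shrinking its magnitude, so parameters grow ~log(L) for kernel length L."""
    parts = []
    for i, k in enumerate(sub_kernels):
        upsampled = np.repeat(k, 2 ** i)        # wider support, same #parameters
        parts.append((decay ** i) * upsampled)  # distant positions get smaller weight
    return np.concatenate(parts)

sub_kernels = [np.random.randn(4) for _ in range(5)]  # 20 parameters total
k = global_kernel(sub_kernels)                        # covers 4+8+16+32+64 = 124 positions
print(len(k))
```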
no code implementations • 19 Jul 2022 • Yuzheng Hu, Tianle Cai, Jinyong Shan, Shange Tang, Chaochao Cai, Ethan Song, Bo Li, Dawn Song
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks, where the protocols may differ from one another, yet a procedure for obtaining local gradients is implicitly shared.
no code implementations • NeurIPS 2021 • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding a graph's structural information into the model.
no code implementations • NeurIPS 2021 • Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu
Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing.
4 code implementations • 15 Jun 2021 • Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He
In this technical report, we present our solution to the KDD Cup 2021 OGB Large-Scale Challenge (PCQM4M-LSC track).
4 code implementations • 9 Jun 2021 • Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding a graph's structural information into the model.
Ranked #1 on Graph Regression on PCQM4M-LSC
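One concrete form such structural encoding can take, sketched in the spirit of Graphormer's spatial encoding with illustrative names: a learnable bias, indexed by shortest-path distance, is added to the attention logits.

```python
import numpy as np

def attention_with_spatial_bias(Q, K, V, spd, bias_table):
    """spd[i, j] = shortest-path distance between nodes i and j;
    bias_table[d] = learnable scalar bias for distance d."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores + bias_table[spd]          # structural bias added to logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
spd = np.array([[0,1,2,3],[1,0,1,2],[2,1,0,1],[3,2,1,0]])  # a path graph
bias_table = rng.standard_normal(4)            # one bias per distance value
out = attention_with_spatial_bias(Q, K, V, spd, bias_table)
```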
no code implementations • NeurIPS 2021 • Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, LiWei Wang
We also introduce the new concept of an expansion function, which characterizes the extent to which variance is amplified in the test domains relative to the training domains, and thereby gives a quantitative meaning to invariant features.
no code implementations • 22 Feb 2021 • Tianle Cai, Ruiqi Gao, Jason D. Lee, Qi Lei
In this work, we propose a provably effective framework for domain adaptation based on label propagation.
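For reference, classic label propagation on a similarity graph, which the framework builds on: label mass diffuses along edges while labeled nodes stay clamped. This is the textbook iteration, not the paper's full method.

```python
import numpy as np

def label_propagation(W, Y_init, labeled_mask, iters=50):
    """W: (n, n) nonnegative similarity matrix; Y_init: (n, c) one-hot rows
    for labeled nodes, zeros elsewhere."""
    P = W / W.sum(axis=1, keepdims=True)        # row-normalize: transition matrix
    Y = Y_init.copy()
    for _ in range(iters):
        Y = P @ Y                               # propagate neighbors' label mass
        Y[labeled_mask] = Y_init[labeled_mask]  # clamp known labels
    return Y.argmax(axis=1)

W = np.array([[0,1,1,0],[1,0,0,0],[1,0,0,1],[0,0,1,0]], dtype=float)
Y = np.zeros((4, 2)); Y[0, 0] = 1; Y[3, 1] = 1  # nodes 0 and 3 are labeled
print(label_propagation(W, Y, np.array([True, False, False, True])))
```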
2 code implementations • 10 Feb 2021 • Bohang Zhang, Tianle Cai, Zhou Lu, Di He, LiWei Wang
This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs.
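The margin-based certificate follows the standard Lipschitz argument: under a perturbation of ell-infinity size epsilon, each logit of an L-Lipschitz network moves by at most L*epsilon, so a prediction cannot flip while the margin exceeds twice that amount. A sketch of the standard bound, not necessarily the paper's exact statement:

```latex
% Standard Lipschitz margin certificate (illustrative; not necessarily the
% paper's exact statement). For f that is L-Lipschitz w.r.t. the l_inf norm:
\[
  f_y(x) - \max_{j \neq y} f_j(x) > 2L\varepsilon
  \;\Longrightarrow\;
  \arg\max_j f_j(x + \delta) = y
  \quad \text{for all } \|\delta\|_\infty \le \varepsilon .
\]
```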
1 code implementation • NeurIPS 2020 • Jingtong Su, Yihang Chen, Tianle Cai, Tianhao Wu, Ruiqi Gao, Li-Wei Wang, Jason D. Lee
In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods and surprisingly find that: (1) A set of methods that aim to find good subnetworks of the randomly-initialized network (which we call "initial tickets") hardly exploits any information from the training data; (2) For the pruned networks obtained by these methods, randomly changing the preserved weights in each layer, while keeping the total number of preserved weights unchanged per layer, does not affect the final performance.
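One reading of the layerwise check in (2), sketched below with illustrative names: redraw which positions are preserved in each layer while keeping the per-layer count of preserved weights fixed.

```python
import numpy as np

def shuffle_mask_per_layer(masks, rng):
    """For each layer, keep the number of preserved weights fixed but choose
    the preserved positions uniformly at random."""
    new_masks = []
    for M in masks:
        flat = M.ravel().copy()
        rng.shuffle(flat)                  # same number of ones, random positions
        new_masks.append(flat.reshape(M.shape))
    return new_masks

rng = np.random.default_rng(0)
masks = [np.array([[1, 0, 0], [0, 1, 0]]), np.array([[1, 1], [0, 0]])]
print(shuffle_mask_per_layer(masks, rng))
```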
1 code implementation • 7 Sep 2020 • Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang
We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but this preconditioning effect is weaker for BatchNorm due to the heavy batch noise in graph datasets.
Ranked #25 on Graph Property Prediction on ogbg-molhiv
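A sketch of per-graph normalization with a learnable mean scale (the shift term that distinguishes GraphNorm from plain InstanceNorm); shapes and the epsilon below are illustrative.

```python
import numpy as np

def graph_norm(X, alpha, gamma, beta, eps=1e-5):
    """X: (num_nodes, features) for ONE graph; alpha, gamma, beta: (features,)."""
    mean = X.mean(axis=0)
    shifted = X - alpha * mean           # a learnable fraction of the mean is removed
    std = np.sqrt((shifted ** 2).mean(axis=0) + eps)
    return gamma * shifted / std + beta

X = np.random.randn(5, 3)                # 5 nodes, 3 features
out = graph_norm(X, alpha=np.ones(3), gamma=np.ones(3), beta=np.zeros(3))
```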
1 code implementation • ICLR 2019 • Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Li-Wei Wang
We next investigate adversarial examples that 'fool' a CNN equipped with Random Mask.
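A minimal sketch of the Random Mask idea as described: a binary mask over feature-map units, drawn once at initialization and kept fixed throughout training; shapes and the masking ratio are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.standard_normal((8, 16, 16))   # (channels, H, W)
mask = (rng.random(feature_maps.shape) > 0.3)     # fixed at init, never trained
masked = feature_maps * mask                      # masked units stay "defective"
```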
3 code implementations • NeurIPS 2020 • Kai Zheng, Tianle Cai, Weiran Huang, Zhenguo Li, Li-Wei Wang
We study locally differentially private (LDP) bandit learning in this paper.
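For illustration only, the standard epsilon-LDP Laplace mechanism applied to bounded bandit rewards; this is the generic local-privacy primitive, not the specific algorithms analyzed in the paper.

```python
import numpy as np

def privatize_reward(reward, eps, rng, low=0.0, high=1.0):
    """Each user perturbs their own bounded reward before sending it,
    so the learner never sees the true value (local DP)."""
    sensitivity = high - low
    return reward + rng.laplace(scale=sensitivity / eps)

rng = np.random.default_rng(0)
true_reward = 0.7
print(privatize_reward(true_reward, eps=1.0, rng=rng))
```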
1 code implementation • 19 Nov 2019 • Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Li-Wei Wang
The robustness of convolutional neural networks (CNNs) has gained importance on account of adversarial examples, i.e., inputs with well-designed perturbations that are imperceptible to humans but can cause the model to predict incorrectly.
no code implementations • 25 Sep 2019 • Tiange Luo, Tianle Cai, Xiaomeng Zhang, Siyu Chen, Di He, LiWei Wang
We first show that predictions made by the defective CNN are less dependent on textural information, but more on shape information, and further find that adversarial examples generated by the defective CNN appear to have semantic shapes.
no code implementations • NeurIPS 2019 • Ruiqi Gao, Tianle Cai, Haochuan Li, Li-Wei Wang, Cho-Jui Hsieh, Jason D. Lee
Neural networks are vulnerable to adversarial examples, i.e., inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network.
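For concreteness, the textbook FGSM construction of an adversarial example, shown only as an illustration of the definition above; the paper itself studies the convergence of adversarial training.

```python
import numpy as np

def fgsm(x, grad_wrt_x, eps=0.03):
    """One-step attack: move each input coordinate by eps in the direction
    that increases the loss, then clip back to the valid input range."""
    return np.clip(x + eps * np.sign(grad_wrt_x), 0.0, 1.0)

x = np.random.rand(4)        # a toy "image" in [0, 1]
grad = np.random.randn(4)    # stand-in for dLoss/dx from backprop
x_adv = fgsm(x, grad)
```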
1 code implementation • 3 Jun 2019 • Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Li-Wei Wang
Neural network robustness has recently been highlighted by the existence of adversarial examples.
no code implementations • 28 May 2019 • Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Li-Wei Wang
First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks.