1 code implementation • 22 Apr 2024 • Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen
Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to the baseline when processing inputs of 16K tokens.
2 code implementations • 4 Apr 2024 • Sean Farhat, Deming Chen
We observe that, when distilled on a task from a pre-trained teacher model, a small model can match or surpass the performance it would attain if it were pre-trained and then fine-tuned on that task.
1 code implementation • 31 Jan 2024 • Hongpeng Guo, Haotian Gu, Xiaoyang Wang, Bo Chen, Eun Kyung Lee, Tamar Eilam, Deming Chen, Klara Nahrstedt
Federated learning (FL) is a machine learning paradigm that allows multiple clients to collaboratively train a shared model while keeping their data on-premise.
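The FL setup described here can be made concrete with a minimal federated-averaging sketch (plain FedAvg for illustration, not necessarily the exact algorithm this paper studies; the model, client loaders, and hyperparameters are hypothetical):

```python
import copy
import torch
import torch.nn as nn

def federated_round(global_model, client_loaders, lr=0.01, local_epochs=1):
    """One round of federated averaging: each client trains a copy of the
    shared model on its own data, then the server averages the weights."""
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(local_epochs):
            for x, y in loader:                # raw data never leaves the client
                opt.zero_grad()
                loss_fn(local(x), y).backward()
                opt.step()
        client_states.append(local.state_dict())   # only weights are shared
    # Server-side aggregation: element-wise mean of the client weights.
    avg = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```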
no code implementations • 22 Jan 2024 • Hanchen Ye, David Z. Pan, Chris Leary, Deming Chen, Xiaoqing Xu
This paper proposes ISDC, a novel feedback-guided iterative system of difference constraints (SDC) scheduling algorithm for high-level synthesis (HLS).
1 code implementation • 19 Jan 2024 • Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases. Medusa-1: Medusa is fine-tuned directly on top of a frozen backbone LLM, enabling lossless inference acceleration.
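A rough sketch of the Medusa-1 setup: lightweight extra decoding heads attached to a frozen backbone, with only the heads trained (the head architecture below is a simplified stand-in, and the embedding backbone is a placeholder for a real LLM):

```python
import torch
import torch.nn as nn

class MedusaHeads(nn.Module):
    """K extra decoding heads; head k predicts the token at position t+1+k
    from the backbone's hidden state at position t."""
    def __init__(self, hidden_size, vocab_size, num_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.SiLU(),
                          nn.Linear(hidden_size, vocab_size))
            for _ in range(num_heads))

    def forward(self, hidden):                     # hidden: (batch, seq, hidden)
        return [head(hidden) for head in self.heads]

backbone = nn.Embedding(32000, 512)                # placeholder for a frozen LLM
for p in backbone.parameters():
    p.requires_grad = False                        # Medusa-1: backbone stays frozen
heads = MedusaHeads(hidden_size=512, vocab_size=32000)
optimizer = torch.optim.AdamW(heads.parameters(), lr=1e-3)  # train the heads only
```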
no code implementations • ICCV 2023 • Yuhong Li, Jiajie Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen
We further propose a Discrete Proxy Search (DPS) method to find the optimized training settings for Eproxy with only a handful of benchmarked architectures on the target tasks.
1 code implementation • 17 Oct 2022 • Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey
We focus on the structure of the convolution kernel and identify two critical but intuitive principles, enjoyed by S4, that suffice to build an effective global convolutional model: 1) the parameterization of the convolutional kernel needs to be efficient, in the sense that the number of parameters should scale sub-linearly with sequence length.
Ranked #6 on Long-range modeling on LRA
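One way to satisfy the sub-linear parameterization principle above is a multi-scale construction in the spirit of this paper: a few fixed-size sub-kernels are upsampled to exponentially larger spans with decaying magnitude, so a length-L kernel costs O(log L) parameters (the base size and decay factor below are illustrative, not the paper's exact settings):

```python
import math
import torch
import torch.nn.functional as F

def sublinear_global_kernel(seq_len, base=16, decay=0.5):
    """Build a length-seq_len kernel from O(log L) fixed-size sub-kernels:
    each scale stores `base` weights, upsampled to an exponentially larger
    span with decaying magnitude, so the parameter count grows
    logarithmically rather than linearly with sequence length."""
    num_scales = max(1, math.ceil(math.log2(seq_len / base)) + 1)
    subkernels = [torch.randn(base) for _ in range(num_scales)]  # learnable in practice
    pieces = []
    for i, sub in enumerate(subkernels):
        span = base * 2 ** i                       # exponentially larger span
        up = F.interpolate(sub.view(1, 1, -1), size=span, mode="linear",
                           align_corners=False).view(-1)
        pieces.append(up * decay ** i)             # decaying magnitude per scale
    return torch.cat(pieces)[:seq_len]

k = sublinear_global_kernel(4096)   # a 4096-tap kernel from only 16 * 9 = 144 parameters
```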
1 code implementation • 17 Oct 2022 • Yuhong Li, Jiajie Li, Cong Han, Pan Li, JinJun Xiong, Deming Chen
(2) Efficient proxies are not extensible to multi-modality downstream tasks.
no code implementations • 22 Jul 2022 • Yao Chen, Junhao Pan, Xinheng Liu, JinJun Xiong, Deming Chen
In this study, we propose HiKonv, a unified solution that maximizes the throughput of convolution on a given underlying processing unit with low-bitwidth quantized data inputs through novel bit-wise management and parallel computation.
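The underlying bit-packing trick can be illustrated in a few lines: two low-bitwidth operands placed in disjoint bit fields of one wide word yield both products with a single multiplication (a simplified unsigned 4-bit case; HiKonv's actual scheme also handles signed values, accumulation across guard bits, and arbitrary bitwidths):

```python
SHIFT = 16                       # guard distance so partial products cannot overlap
MASK = (1 << SHIFT) - 1

def packed_mul(a0, a1, b):
    """Compute a0*b and a1*b with a single wide multiplication.
    a0, a1, b are 4-bit unsigned values, so each product fits in
    8 bits, well inside its 16-bit field."""
    packed = a0 | (a1 << SHIFT)          # place operands in disjoint bit fields
    product = packed * b                 # one multiply yields both products
    return product & MASK, (product >> SHIFT) & MASK

assert packed_mul(7, 13, 9) == (7 * 9, 13 * 9)
```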
no code implementations • 18 Jul 2022 • Vibhakar Vemulapati, Deming Chen
Simultaneous Localization and Mapping (SLAM) is one of the main components of autonomous navigation systems.
no code implementations • 3 Jul 2022 • Mang Yu, Sitao Huang, Deming Chen
However, with this high flexibility comes difficulty in design and optimization.
no code implementations • 6 Jun 2022 • Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen
Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inference solutions in computer vision, natural language processing, virtual reality, and beyond.
no code implementations • 30 Mar 2022 • Philip Harris, Erik Katsavounidis, William Patrick McCormack, Dylan Rankin, Yongbin Feng, Abhijith Gandrakota, Christian Herwig, Burt Holzman, Kevin Pedro, Nhan Tran, Tingjun Yang, Jennifer Ngadiuba, Michael Coughlin, Scott Hauck, Shih-Chieh Hsu, Elham E Khoda, Deming Chen, Mark Neubauer, Javier Duarte, Georgia Karagiorgi, Mia Liu
Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges.
no code implementations • 21 Jan 2022 • Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang
By evaluating on SQuAD, a model found by AutoDistill achieves an 88.4% F1 score with 22.8M parameters, which reduces parameters by more than 62% while maintaining higher accuracy than DistilBERT, TinyBERT, and NAS-BERT.
no code implementations • 28 Dec 2021 • Xinheng Liu, Yao Chen, Prakhar Ganesh, Junhao Pan, JinJun Xiong, Deming Chen
Quantization for Convolutional Neural Network (CNN) has shown significant progress with the intention of reducing the cost of computation and storage with low-bitwidth data inputs.
1 code implementation • 24 Nov 2021 • Qian Jiang, Xiaofan Zhang, Deming Chen, Minh N. Do, Raymond A. Yeh
In this work, we propose End-to-end Hardware-aware DNAS (EH-DNAS), a seamless integration of end-to-end hardware benchmarking and fully automated DNAS to deliver hardware-efficient deep neural networks on various platforms, including Edge GPUs, Edge TPUs, Mobile CPUs, and customized accelerators.
1 code implementation • 26 Oct 2021 • Prakhar Ganesh, Yao Chen, Yin Yang, Deming Chen, Marianne Winslett
Performance of object detection models has been growing rapidly on two major fronts: model accuracy and efficiency.
2 code implementations • NeurIPS 2021 • Yuhong Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen
Such a self-supervised regression task can effectively evaluate the intrinsic power of an architecture to capture and transform the input signal patterns, and allows fuller use of the training samples.
Ranked #1 on Neural Architecture Search on NAS-Bench-101 (Spearman Correlation metric)
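Schematically, the proxy works like the sketch below: every candidate architecture is trained for a handful of steps on the same fixed synthetic regression task, and the final regression loss ranks the candidates (the synthetic target, input shape, and budget are placeholders, not the paper's configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def regression_proxy_score(arch: nn.Module, steps=30, lr=1e-3):
    """Score an architecture by how well it learns, in a few gradient
    steps, to regress a fixed synthetic signal from random inputs."""
    torch.manual_seed(0)                            # identical task for every candidate
    x = torch.randn(64, 3, 32, 32)                  # arch must map this to shape (64, 1)
    target = torch.sin(torch.linspace(0, 8, 64)).unsqueeze(1)  # synthetic signal
    opt = torch.optim.Adam(arch.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(arch(x), target)
        loss.backward()
        opt.step()
    return -loss.item()                             # higher score = better proxy rank
```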
1 code implementation • 9 Jul 2021 • Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen
We implement our proposed accelerator on multiple FPGAs, which outperforms the state-of-the-art designs in terms of both throughput and DSP efficiency.
no code implementations • 8 Apr 2021 • Cong Hao, Deming Chen
We formulate the MMMT model and heterogeneous hardware implementation co-design as a differentiable optimization problem, with the objective of improving the solution quality and reducing the overall power consumption and critical path latency.
no code implementations • 25 Mar 2021 • Cong Hao, Jordan Dotzel, JinJun Xiong, Luca Benini, Zhiru Zhang, Deming Chen
Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives.
no code implementations • 8 Mar 2021 • Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li
Creating virtual avatars with realistic rendering is one of the most essential and challenging tasks to provide highly immersive virtual reality (VR) experiences.
1 code implementation • 4 Mar 2021 • Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu
In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help.
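A sketch of the zero-copy pattern in Numba CUDA (an illustration of the mechanism, not the paper's implementation; sizes are arbitrary and a CUDA-capable GPU is assumed): the feature matrix stays in pinned host memory mapped into the GPU address space, and each GPU thread gathers the rows it needs directly.

```python
import numpy as np
from numba import cuda

@cuda.jit
def gather_features(feat, idx, out):
    """Each thread copies one selected feature row; `feat` lives in host
    memory but is mapped into the GPU address space (zero-copy), so the
    read happens directly without a bulk CPU-side copy."""
    i = cuda.grid(1)
    if i < idx.size:
        for j in range(feat.shape[1]):
            out[i, j] = feat[idx[i], j]

num_nodes, dim, batch = 100_000, 128, 1024
feat = cuda.mapped_array((num_nodes, dim), dtype=np.float32)  # pinned + mapped host memory
feat[:] = np.random.rand(num_nodes, dim).astype(np.float32)
idx = cuda.to_device(np.random.randint(0, num_nodes, batch))  # sampled node IDs
out = cuda.device_array((batch, dim), dtype=np.float32)
gather_features[(batch + 127) // 128, 128](feat, idx, out)
```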
1 code implementation • 20 Jan 2021 • Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu
While this process accounts for a significant portion of the training time, we find existing GNN implementations using popular deep neural network (DNN) libraries such as PyTorch are limited to a CPU-centric approach for the entire data preparation step.
no code implementations • 1 Jan 2021 • Hyunmin Jeong, Deming Chen
This is the first work to run a highly compressed DNN in parallel with the original DNN, significantly improving latency while effectively maintaining the original model's accuracy.
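Conceptually, the big/little idea looks like this confidence-gated sketch (shown sequentially for clarity, whereas the paper runs the two models in parallel; the threshold and single-input assumption are illustrative):

```python
import torch

@torch.no_grad()
def big_little_infer(little, big, x, threshold=0.9):
    """Serve the compressed model's prediction when it is confident enough;
    fall back to the original model otherwise. Most inputs take the cheap
    path, so average latency drops with little accuracy loss."""
    probs = torch.softmax(little(x), dim=-1)
    conf, pred = probs.max(dim=-1)
    if conf.item() >= threshold:         # assumes a single input (batch size 1)
        return pred
    return big(x).argmax(dim=-1)         # rare, expensive path
```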
1 code implementation • 1 Jan 2021 • Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen
This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.
2 code implementations • 22 Dec 2020 • Yichi Zhang, Junhao Pan, Xinheng Liu, Hongzheng Chen, Deming Chen, Zhiru Zhang
We design an efficient FPGA-based accelerator for our novel BNN model that supports the fractional activations.
no code implementations • 14 Oct 2020 • Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen
High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.
no code implementations • 10 Jul 2020 • Yun Heo, Gowthami Manikandan, Anand Ramachandran, Deming Chen
We also present a compilation of sequencing datasets for Illumina, PacBio and ONT platforms that present challenging scenarios for error-correction tools.
1 code implementation • 18 May 2020 • Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen
Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs.
no code implementations • 6 May 2020 • Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen
We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.
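In pseudo-code form, the fused space might look like the toy random-search sketch below (an illustration of the formulation only; the variables, proxies, and weighting are hypothetical placeholders, not the paper's actual search space or optimizer):

```python
import random

# Hypothetical fused search space: DNN variables and hardware-implementation
# variables are sampled jointly and scored by a single objective.
SPACE = {
    "depth":      [8, 12, 16],           # DNN variables
    "width_mult": [0.5, 0.75, 1.0],
    "pe_array":   [(8, 8), (16, 16)],    # hardware variables
    "quant_bits": [4, 8],
}

def sample():
    return {k: random.choice(v) for k, v in SPACE.items()}

def score(cfg, acc_proxy, hw_proxy, alpha=0.5):
    """Single objective over the fused space: weighted accuracy proxy
    plus hardware-quality proxy (both evaluators are placeholders)."""
    return alpha * acc_proxy(cfg) + (1 - alpha) * hw_proxy(cfg)

# Toy proxies: favor wider/deeper models and cheaper hardware configs.
acc = lambda c: c["width_mult"] * c["depth"] / 16
hw = lambda c: 1.0 / (c["quant_bits"] * c["pe_array"][0])
best = max((sample() for _ in range(200)), key=lambda c: score(c, acc, hw))
```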
no code implementations • 8 Apr 2020 • Hanchen Ye, Xiaofan Zhang, Zhize Huang, Gengsheng Chen, Deming Chen
To speed up Deep Neural Network (DNN) accelerator design and enable effective implementation, we propose HybridDNN, a framework for building high-performance hybrid DNN accelerators and delivering FPGA-based hardware implementations.
no code implementations • 27 Feb 2020 • Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.
1 code implementation • 6 Jan 2020 • Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin
Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.).
no code implementations • 18 Nov 2019 • Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen
The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.
2 code implementations • 20 Sep 2019 • Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen
Object detection and tracking are challenging tasks for resource-constrained embedded systems.
1 code implementation • 25 Jun 2019 • Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen
Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.
2 code implementations • 20 May 2019 • Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen
Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.
2 code implementations • 9 Apr 2019 • Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen
While embedded FPGAs are attractive platforms for DNN acceleration on edge devices thanks to their low latency and high energy efficiency, the scarcity of resources on edge-scale FPGA devices makes DNN deployment challenging.
4 code implementations • 7 Feb 2019 • Yuhong Li, Xiaofan Zhang, Deming Chen
It combines a Convolutional Neural Network (CNN) backbone and a cross-correlation operator, and takes advantage of the features from exemplary images for more accurate object tracking.
Ranked #1 on Visual Object Tracking on OTB-50
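The cross-correlation operator named here is the familiar SiamFC-style operation: the exemplar's feature map is used as a convolution kernel over the search region's feature map, and the response peak locates the target. A minimal sketch (backbone omitted; feature shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def cross_correlation(exemplar_feat, search_feat):
    """Slide each exemplar's feature map over the matching search-region
    feature map as a convolution kernel; the peak of the resulting
    response map indicates the most likely target location."""
    b, c, h, w = exemplar_feat.shape
    x = search_feat.reshape(1, b * c, *search_feat.shape[2:])
    response = F.conv2d(x, exemplar_feat, groups=b)  # one kernel per batch item
    return response.reshape(b, 1, *response.shape[2:])

# Hypothetical feature maps from a CNN backbone:
z = torch.randn(2, 256, 6, 6)       # exemplar (template) features
s = torch.randn(2, 256, 22, 22)     # search-region features
print(cross_correlation(z, s).shape)  # torch.Size([2, 1, 17, 17])
```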
no code implementations • 5 Nov 2018 • Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen
In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.
no code implementations • 31 Jul 2018 • Junsong Wang, Qiuwen Lou, Xiaofan Zhang, Chao Zhu, Yonghua Lin, Deming Chen
To create such accelerators, we propose a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes.
no code implementations • 15 May 2018 • Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen
Furui first demonstrated that the identities of both consonants and vowels can be perceived from the consonant-vowel (C-V) transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception and that steady-state regions are secondary or supplemental.
Automatic Speech Recognition (ASR) +2
no code implementations • 23 Mar 2018 • Chuanhao Zhuge, Xinheng Liu, Xiaofan Zhang, Sudeep Gummadi, JinJun Xiong, Deming Chen
Deep Convolutional Neural Networks have become a Swiss Army knife for solving critical artificial intelligence tasks.
13 code implementations • CVPR 2018 • Yuhong Li, Xiaofan Zhang, Deming Chen
We demonstrate CSRNet on four datasets (ShanghaiTech, UCF_CC_50, WorldExpo'10, and UCSD) and achieve state-of-the-art performance.
Ranked #3 on Crowd Counting on Venice