Search Results for author: Jingbo Zhu

Found 78 papers, 27 papers with code

Prior Constraints-based Reward Model Training for Aligning Large Language Models

1 code implementation1 Apr 2024 Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model typically using ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-based Reward Model (namely PCRM) training method to mitigate this problem.

reinforcement-learning

RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners

no code implementations19 Mar 2024 Chi Hu, Yuan Ge, Xiangnan Ma, Hang Cao, Qiang Li, Yonghua Yang, Tong Xiao, Jingbo Zhu

Our experiments across 11 arithmetic and commonsense reasoning tasks show that RankPrompt significantly enhances the reasoning performance of ChatGPT and GPT-4, with improvements of up to 13%.

Large Language Models are Parallel Multilingual Learners

1 code implementation14 Mar 2024 Yongyu Mu, Peinan Feng, Zhiquan Cao, Yuzhang Wu, Bei Li, Chenglong Wang, Tong Xiao, Kai Song, Tongran Liu, Chunliang Zhang, Jingbo Zhu

In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities.

In-Context Learning

Introduction to Transformers: an NLP Perspective

1 code implementation29 Nov 2023 Tong Xiao, Jingbo Zhu

Transformers have dominated empirical machine learning models of natural language processing.

Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

1 code implementation7 Nov 2023 Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning.

Multi-Task Learning

Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs

1 code implementation26 Oct 2023 Yuxin Zuo, Bei Li, Chuanhao Lv, Tong Zheng, Tong Xiao, Jingbo Zhu

This paper presents an in-depth study of multimodal machine translation (MMT), examining the prevailing understanding that MMT systems exhibit decreased sensitivity to visual information when text inputs are complete.

Attribute Multimodal Machine Translation +2

PartialFormer: Modeling Part Instead of Whole

1 code implementation23 Oct 2023 Tong Zheng, Bei Li, Huiwen Bao, Weiqiao Shan, Tong Xiao, Jingbo Zhu

The design choices in Transformer feed-forward neural networks have resulted in significant computational and parameter overhead.

Abstractive Text Summarization Machine Translation +1

Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition

1 code implementation21 Sep 2023 Chen Xu, Xiaoqian Liu, Erfeng He, Yuhao Zhang, Qianqian Dong, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task.

speech-recognition Speech Recognition +1

Learning Evaluation Models from Large Language Models for Sequence Generation

no code implementations8 Aug 2023 Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Jingbo Zhu

Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters.

Machine Translation Style Transfer +1

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

3 code implementations4 Aug 2023 Chenglong Wang, Hang Zhou, Yimin Hu, Yifu Huo, Bei Li, Tongran Liu, Tong Xiao, Jingbo Zhu

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (\textit{e. g.,} BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization Language Modelling +5

Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations

no code implementations24 Jun 2023 Xinyu Liu, Yan Ding, Kaikai An, Chunyang Xiao, Pranava Madhyastha, Tong Xiao, Jingbo Zhu

While state-of-the-art NLP models have demonstrated excellent performance for aspect based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Recent Advances in Direct Speech-to-text Translation

no code implementations20 Jun 2023 Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu

Recently, speech-to-text translation has attracted more and more attention and many studies have emerged rapidly.

Data Augmentation Knowledge Distillation +2

Understanding Parameter Sharing in Transformers

no code implementations15 Jun 2023 Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu

Inspired by this, we tune the training hyperparameters related to model convergence in a targeted manner.

Machine Translation

Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

1 code implementation13 Jun 2023 Yuchen Han, Chen Xu, Tong Xiao, Jingbo Zhu

Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST).

MobileNMT: Enabling Translation in 15MB and 30ms

1 code implementation7 Jun 2023 Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu

With the co-design of model and engine, compared with the existing system, we speed up 47. 0x and save 99. 5% of memory with only 11. 6% loss of BLEU.

Model Compression NMT +2

Deliberate then Generate: Enhanced Prompting Framework for Text Generation

no code implementations31 May 2023 Bei Li, Rui Wang, Junliang Guo, Kaitao Song, Xu Tan, Hany Hassan, Arul Menezes, Tong Xiao, Jiang Bian, Jingbo Zhu

Large language models (LLMs) have shown remarkable success across a wide range of natural language generation tasks, where proper prompt designs make great impacts.

Text Generation

CTC-based Non-autoregressive Speech Translation

1 code implementation27 May 2023 Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu

Combining end-to-end speech translation (ST) and non-autoregressive (NAR) generation is promising in language and speech processing for their advantages of less error propagation and low latency.

Translation

Bridging the Granularity Gap for Acoustic Modeling

1 code implementation27 May 2023 Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, Jingbo Zhu

While Transformer has become the de-facto standard for speech, modeling upon the fine-grained frame-level features remains an open challenge of capturing long-distance dependencies and distributing the attention weights.

speech-recognition Speech Recognition

TranSFormer: Slow-Fast Transformer for Machine Translation

no code implementations26 May 2023 Bei Li, Yi Jing, Xu Tan, Zhen Xing, Tong Xiao, Jingbo Zhu

Learning multiscale Transformer models has been evidenced as a viable approach to augmenting machine translation systems.

Machine Translation Translation

Multi-Path Transformer is Better: A Case Study on Neural Machine Translation

no code implementations10 May 2023 Ye Lin, Shuhan Zhou, Yanyang Li, Anxiang Ma, Tong Xiao, Jingbo Zhu

For years the model performance in machine learning obeyed a power-law relationship with the model size.

Machine Translation

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

no code implementations1 Feb 2023 Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model.

Knowledge Distillation

Prompting Neural Machine Translation with Translation Memories

no code implementations13 Jan 2023 Abudurexiti Reheman, Tao Zhou, Yingfeng Luo, Di Yang, Tong Xiao, Jingbo Zhu

Improving machine translation (MT) systems with translation memories (TMs) is of great interest to practitioners in the MT community.

Machine Translation NMT +1

EIT: Enhanced Interactive Transformer

1 code implementation20 Dec 2022 Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu

In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT), to address the issue of head degradation in self-attention mechanisms.

Abstractive Text Summarization Language Modelling +2

Learning Multiscale Transformer Models for Sequence Generation

1 code implementation19 Jun 2022 Bei Li, Tong Zheng, Yi Jing, Chengbo Jiao, Tong Xiao, Jingbo Zhu

In this work, we define those scales in different linguistic units, including sub-words, words and phrases.

On Vision Features in Multimodal Machine Translation

2 code implementations ACL 2022 Bei Li, Chuanhao Lv, Zefan Zhou, Tao Zhou, Tong Xiao, Anxiang Ma, Jingbo Zhu

Previous work on multimodal machine translation (MMT) has focused on the way of incorporating vision features into translation but little attention is on the quality of vision models.

Image Captioning Multimodal Machine Translation +3

The NiuTrans System for the WMT21 Efficiency Task

1 code implementation16 Sep 2021 Chenglong Wang, Chi Hu, Yongyu Mu, Zhongxiang Yan, Siming Wu, Minyi Hu, Hang Cao, Bei Li, Ye Lin, Tong Xiao, Jingbo Zhu

This paper describes the NiuTrans system for the WMT21 translation efficiency task (http://statmt. org/wmt21/efficiency-task. html).

Knowledge Distillation Translation

The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task

no code implementations ACL (IWSLT) 2021 Chen Xu, Xiaoqian Liu, Xiaowen Liu, Laohu Wang, Canan Huang, Tong Xiao, Jingbo Zhu

This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates from the English audio to German text directly without intermediate transcription.

Position Translation

Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders

no code implementations ACL 2021 Chen Xu, Bojie Hu, Yanyang Li, Yuhao Zhang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu

To our knowledge, we are the first to develop an end-to-end ST system that achieves comparable or even better BLEU performance than the cascaded ST counterpart when large-scale ASR and MT data is available.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

An Efficient Transformer Decoder with Compressed Sub-layers

no code implementations3 Jan 2021 Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu

The large attention-based encoder-decoder network (Transformer) has become prevailing recently due to its effectiveness.

Machine Translation Translation

Learning Light-Weight Translation Models from Deep Transformer

1 code implementation27 Dec 2020 Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu

We proposed a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model.

Knowledge Distillation Machine Translation +2

A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction

no code implementations COLING 2020 Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, ShuJian Huang, Tong Xiao, Jingbo Zhu

Our experiments show that this simple method does not hamper the performance of similar language pairs and achieves an accuracy of 13. 64~55. 53% between English and four distant languages, i. e., Chinese, Japanese, Vietnamese and Thai.

Dimensionality Reduction Self-Learning

Layer-Wise Multi-View Learning for Neural Machine Translation

no code implementations COLING 2020 Qiang Wang, Changliang Li, Yue Zhang, Tong Xiao, Jingbo Zhu

In this way, in addition to the topmost encoder layer (referred to as the primary view), we also incorporate an intermediate encoder layer as the auxiliary view.

Machine Translation MULTI-VIEW LEARNING +2

Shallow-to-Deep Training for Neural Machine Translation

1 code implementation EMNLP 2020 Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu

We find that stacking layers is helpful in improving the representation ability of NMT models and adjacent layers perform similarly.

Machine Translation NMT +2

Towards Fully 8-bit Integer Inference for the Transformer Model

no code implementations17 Sep 2020 Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, Jingbo Zhu

8-bit integer inference, as a promising direction in reducing both the latency and storage of deep neural networks, has made great progress recently.

Language Modelling Quantization +1

Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation

1 code implementation ACL 2020 Bei Li, Hui Liu, Ziyang Wang, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li

In encoder-decoder neural models, multiple encoders are in general used to represent the contextual information in addition to the individual sentence.

Machine Translation NMT +2

Learning Architectures from an Extended Search Space for Language Modeling

no code implementations ACL 2020 Yinqiao Li, Chi Hu, Yuhao Zhang, Nuo Xu, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li

Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell.

Chunking Language Modelling +4

Neural Machine Translation with Joint Representation

1 code implementation16 Feb 2020 Yanyang Li, Qiang Wang, Tong Xiao, Tongran Liu, Jingbo Zhu

Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e. g., alignment, the recent Neural Machine Translation (NMT) systems resort to the attention which partially encodes the interaction for efficiency.

Machine Translation NMT +1

Shared-Private Bilingual Word Embeddings for Neural Machine Translation

no code implementations ACL 2019 Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, Jingbo Zhu

For similar source and target words, their embeddings tend to share a part of the features and they cooperatively learn these common representation units.

Machine Translation NMT +3

The NiuTrans Machine Translation System for WMT18

no code implementations WS 2018 Qiang Wang, Bei Li, Jiqiang Liu, Bojian Jiang, Zheyang Zhang, Yinqiao Li, Ye Lin, Tong Xiao, Jingbo Zhu

This paper describes the submission of the NiuTrans neural machine translation system for the WMT 2018 Chinese ↔ English news translation tasks.

Machine Translation Translation

A Simple and Effective Approach to Coverage-Aware Neural Machine Translation

no code implementations ACL 2018 Yanyang Li, Tong Xiao, Yinqiao Li, Qiang Wang, Changming Xu, Jingbo Zhu

We offer a simple and effective method to seek a better balance between model confidence and length preference for Neural Machine Translation (NMT).

Machine Translation NMT +1

Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation

no code implementations EMNLP 2017 Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, Jingbo Zhu

This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder.

Machine Translation Translation

Cannot find the paper you are looking for? You can Submit a new open access paper.