1 code implementation • 23 Apr 2024 • Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Man Zhou, Jie Zhang
Based on this pipeline, a random face reference training method is further devised to precisely capture the ID-relevant embeddings from reference images, thus improving the fidelity and generalization capacity of our model for ID-specific video generation.
1 code implementation • 25 Mar 2024 • Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
This paper presents Visual CoT, a novel pipeline that leverages the reasoning capabilities of multi-modal large language models (MLLMs) by incorporating visual Chain-of-Thought (CoT) reasoning.
1 code implementation • 7 Dec 2023 • Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia
Without tuning on LLaVA-v1.5, our method secured 70.7 on the MMBench test and 1552.5 on MME-perception.
2 code implementations • 21 Sep 2023 • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
For example, training at a context length of 8192 requires 16x the computational cost in self-attention layers compared to a context length of 2048.
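The 16x figure follows from self-attention's quadratic cost in sequence length: every token attends to every other token, so doubling the context quadruples the attention cost. A minimal sketch (illustrative only, not from the paper; the helper name is hypothetical):

```python
# Illustrative sketch: self-attention builds an n x n score matrix,
# so its cost grows quadratically with the context length n.
def attention_cost(context_length: int) -> int:
    """Relative cost of the attention score matrix: O(n^2)."""
    return context_length ** 2

# Training at 8192 tokens versus 2048 tokens:
ratio = attention_cost(8192) / attention_cost(2048)
print(ratio)  # 16.0 — the 16x figure quoted above
```

This quadratic blow-up is the motivation for efficient long-context fine-tuning methods such as the one described in the paper.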
no code implementations • 15 Apr 2023 • Jingyao Li, Pengguang Chen, Shengju Qian, Jiaya Jia
However, existing models easily misidentify input pixels from unseen classes, thus confusing novel classes with semantically similar ones.
no code implementations • 1 Mar 2023 • Shengju Qian, Huiwen Chang, Yuanzhen Li, Zizhao Zhang, Jiaya Jia, Han Zhang
We propose Stratified Image Transformer (StraIT), a pure non-autoregressive (NAR) generative model that demonstrates superiority in high-quality image synthesis over existing autoregressive (AR) and diffusion models (DMs).
no code implementations • 21 Dec 2022 • Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia
The transformer architecture, which has recently seen booming applications in vision tasks, departs from the long-dominant convolutional paradigm.
1 code implementation • 19 Dec 2021 • Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia
Pre-training has produced numerous state-of-the-art results in high-level computer vision, yet few attempts have been made to investigate how pre-training acts in image processing systems.
Ranked #5 on Image Super-Resolution on Set5 - 2x upscaling (using extra training data)
no code implementations • NeurIPS 2021 • Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
In this work, we analyze the uncharted problem of aliasing in vision transformers and explore incorporating anti-aliasing properties.
4 code implementations • 17 Jan 2020 • Hao Shao, Shengju Qian, Yu Liu
In this way, a heavy temporal model is replaced by a simple interlacing operator.
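The interlacing idea can be sketched as follows (a hypothetical minimal version for illustration, not the paper's exact operator): a fraction of channels is shifted forward or backward along the time axis, so temporal information mixes across frames with no dedicated temporal model and zero extra parameters.

```python
import numpy as np

# Hypothetical sketch of a temporal interlacing/shift operator:
# shift one group of channels backward in time, another forward,
# and leave the rest untouched.
def temporal_interlace(x: np.ndarray, fold: int = 8) -> np.ndarray:
    """x has shape (T, C, H, W); shifts C//fold channels each way in time."""
    t, c, _, _ = x.shape
    f = c // fold
    out = np.zeros_like(x)
    out[:-1, :f] = x[1:, :f]              # shift first group backward
    out[1:, f:2 * f] = x[:-1, f:2 * f]    # shift second group forward
    out[:, 2 * f:] = x[:, 2 * f:]         # remaining channels unchanged
    return out

clip = np.random.rand(4, 16, 8, 8)  # 4 frames, 16 channels
print(temporal_interlace(clip).shape)  # (4, 16, 8, 8)
```

Because the operator is just a re-indexing of existing activations, it adds temporal modeling capacity at essentially no computational cost.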
no code implementations • ICCV 2019 • Shengju Qian, Kwan-Yee Lin, Wayne Wu, Yangxiaokang Liu, Quan Wang, Fumin Shen, Chen Qian, Ran He
Recent studies have shown remarkable success in the face manipulation task with the advance of GAN and VAE paradigms, but the outputs are sometimes limited to low resolution and lack diversity.
1 code implementation • ICCV 2019 • Shengju Qian, Keqiang Sun, Wayne Wu, Chen Qian, Jiaya Jia
Facial landmark detection, or face alignment, is a fundamental task that has been extensively studied.
Ranked #18 on Face Alignment on WFLW