Latest Research

3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior

liguohao96/wsdf • 25 Apr 2024

Generative 3D face models featuring disentangled controlling factors hold immense potential for diverse applications in computer vision and computer graphics.

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

zzxslp/som-llava • 25 Apr 2024

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V by enabling the model to associate visual objects with tags inserted on the image.

Multimodal Information Interaction for Medical Image Segmentation

fxxjuses/micformer • 25 Apr 2024

We introduce an innovative Multimodal Information Cross Transformer (MicFormer), which employs a dual-stream architecture to simultaneously extract features from each modality.

Lost in Recursion: Mining Rich Event Semantics in Knowledge Graphs

fploetzky/websci2024 • 25 Apr 2024

In this paper, we show how narratives concerning complex events can be constructed and utilized.

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

hailanyi/cpd • 25 Apr 2024

The prevalent approaches of unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training processes.

PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

lihua-jing/pad • 25 Apr 2024

Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility.

A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation

emi-group/evoxbench • 25 Apr 2024

We introduce a tailored streamline to transform the task of hardware-aware neural architecture search (HW-NAS) for real-time semantic segmentation into standard multi-objective optimization problems (MOPs).

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

JackAILab/ConsistentID • 25 Apr 2024

ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions.

ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze

opendilab/LightZero • 25 Apr 2024

MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains.

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

aimmemotion/emovit • 25 Apr 2024

Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions.
