3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior

liguohao96/wsdf 25 Apr 2024

Generative 3D face models featuring disentangled controlling factors hold immense potential for diverse applications in computer vision and computer graphics.

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

zzxslp/som-llava 25 Apr 2024

Set-of-Mark (SoM) prompting unleashes the visual grounding capability of GPT-4V by enabling the model to associate visual objects with tags inserted on the image.

Multimodal Information Interaction for Medical Image Segmentation

fxxjuses/micformer 25 Apr 2024

We introduce an innovative Multimodal Information Cross Transformer (MicFormer), which employs a dual-stream architecture to extract features from each modality simultaneously.

Lost in Recursion: Mining Rich Event Semantics in Knowledge Graphs

fploetzky/websci2024 25 Apr 2024

In this paper, we show how narratives concerning complex events can be constructed and utilized.

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

hailanyi/cpd 25 Apr 2024

Prevalent approaches to unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training processes.

PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

lihua-jing/pad 25 Apr 2024

Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility.

A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation

emi-group/evoxbench 25 Apr 2024

We introduce a tailored pipeline that transforms hardware-aware neural architecture search (HW-NAS) for real-time semantic segmentation into standard multi-objective optimization problems (MOPs).

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

JackAILab/ConsistentID 25 Apr 2024

ConsistentID comprises two key components: a multimodal facial prompt generator, which combines facial features, corresponding facial descriptions, and the overall facial context to improve precision in facial details; and an ID-preservation network, optimized through a facial attention localization strategy, which preserves identity consistency in facial regions.

ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze

opendilab/LightZero 25 Apr 2024

MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains.

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

aimmemotion/emovit 25 Apr 2024

Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions.