Interpreting Answers to Yes-No Questions in Dialogues from Multiple Domains

wang-zijie/yn-question-multi-domains 25 Apr 2024

People often answer yes-no questions without explicitly saying yes, no, or similar polar keywords.

0

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

x-plug/mplug-docowl 25 Apr 2024

Charts are important for presenting and explaining complex data relationships.

876

OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search

pinterest/atg-research 25 Apr 2024

In this paper, we present OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search.

22

Vision-based robot manipulation of transparent liquid containers in a laboratory setting

danischober/labliquidvision 25 Apr 2024

Laboratory processes involving small volumes of solutions and active ingredients are often performed manually due to challenges in automation, such as high initial costs, semi-structured environments and protocol variability.

10

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

844

AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

Gahyeonkim09/AAPL 25 Apr 2024

Through our experiments, we have identified important issues in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes.

0

Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation

hfates/ikr-net 25 Apr 2024

Yet the literature still lacks a well-generalized deep learning-based solution that performs well on images with unknown and highly complex degradations.

0

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

ailab-cvc/seed-bench 25 Apr 2024

We hope that our work can serve as a valuable addition to existing MLLM benchmarks, providing insightful observations and inspiring further research in the area of text-rich visual comprehension with MLLMs.

236

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

saidwivedi/TokenHMR 25 Apr 2024

We address the problem of regressing 3D human pose and shape from a single image, with a focus on 3D accuracy.

0