Visual Question Answering
662 papers with code • 18 benchmarks • 19 datasets
Libraries
Use these libraries to find Visual Question Answering models and implementations.
Datasets
Most implemented papers
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
For captioning and VQA, we show that even non-attention-based models can localize inputs.
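The core Grad-CAM computation can be sketched in a few lines: global-average-pool the gradients of the class score over each channel to get channel weights, take the weighted sum of the feature maps, and apply a ReLU. A minimal NumPy sketch of this idea (the array shapes and random inputs here are illustrative, not the paper's implementation):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from one conv layer.

    feature_maps: (C, H, W) activations of the chosen layer
    gradients:    (C, H, W) d(class score)/d(activations)
    """
    # Channel importance weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))              # (C,)
    # Weighted combination of feature maps over channels.
    cam = np.tensordot(weights, feature_maps, axes=1)  # (H, W)
    # ReLU keeps only locations that positively influence the class.
    return np.maximum(cam, 0.0)

# Toy example with random activations and gradients.
rng = np.random.default_rng(0)
activations = rng.standard_normal((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(activations, grads)
print(heatmap.shape)  # (7, 7)
```

In practice the activations and gradients come from a forward/backward pass through a trained CNN (e.g. via framework hooks); the resulting heatmap is upsampled to the input resolution for visualization.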
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.
VQA: Visual Question Answering
Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
A simple neural network module for relational reasoning
Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn.
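The relation network module from that paper scores every ordered pair of objects with a shared function g, sums the pairwise outputs, and passes the result through a function f: RN(O) = f(Σ_{i,j} g(o_i, o_j)). A toy sketch of that structure, with placeholder linear maps standing in for the small MLPs the paper uses for g and f:

```python
import numpy as np

def relation_network(objects, g, f):
    """objects: (N, D) array of object embeddings.

    Applies g to every ordered pair of objects, sums the
    pairwise relation vectors, and maps the pooled vector
    through f: RN(O) = f( sum_{i,j} g(o_i, o_j) ).
    """
    n = objects.shape[0]
    pair_sum = sum(g(objects[i], objects[j])
                   for i in range(n) for j in range(n))
    return f(pair_sum)

# Placeholder g: concatenate the pair, apply a fixed linear map.
# Placeholder f: identity (real models learn MLPs for both).
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8))
g = lambda a, b: W @ np.concatenate([a, b])
f = lambda x: x

objects = rng.standard_normal((5, 4))  # 5 objects, 4-dim embeddings
out = relation_network(objects, g, f)
print(out.shape)  # (4,)
```

For VQA, the "objects" are typically cells of a CNN feature map, and the question embedding is concatenated into each pair before g is applied.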
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
This paper presents a new baseline for the visual question answering task.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
Dynamic Memory Networks for Visual and Textual Question Answering
Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Approaches to multimodal pooling include element-wise product or sum, as well as concatenation of the visual and textual representations.
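The simple pooling operations named above are easy to write down; full bilinear pooling corresponds to a flattened outer product, which the paper's compact bilinear method approximates without materializing it. A minimal NumPy sketch with toy feature vectors (the values are illustrative only):

```python
import numpy as np

v = np.array([0.2, 0.5, 0.1])  # toy visual feature
q = np.array([0.4, 0.3, 0.9])  # toy question feature

# Simple fusion baselines mentioned in the paper:
elementwise_product = v * q              # [0.08, 0.15, 0.09]
elementwise_sum = v + q                  # [0.6, 0.8, 1.0]
concat = np.concatenate([v, q])          # shape (6,)

# Full bilinear pooling: flattened outer product of the two
# features. Its dimension grows as |v| * |q|, which is what
# multimodal *compact* bilinear pooling approximates cheaply.
bilinear = np.outer(v, q).ravel()        # shape (9,)
print(bilinear.shape)
```

The outer product's quadratic size is the motivation for the compact approximation: with real feature dimensions in the thousands, the exact bilinear vector would have millions of entries.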
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
This paper presents a state-of-the-art model for visual question answering (VQA), which won first place in the 2017 VQA Challenge.