Scene Understanding
514 papers with code • 3 benchmarks • 43 datasets
Scene Understanding is the task of interpreting the content of a scene from visual input. For instance, the iPhone has a feature that helps visually impaired people take photos by describing what the camera sees. This is an example of Scene Understanding.
Benchmarks
These leaderboards are used to track progress in Scene Understanding.
Libraries
Use these libraries to find Scene Understanding models and implementations.
Datasets
Subtasks
Most implemented papers
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
We show that SegNet provides good performance with competitive inference time and more efficient inference memory-wise as compared to other architectures.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding
Semantic segmentation is an important tool for visual scene understanding and a meaningful measure of uncertainty is essential for decision making.
Unified Perceptual Parsing for Scene Understanding
In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image.
LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation
As a result, they are huge in terms of parameters and number of operations, and hence slow too.
Digging Into Self-Supervised Monocular Depth Estimation
Per-pixel ground-truth depth data is challenging to acquire at scale.
ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation
A comprehensive set of experiments on the publicly available Cityscapes dataset demonstrates that our system achieves an accuracy that is similar to the state of the art, while being orders of magnitude faster to compute than other architectures that achieve top precision.
Spatial As Deep: Spatial CNN for Traffic Scene Understanding
Although CNN has shown strong capability to extract semantics from raw pixels, its capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored.
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.
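The figures in this snippet can be sanity-checked with some quick arithmetic. The sketch below is only a back-of-the-envelope check, not anything from the paper: it assumes 30-day months, 365.25-day years, and perfectly linear scaling across GPUs, and the helper names `wall_clock_days` and `seconds_per_step` are hypothetical.

```python
# Back-of-the-envelope check of the DD-PPO figures quoted above.
# Assumptions (not from the source): 30-day months, 365.25-day years,
# and perfectly linear scaling across GPUs (efficiency = 1.0).

SECONDS_PER_DAY = 24 * 60 * 60


def wall_clock_days(gpu_days: float, num_gpus: int, efficiency: float = 1.0) -> float:
    """Wall-clock days needed to spend `gpu_days` of compute on `num_gpus` GPUs."""
    return gpu_days / (num_gpus * efficiency)


def seconds_per_step(years: float, steps: float) -> float:
    """Seconds of experience per step if `steps` steps span `years` years."""
    return years * 365.25 * SECONDS_PER_DAY / steps


if __name__ == "__main__":
    gpu_days = 6 * 30  # "6 months of GPU-time", taken here as 180 days
    print(f"wall-clock days on 64 GPUs: {wall_clock_days(gpu_days, 64):.2f}")
    print(f"seconds of experience per step: {seconds_per_step(80, 2.5e9):.2f}")
```

Under these assumptions, 180 GPU-days spread over 64 GPUs come to roughly 2.8 days, consistent with "under 3 days", and 80 years over 2.5 billion steps works out to about one second of experience per step. The snippet says "over 6 months", so the true wall-clock figure depends on how far over and on the actual scaling efficiency.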
ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data
Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications.