Scene Recognition
64 papers with code • 8 benchmarks • 15 datasets
Most implemented papers
CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and deliver state-of-the-art performance.
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks.
CNN Features off-the-shelf: an Astounding Baseline for Recognition
We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13.
Bilinear CNNs for Fine-grained Visual Recognition
We then present a systematic analysis of these networks and show that (1) the bilinear features are highly redundant and can be reduced by an order of magnitude in size without significant loss in accuracy, (2) they are also effective for other image classification tasks such as texture and scene recognition, and (3) the networks can be trained from scratch on the ImageNet dataset, offering consistent improvements over the baseline architecture.
Visual Memorability for Robotic Interestingness via Unsupervised Online Learning
In this paper, we explore the problem of interesting scene prediction for mobile robots.
Places205-VGGNet Models for Scene Recognition
We verify the performance of trained Places205-VGGNet models on three datasets: MIT67, SUN397, and Places205.
Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs
Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partly due to recent large-scale scene datasets such as Places and Places2.
HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN
The hallucination task is treated as an auxiliary task, which can be used with any other action-related task in a multitask learning setting.
Indoor Scene Recognition in 3D
Moreover, we advocate multi-task learning as a way of improving scene recognition, building on the fact that the scene type is highly correlated with the objects in the scene, and therefore with its semantic segmentation into different object classes.
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, the spatial location and dominant color of the largest color diversity along the temporal axis, etc.
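As a rough illustration of the kind of statistical summary mentioned in the last entry (the spatial location and dominant direction of the largest motion), the sketch below works on a precomputed optical-flow array. It is not the authors' implementation: the function name `largest_motion_summary`, the 4x4 grid, the 8 direction bins, and the use of summed flow magnitude as the motion measure are all assumptions made for illustration.

```python
import numpy as np


def largest_motion_summary(flow, grid=4, bins=8):
    """Hypothetical helper: summarize where and in which direction motion is largest.

    flow: array of shape (T, H, W, 2) with per-pixel (dx, dy) for each frame pair.
    Returns (block_index, direction_bin), where block_index is row-major over a
    grid x grid partition of the frame and direction_bin indexes `bins` equal
    angular sectors starting at angle 0.
    """
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)                # (T, H, W)
    angle = np.arctan2(dy, dx) % (2 * np.pi)    # mapped to [0, 2*pi)

    # Accumulate motion over time, then sum it inside each grid block.
    mag_sum = magnitude.sum(axis=0)             # (H, W)
    height, width = mag_sum.shape
    bh, bw = height // grid, width // grid      # block size (remainder is cropped)
    blocks = (
        mag_sum[: bh * grid, : bw * grid]
        .reshape(grid, bh, grid, bw)
        .sum(axis=(1, 3))
    )                                           # (grid, grid)
    row, col = np.unravel_index(np.argmax(blocks), blocks.shape)

    # Dominant direction in that block: magnitude-weighted orientation histogram.
    ys = slice(row * bh, (row + 1) * bh)
    xs = slice(col * bw, (col + 1) * bw)
    bin_idx = np.floor(angle[:, ys, xs] / (2 * np.pi) * bins).astype(int) % bins
    hist = np.bincount(
        bin_idx.ravel(), weights=magnitude[:, ys, xs].ravel(), minlength=bins
    )
    return int(row * grid + col), int(np.argmax(hist))


# Synthetic check: motion only in grid block (row 2, col 1), pointing along (1, 2).
flow = np.zeros((3, 16, 16, 2))
flow[:, 8:12, 4:8] = (1.0, 2.0)
print(largest_motion_summary(flow))
```

In a self-supervised setup, labels like this pair would be computed from unlabeled clips and used as prediction targets; how the flow is obtained and how the targets are encoded are left open here.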