Vision-Language Navigation
31 papers with code • 1 benchmark • 7 datasets
Vision-language navigation (VLN) is the task in which an embodied agent follows natural language instructions to navigate through real 3D environments.
(Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)
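To make the setup concrete, here is a minimal sketch of the standard VLN episode loop. The environment and agent interfaces (`env.reset`, `env.step`, `env.success`) are hypothetical placeholders for illustration, not any particular simulator's API.

```python
# Minimal sketch of a VLN episode. The environment and agent interfaces are
# hypothetical placeholders, not the API of any specific simulator.

STOP = 0  # VLN agents emit a dedicated STOP action to end the episode

class InstructionFollower:
    """Maps (instruction, visual observation) to a discrete action."""

    def reset(self, instruction: str) -> None:
        self.instruction = instruction  # e.g. "Walk past the sofa and stop at the door."

    def act(self, observation) -> int:
        # A real agent encodes the instruction and observation and attends over
        # both; this stand-in simply stops immediately.
        return STOP

def run_episode(env, agent: InstructionFollower) -> bool:
    instruction, observation = env.reset()
    agent.reset(instruction)
    done = False
    while not done:
        action = agent.act(observation)       # pick the next navigable direction
        observation, done = env.step(action)  # move through the 3D environment
    return env.success()                      # stopped close enough to the goal?
```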
Most implemented papers
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in unseen, photo-realistic environments.
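The paper's central idea is an auxiliary head that estimates how far along the instruction the agent is, trained jointly with the action policy. Below is a minimal PyTorch sketch of that pattern; the module names, tensor shapes, and loss weighting are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class SelfMonitoringAgent(nn.Module):
    """Sketch: a policy head plus an auxiliary progress-estimation head.
    Module names and shapes are illustrative assumptions."""

    def __init__(self, hidden_dim: int, num_actions: int):
        super().__init__()
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        # Progress monitor: regress a scalar progress signal for the path.
        self.progress_head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Tanh())

    def forward(self, state: torch.Tensor):
        return self.policy_head(state), self.progress_head(state).squeeze(-1)

def joint_loss(action_logits, progress_pred, action_target, progress_target, lam=0.5):
    # Joint objective: action cross-entropy plus weighted progress regression.
    nav_loss = nn.functional.cross_entropy(action_logits, action_target)
    aux_loss = nn.functional.mse_loss(progress_pred, progress_target)
    return nav_loss + lam * aux_loss
```

The auxiliary signal acts as a regularizer: actions that do not advance the estimated progress toward the goal are implicitly discouraged during training.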
Cross-Lingual Vision-Language Navigation
Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics.
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation
In this paper, we take a radical approach to bridging the gap between synthetic studies and real-world practice: we propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task.
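As a rough illustration of the planned-ahead idea, the sketch below re-scores model-free action candidates by imagining short rollouts with a learned environment model; `policy`, `world_model`, and `critic` are assumed callables for illustration, not the paper's actual modules.

```python
def plan_ahead(policy, world_model, critic, state, candidate_actions, horizon=3):
    """Hedged sketch of planned-ahead action selection: score each model-free
    candidate action by imagining a short rollout through a learned environment
    model. `policy`, `world_model`, and `critic` (returning a float estimate of
    progress) are assumed interfaces, not the paper's modules."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        s = world_model(state, action)   # imagined next state
        score = critic(s)                # estimated progress toward the goal
        for _ in range(horizon - 1):     # roll the learned model forward
            a = policy(s)                # greedy imagined action
            s = world_model(s, a)
            score += critic(s)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```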
The Regretful Navigation Agent for Vision-and-Language Navigation
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al.
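The gist of FAST is to treat navigation as search: keep a global frontier of partial paths, always expand the most promising one, and thereby get backtracking for free. A minimal sketch, assuming hypothetical `score_fn` and `expand_fn` helpers rather than the paper's learned scoring model:

```python
import heapq
from itertools import count

def fast_style_search(env, score_fn, expand_fn, start, max_expansions=40):
    """Sketch of frontier-aware search with backtracking (FAST-style): keep
    every unexplored candidate in a global frontier and expand the
    highest-scoring one, so the agent can return to an earlier node instead of
    committing to a greedy path. `score_fn` and `expand_fn` are assumed
    helpers, not the paper's code."""
    tie = count()  # tiebreaker so the heap never compares paths directly
    frontier = [(-score_fn([start]), next(tie), [start])]  # max-heap via negation
    best_path = [start]
    while frontier and max_expansions > 0:
        _, _, path = heapq.heappop(frontier)   # highest-scoring partial path
        best_path = path
        node = path[-1]
        if env.should_stop(node):
            return path
        for nxt in expand_fn(node):            # navigable neighbors of this node
            new_path = path + [nxt]
            heapq.heappush(frontier, (-score_fn(new_path), next(tie), new_path))
        max_expansions -= 1
    return best_path  # budget exhausted: return the best path expanded so far
```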
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
We first mimic unseen environments via environmental dropout on seen environments, then apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions.
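Roughly, the pipeline drops out visual features to mimic unseen environments, samples new paths there, and uses a speaker model to back-translate each path into an instruction the follower can train on. The sketch below assumes hypothetical `speaker`, `follower`, and environment interfaces:

```python
import random

def augment_with_back_translation(envs, speaker, follower, feat_dropout=0.4, n_paths=10):
    """Hedged sketch of back-translation with environmental dropout.
    `env.dropout_features`, `speaker.generate` (path -> instruction), and
    `follower.train_on` are assumed interfaces, not the paper's code."""
    synthetic = []
    for env in envs:
        dropped = env.dropout_features(p=feat_dropout)  # mimic an unseen environment
        for _ in range(n_paths):
            path = dropped.sample_shortest_path()          # new trajectory in the graph
            instruction = speaker.generate(dropped, path)  # back-translate path -> text
            synthetic.append((dropped, path, instruction))
    random.shuffle(synthetic)
    follower.train_on(synthetic)  # semi-supervised training on generated pairs
    return synthetic
```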
Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog.
Active Visual Information Gathering for Vision-Language Navigation
Vision-language navigation (VLN) is the task of directing an agent to carry out navigational instructions inside photo-realistic environments.
A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment
In this paper we propose a new framework, MoViLan (Modular Vision and Language), for executing visually grounded natural language instructions for day-to-day indoor household tasks.
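The modular idea can be sketched as a dispatcher that decomposes a long-horizon instruction into subgoals and routes each to a navigation or manipulation module; all interfaces below are illustrative placeholders, not MoViLan's actual API.

```python
def execute_instruction(instruction, planner, navigator, manipulator):
    """Hedged sketch of a modular pipeline in the spirit of MoViLan: a language
    module decomposes the instruction into subgoals, each dispatched to a
    navigation or manipulation module. All interfaces are illustrative."""
    for subgoal in planner.decompose(instruction):  # e.g. "go to fridge", "open fridge"
        if subgoal.kind == "navigate":
            navigator.go_to(subgoal.target)         # vision-based navigation module
        elif subgoal.kind == "manipulate":
            manipulator.perform(subgoal.action, subgoal.target)
        else:
            raise ValueError(f"unknown subgoal type: {subgoal.kind}")
```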