GUIDE: Goal-Initialized Directional Understanding for End-to-End Visual Navigation

Liang Wang1,2,*, Jin Jin3,*, KanZhong Yao2, YiBin Wu3,4, Fangqiang Ding5, Jin Wang3, Jun Wu1, Zhe Sun2,†, Qiuguo Zhu1,†

  1. Institute of Cyber-Systems and Control, Zhejiang University
  2. Institute of Artificial Intelligence (TeleAI), China Telecom
  3. Oxford Robotics Institute, University of Oxford
  4. Center for Robotics, University of Bonn
  5. Department of Mechanical Engineering, Massachusetts Institute of Technology

* Equal contribution. † Corresponding authors.

Learning-based visual navigation for legged robots typically relies on continuous goal updates from hierarchical state estimation to provide a persistent directional reference. This reliance incurs additional sensory and computational overhead and deviates from fully end-to-end mobile autonomy. Furthermore, under partial observability, policies tend to learn myopic behaviors and easily become trapped in dead ends and complex structural layouts.

To address these limitations, we investigate a goal-initialized navigation setting, where the target is provided only once at the beginning of an episode, requiring the robot to operate based on intrinsic spatial memory without subsequent goal updates from external modules. In this work, we propose GUIDE, a fully end-to-end reinforcement learning framework designed to cultivate internal directional awareness.

Specifically, GUIDE incorporates a spatial anchor predictor that leverages multi-frequency proprioceptive history to extract egomotion representations, thereby maintaining a persistent long-horizon spatial context for navigation. Concurrently, it uses raw depth streams to perceive local environmental geometry. We evaluate the proposed framework on a quadruped robot in both simulation and real-world scenarios. Experiments show that GUIDE learns reliable egomotion and directional awareness, enabling a fully end-to-end deployed policy to navigate safely through dense clutter and structured mazes without subsequent goal guidance or prior maps.

Overview

Goal-initialized navigation with learned directional awareness.

GUIDE studies end-to-end visual navigation under a goal-initialized setting. The target is provided only once at the beginning of an episode, and the deployed policy must sustain directional awareness using onboard proprioception and depth perception without streaming relative-goal updates.

Goal

The robot receives target information only at initialization and receives no external goal refresh during rollout.
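
To make the geometry of this setting concrete, the sketch below shows what keeping track of a once-given goal involves: the goal vector, expressed in the robot's body frame, has to be shifted and rotated by the robot's own motion at every step. GUIDE learns this awareness implicitly inside the policy, so the explicit dead reckoning here is only an illustration and not the paper's method. The function name, the planar (2D) simplification, and the first-order integration are all assumptions made for this example.

```python
import numpy as np

def update_goal_in_body_frame(goal_xy, v_xy, yaw_rate, dt):
    """Propagate a body-frame goal vector through one step of planar egomotion.

    Hypothetical helper for illustration only (not from the paper).
    goal_xy  : goal position in the current body frame
    v_xy     : linear velocity in the body frame
    yaw_rate : angular velocity about the vertical axis (rad/s)
    dt       : time step (s)
    """
    # The robot translates by v*dt, so the goal appears shifted by -v*dt.
    shifted = np.asarray(goal_xy, dtype=float) - np.asarray(v_xy, dtype=float) * dt
    # The robot yaws by dtheta, so the goal appears rotated by -dtheta.
    dtheta = yaw_rate * dt
    c, s = np.cos(dtheta), np.sin(dtheta)
    rot_inverse = np.array([[c, s], [-s, c]])  # transpose of the 2D rotation matrix
    return rot_inverse @ shifted

# Goal given once at initialization: 2 units straight ahead.
goal = np.array([2.0, 0.0])

# Walk forward at unit speed for 1 s, then turn left by 90 degrees in place.
for _ in range(100):
    goal = update_goal_in_body_frame(goal, np.array([1.0, 0.0]), 0.0, 0.01)
for _ in range(100):
    goal = update_goal_in_body_frame(goal, np.zeros(2), np.pi / 2, 0.01)

print(np.round(goal, 6))  # the goal now lies to the robot's right
```

Any error in the egomotion estimate accumulates in this vector with no external correction, which is why the quality of the learned egomotion representation matters in the goal-initialized setting.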

Memory

Multi-frequency proprioceptive history is used to learn egomotion representations and persistent spatial context.
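
One plausible reading of "multi-frequency" history is that the same proprioceptive stream is sampled at several temporal strides, so that short windows capture fast dynamics and long windows capture slow drift. The sketch below implements that reading as a small ring buffer. The class name, the strides (1, 5, 25), and the number of frames per rate are hypothetical choices for illustration; the page does not specify how GUIDE constructs its history.

```python
from collections import deque
import numpy as np

class MultiRateHistory:
    """Ring buffer that exposes one observation stream at several strides.

    Hypothetical illustration; strides and frame counts are assumed values.
    """

    def __init__(self, obs_dim, strides=(1, 5, 25), frames_per_rate=4):
        self.strides = strides
        self.frames = frames_per_rate
        # Enough steps to cover the slowest rate: (frames - 1) gaps of max stride.
        maxlen = max(strides) * (frames_per_rate - 1) + 1
        self.buf = deque([np.zeros(obs_dim) for _ in range(maxlen)], maxlen=maxlen)

    def push(self, obs):
        """Append the newest observation; the oldest one is dropped."""
        self.buf.append(np.asarray(obs, dtype=float))

    def sample(self):
        """Return an array of shape (num_rates, frames, obs_dim), newest frame last."""
        hist = np.stack(list(self.buf))  # ordered oldest to newest
        out = []
        for stride in self.strides:
            # Walk back from the newest frame in steps of `stride`.
            idx = [-1 - stride * k for k in range(self.frames)][::-1]
            out.append(hist[idx])
        return np.stack(out)

history = MultiRateHistory(obs_dim=1)
for t in range(100):
    history.push([t])  # use the time index itself as a stand-in observation

windows = history.sample()
print(windows[:, :, 0])
```

With the assumed strides, the fastest window spans the last 4 steps while the slowest spans the last 76, so a fixed number of tokens can cover both recent contact dynamics and longer-horizon motion.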

Perception

Raw depth streams provide local geometric cues for obstacle avoidance and navigation through cluttered or maze-like scenes.

Pipeline

End-to-end visual navigation framework.

Pipeline of the GUIDE framework. Multi-frequency proprioceptive history is processed into proprioceptive tokens, which are supervised to predict spatial anchor vectors that cultivate egomotion and directional awareness. Concurrently, depth buffers are encoded and fused with these tokens via cross-attention to yield spatial latents. Finally, the actor aggregates these representations with the latest proprioceptive state to produce twist commands. All modules are jointly optimized end-to-end from scratch.
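
The fusion step in the caption can be illustrated with a minimal single-head cross-attention in NumPy. This is a sketch under stated assumptions, not the authors' architecture: the caption does not say which stream supplies the queries, so proprioceptive tokens are assumed to be the queries and encoded depth tokens the keys and values. The token counts, the embedding width, and the random projection weights are placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries : (n_q, d_model) tokens that ask the questions
    context : (n_c, d_model) tokens that are attended over
    Returns the fused tokens (n_q, d_model) and the attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 16  # assumed embedding width

# Placeholder inputs: 3 proprioceptive tokens and 8 encoded depth tokens.
proprio_tokens = rng.normal(size=(3, d_model))
depth_tokens = rng.normal(size=(8, d_model))

# Random projections stand in for learned weights.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

spatial_latents, attn = cross_attention(proprio_tokens, depth_tokens, w_q, w_k, w_v)
print(spatial_latents.shape, attn.shape)
```

In this arrangement each proprioceptive token produces one fused latent that is a weighted mixture of depth features, which matches the caption's description of tokens and depth being combined into spatial latents before the actor.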

Simulation

Cluttered Environment

 

Maze Environment

 

Real-world

In-lab Evaluations

Maze Goal Pos: (1, 6.5, 0)

 

Maze Goal Pos: (6.5, 6, 0)

 

Clutter Goal Pos: (6.5, 6.5, 0)

 

Clutter Goal Pos: (6.5, 0, 0)

 

12 m × 12 m Maze

Maze Goal Pos: (13, 0, 0) Wide View

 

Maze Goal Pos: (13, 0, 0) Side View

 

Maze Goal Pos: (11.5, 0, 0) Wide View

 

Maze Goal Pos: (11.5, 0, 0) Side View

 

Maze Goal Pos: (3, 1, 0) Wide View

 

Maze Goal Pos: (3, 1, 0) Side View

 

In-the-wild Deployments

Office Corridors

 

Dynamic Obstacles

 

Dense Vegetation

 

BibTeX

@misc{wang2026guidegoalinitializeddirectionalunderstanding,
  title={GUIDE: Goal-Initialized Directional Understanding for End-to-End Visual Navigation},
  author={Liang Wang and Jin Jin and KanZhong Yao and YiBin Wu and Fangqiang Ding and Jin Wang and Jun Wu and Zhe Sun and Qiuguo Zhu},
  year={2026},
  eprint={2606.10832},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2606.10832},
}