A Modular Robotic System for Autonomous Exploration and Semantic Updating in Large-Scale Indoor Environments

Sai Haneesh Allu      Itay Kadosh      Tyler Summers      Yu Xiang

Abstract

We present a modular robotic system for autonomous exploration and semantic updating of large-scale unknown environments. Our approach enables a mobile robot to build, revisit, and update a hybrid semantic map that integrates a 2D occupancy grid for geometry with a topological graph for object semantics. Unlike prior methods that rely on manual teleoperation or pre-collected datasets, our two-phase approach achieves end-to-end autonomy: first, a modified frontier-based exploration algorithm with dynamic search windows constructs a geometric map; second, using a greedy trajectory planner, environments are revisited, and object semantics are updated using open-vocabulary object detection and segmentation. This modular system, compatible with any metric SLAM framework, supports continuous operation by efficiently updating the semantic graph to reflect short-term and long-term changes such as object relocation, removal, or addition. We validate the approach on a Fetch robot in real-world indoor environments of approximately 8,500 sq. m and 117 sq. m, demonstrating robust and scalable semantic mapping and continuous adaptation, marking a fully autonomous integration of exploration, mapping, and semantic updating on a physical robot.
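As a concrete illustration of the hybrid map, the Python sketch below pairs a 2D occupancy grid with a topological object graph and shows how revisit observations can add, relocate, or retire object nodes. It is only a sketch under our own naming: HybridSemanticMap, ObjectNode, and update_object are hypothetical and not taken from the paper's codebase.

# Illustrative sketch of the hybrid map described above: a 2D occupancy
# grid for geometry plus a topological graph for object semantics.
# All names here (HybridSemanticMap, ObjectNode, ...) are hypothetical.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ObjectNode:
    label: str                    # open-vocabulary class, e.g. "chair"
    position: np.ndarray          # (x, y) in the map frame
    last_seen: float              # timestamp of the latest observation


@dataclass
class HybridSemanticMap:
    grid: np.ndarray              # occupancy grid: -1 unknown, 0 free, 1 occupied
    objects: dict = field(default_factory=dict)   # node_id -> ObjectNode

    def update_object(self, node_id, label, position, stamp, match_radius=0.5):
        """Reflect addition or relocation of an object during a revisit."""
        node = self.objects.get(node_id)
        if node is None:
            self.objects[node_id] = ObjectNode(label, position, stamp)   # addition
        elif np.linalg.norm(node.position - position) > match_radius:
            node.position, node.last_seen = position, stamp              # relocation
        else:
            node.last_seen = stamp                                       # re-confirmation

    def prune_stale(self, now, max_age):
        """Drop nodes not re-observed within max_age seconds (object removal)."""
        self.objects = {k: v for k, v in self.objects.items()
                        if now - v.last_seen <= max_age}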



System Overview

[Teaser figure: system overview]

Real World Autonomous Exploration

The video demonstrates a robot autonomously exploring a large-scale 96 m × 93 m area and a medium-scale 9 m × 13 m area using a Dynamic Window Frontier Exploration strategy. In the large-scale environment, the robot completes the exploration in approximately 150 minutes, reaching a maximum speed of 0.6 m/s and covering a total distance of over 800 meters.
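For intuition, the following Python sketch shows one way a dynamic-window frontier search can work: scan a window around the robot for frontier cells (free cells bordering unknown space) and grow the window when none are found. It illustrates the general strategy only; the function and parameter names are ours, not the paper's implementation.

# Minimal sketch of frontier detection with a dynamic search window.
# A frontier cell is a free cell adjacent to unknown space; the window
# around the robot grows until frontiers are found or the map is covered.
import numpy as np

FREE, UNKNOWN = 0, -1

def frontiers_in_window(grid, robot_rc, half=20, grow=20):
    rows, cols = grid.shape
    r0, c0 = robot_rc
    while half <= max(rows, cols):
        r_lo, r_hi = max(r0 - half, 1), min(r0 + half, rows - 1)
        c_lo, c_hi = max(c0 - half, 1), min(c0 + half, cols - 1)
        found = []
        for r in range(r_lo, r_hi):
            for c in range(c_lo, c_hi):
                if grid[r, c] != FREE:
                    continue
                # free cell touching unknown space -> frontier cell
                if (grid[r-1:r+2, c-1:c+2] == UNKNOWN).any():
                    found.append((r, c))
        if found:
            return found          # plan toward these; exploration continues
        half += grow              # no frontiers nearby: enlarge the window
    return []                     # whole map searched: exploration complete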

Open Vocabulary Detection and Segmentation

✅ Good Detection & Segmentation
❌ Poor Detection & Segmentation

Object detection and segmentation are performed in real time to build the semantic map. Our system uses GroundingDINO to detect objects in the robot's RGB observations, producing labels and bounding boxes that are then used as prompts for MobileSAM, a faster version of SAM, to generate segmentation masks.
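A minimal sketch of this detection-then-segmentation pipeline is shown below, assuming the reference inference helpers shipped with the public GroundingDINO and MobileSAM repositories; the checkpoint paths, text prompt, and thresholds are placeholders, and exact import paths or signatures may differ between releases.

# Sketch of the real-time open-vocabulary pipeline described above.
# Assumes the reference inference helpers from the GroundingDINO and
# MobileSAM repositories; paths, prompt, and thresholds are placeholders.
import torch
from groundingdino.util.inference import load_model, load_image, predict
from mobile_sam import sam_model_registry, SamPredictor

detector = load_model("groundingdino_config.py", "groundingdino.pth")
sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")
segmenter = SamPredictor(sam)

image_source, image = load_image("rgb_frame.jpg")   # raw array + model tensor
boxes, logits, phrases = predict(                   # labels + bounding boxes
    model=detector,
    image=image,
    caption="chair . table . door .",               # open-vocabulary prompt
    box_threshold=0.35,
    text_threshold=0.25,
)

# GroundingDINO returns normalized (cx, cy, w, h); SAM expects pixel xyxy.
h, w, _ = image_source.shape
xyxy = boxes.clone()
xyxy[:, :2] = boxes[:, :2] - boxes[:, 2:] / 2
xyxy[:, 2:] = xyxy[:, :2] + boxes[:, 2:]
xyxy *= torch.tensor([w, h, w, h], dtype=xyxy.dtype)

segmenter.set_image(image_source)
for box, label in zip(xyxy, phrases):
    masks, scores, _ = segmenter.predict(box=box.numpy(), multimask_output=False)
    # masks[0] is the binary segmentation mask for this detected object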

Comparison with Khronos

Citation (BibTeX)

Please cite this work if it helps in your research:
@misc{allu2025modularroboticautonomousexploration,
  title={A Modular Robotic System for Autonomous Exploration and Semantic Updating in Large-Scale Indoor Environments},
  author={Sai Haneesh Allu and Itay Kadosh and Tyler Summers and Yu Xiang},
  year={2025},
  eprint={2409.15493},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2409.15493},
}

Contact

Send any comments or questions to Sai Haneesh Allu: saihaneesh.allu@utdallas.edu

Acknowledgements

This work was supported by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005, the Sony Research Award Program, the National Science Foundation (NSF) under Grant Nos. 2346528 and 2520553, and the NVIDIA Academic Grant Program Award. The work of T. Summers was supported by the United States Air Force Office of Scientific Research under Grant FA9550-23-1-0424 and the National Science Foundation under Grant ECCS-2047040. We would like to thank our colleague, Jishnu Jaykumar P, for his assistance during the experiments.