We present a modular robotic system for autonomous exploration and semantic updating of large-scale unknown environments. Our approach enables a mobile robot to build, revisit, and update a hybrid semantic map that integrates a 2D occupancy grid for geometry with a topological graph for object semantics. Unlike prior methods that rely on manual teleoperation or precollected datasets, our two-phase approach achieves end-to-end autonomy: first, a modified frontier-based exploration algorithm with dynamic search windows constructs a geometric map; second, the robot revisits the environment with a greedy trajectory planner and updates object semantics using open-vocabulary object detection and segmentation. This modular system, compatible with any metric SLAM framework, supports continuous operation by efficiently updating the semantic graph to reflect short-term and long-term changes such as object relocation, removal, or addition. We validate the approach on a Fetch robot in real-world indoor environments of approximately 8,500 sq.m and 117 sq.m, demonstrating robust and scalable semantic mapping and continuous adaptation, marking a fully autonomous integration of exploration, mapping, and semantic updating on a physical robot.
The video demonstrates a robot autonomously exploring a large-scale 96 m x 93 m area and a medium-scale 9 m x 13 m area using a Dynamic Window Frontier Exploration strategy. In the large-scale environment, the robot completes the exploration in approximately 150 minutes, reaching a maximum speed of 0.6 m/s. During this process, it covers a total distance of over 800 meters.
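To make the dynamic-window idea concrete, the following is a minimal sketch of frontier selection restricted to a search window that grows around the robot until a frontier is found. The grid convention (ROS-style -1/0/100 values), the initial window size, the growth factor, and the nearest-frontier goal choice are illustrative assumptions, not the exact parameters or implementation used in the paper.

```python
# Minimal sketch of dynamic-window frontier selection on a 2D occupancy grid.
# Grid convention (assumed, ROS-style): -1 = unknown, 0 = free, 100 = occupied.
import numpy as np

UNKNOWN, FREE, OCCUPIED = -1, 0, 100

def frontier_cells(grid, center, half_size):
    """Free cells adjacent to unknown space, restricted to a square search window."""
    rows, cols = grid.shape
    r0, r1 = max(center[0] - half_size, 1), min(center[0] + half_size, rows - 2)
    c0, c1 = max(center[1] - half_size, 1), min(center[1] + half_size, cols - 2)
    frontiers = []
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            if grid[r, c] != FREE:
                continue
            # A frontier cell is a free cell with at least one unknown 4-neighbor.
            neighbors = (grid[r - 1, c], grid[r + 1, c], grid[r, c - 1], grid[r, c + 1])
            if UNKNOWN in neighbors:
                frontiers.append((r, c))
    return frontiers

def next_goal(grid, robot_cell, init_half_size=20, growth=2.0):
    """Grow the search window until a frontier is found; return the nearest one."""
    half_size = init_half_size
    max_half_size = max(grid.shape)  # the window eventually covers the whole map
    while half_size <= max_half_size:
        frontiers = frontier_cells(grid, robot_cell, half_size)
        if frontiers:
            dists = [np.hypot(r - robot_cell[0], c - robot_cell[1]) for r, c in frontiers]
            return frontiers[int(np.argmin(dists))]
        half_size = int(half_size * growth)  # dynamically enlarge the window
    return None  # no frontiers left anywhere: exploration is complete

# Example: a toy 10x10 map, mostly unknown, with a small free pocket around the robot.
if __name__ == "__main__":
    grid = np.full((10, 10), UNKNOWN, dtype=int)
    grid[4:7, 4:7] = FREE
    print(next_goal(grid, robot_cell=(5, 5)))  # (4, 5): nearest free cell bordering unknown space
```

In the full system the chosen frontier would be passed to the motion planner as a navigation goal, and the window would reset once the robot reaches it.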
Object detection and segmentation are performed in real time to build the semantic map. Our system utilizes GroundingDINO to detect objects from the robot’s RGB image observation, providing labels and bounding boxes, which are then used as prompts for MobileSAM, a faster version of SAM, to generate segmentation masks.
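Below is a minimal sketch of this detect-then-segment step: GroundingDINO proposes labeled boxes for a text prompt, and each box then prompts MobileSAM for a segmentation mask. The checkpoint paths, thresholds, and the prompt list are illustrative assumptions, and exact call signatures may vary slightly across library versions.

```python
# Sketch of the open-vocabulary detection + segmentation pipeline described above.
import numpy as np
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from mobile_sam import sam_model_registry, SamPredictor

# Load the open-vocabulary detector (config and weight paths are placeholders).
detector = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
# Load MobileSAM (the lightweight "vit_t" backbone) and wrap it in a predictor.
mobile_sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")
segmenter = SamPredictor(mobile_sam)

def detect_and_segment(image_path, text_prompt="chair . table . sofa ."):
    """Return (labels, xyxy boxes, binary masks) for one RGB observation."""
    image_rgb, image_tensor = load_image(image_path)
    # GroundingDINO returns normalized cxcywh boxes plus phrase labels.
    boxes, logits, phrases = predict(
        model=detector, image=image_tensor, caption=text_prompt,
        box_threshold=0.35, text_threshold=0.25,
    )
    h, w = image_rgb.shape[:2]
    boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]),
                             in_fmt="cxcywh", out_fmt="xyxy").numpy()
    # Each detected box prompts MobileSAM for a segmentation mask.
    segmenter.set_image(image_rgb)
    masks = []
    for box in boxes_xyxy:
        mask, _, _ = segmenter.predict(box=box, multimask_output=False)
        masks.append(mask[0])
    return phrases, boxes_xyxy, masks
```

The resulting labels, boxes, and masks are the per-frame observations that feed the semantic graph update.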
@misc{allu2025modularroboticautonomousexploration,
title={A Modular Robotic System for Autonomous Exploration and Semantic Updating in Large-Scale Indoor Environments},
author={Sai Haneesh Allu and Itay Kadosh and Tyler Summers and Yu Xiang},
year={2025},
eprint={2409.15493},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2409.15493},
}
Send any comments or questions to Sai Haneesh Allu: saihaneesh.allu@utdallas.edu
This work was supported by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005, the Sony Research Award Program, the National Science Foundation (NSF) under Grant Nos. 2346528 and 2520553, and the NVIDIA Academic Grant Program Award. The work of T. Summers was supported by the United States Air Force Office of Scientific Research under Grant FA9550-23-1-0424 and the National Science Foundation under Grant ECCS-2047040. We would like to thank our colleague, Jishnu Jaykumar P, for his assistance during the experiments.