We introduce iTeach, a Mixed Reality (MR) framework that improves robot perception through real-time interactive teaching. By allowing human instructors to dynamically label robot RGB data, iTeach improves both the accuracy and the adaptability of robot perception in new scenarios. The framework supports on-the-fly data collection and labeling, enhancing model performance and generalization. Applied to door and handle detection for household tasks, iTeach integrates a HoloLens app with an interactive YOLO model. Furthermore, we introduce the IRVLUTD DoorHandle dataset. DH-YOLO, our efficient detection model, significantly improves the accuracy and efficiency of door and handle detection, highlighting the potential of MR to make robotic systems more capable and adaptive in real-world environments.
iTeach system overview. (i) A robot views a scene. (ii) A model (here, DH-YOLO) predicts predefined objects (here, doors and handles). (iii) The scene image with predictions overlaid (SIP) is sent to the HoloLens POV. (iv) The user wearing the HoloLens sees the same scene image (SIP) and annotates the correct labels. (v) The annotated labels, together with the scene images, are sent to a trainable model for fine-tuning. (vi) Goal: the predictions improve through interactive teaching.
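To make the loop concrete, below is a minimal Python sketch of one teaching round following steps (i)–(vi). The robot, HoloLens, and model interfaces used here (capture_rgb, send, receive_annotations, predict, finetune) are hypothetical placeholders for illustration, not the actual iTeach API.

# One iTeach teaching round; all interfaces here are hypothetical placeholders.
def teaching_round(robot, hololens, model):
    image = robot.capture_rgb()               # (i) robot views the scene
    predictions = model.predict(image)        # (ii) model (DH-YOLO) predicts doors/handles
    hololens.send(image, predictions)         # (iii) scene image with predictions (SIP) to the HoloLens POV
    labels = hololens.receive_annotations()   # (iv) instructor annotates the correct labels
    model.finetune([(image, labels)])         # (v) fine-tune on the newly labeled sample

def interactive_teaching(robot, hololens, model, num_rounds=10):
    # (vi) Goal: predictions improve over successive teaching rounds.
    for _ in range(num_rounds):
        teaching_round(robot, hololens, model)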
Detailed workflow of the system. Correct predictions are outlined in green, and incorrect predictions are marked with red borders.
Demo video showing human interaction and learning.
Our experimental setup included a Fetch mobile robot, a HoloLens 2 device, a Lenovo Legion Pro 7 laptop equipped with an NVIDIA GeForce RTX 4090 Laptop GPU (approximately 16 GB of memory), and a human instructor. The laptop environment runs Python 3.8.10 and PyTorch 2.4.0 with CUDA 12.1.
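As a quick sanity check of this software stack, the following PyTorch snippet prints the installed versions and available GPU memory; the values in the comments reflect our setup rather than hard requirements.

# Sanity check of the stack (expected here: PyTorch 2.4.0, CUDA 12.1, ~16 GB RTX 4090 Laptop GPU).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA:", torch.version.cuda)
    print("GPU:", torch.cuda.get_device_name(0))
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU memory: {total_gb:.1f} GB")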
(i) The user interface flow of the iTeachLabeller app, from the startup phase to the main menu.
(ii) Description of the main menu options in the iTeachLabeller app.
We provide a prebuilt app here to help you get started easily, without having to set up the environment or go through the build process yourself.
(i) Diverse samples from different locations.
(ii) Different samples from the same location.
Each of these challenges comes with corresponding benefits that enhance the overall usefulness of the system. In summary, these challenges are manageable, as demonstrated by our fully functional system.
Once you successfully download and extract the dataset, you should have a folder with the following structure:
IRVLUTD-DoorHandle-Dataset/
├── train/
│   ├── train.524.zip
│   ├── train.1008.zip
│   └── train.1532.zip
├── test_dataset.256.zip
├── hololens.finetuning.data.100.zip
└── README.md
More information can be found in the README.md. For instructions on using the dataset, please see the PyTorch dataloader.
Note: Use iTeach-experiment-datasets for experiments mentioned in the paper.
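For illustration, here is a minimal PyTorch Dataset sketch for the extracted data. It assumes YOLO-style annotations (one .txt file per image with normalized "class cx cy w h" rows), which is an assumption on our part; please refer to the PyTorch dataloader repository for the authoritative loader and label format.

# Minimal Dataset sketch; assumes YOLO-style .txt labels next to each image (an assumption).
import os
from glob import glob

import torch
from PIL import Image
from torch.utils.data import Dataset


class DoorHandleDataset(Dataset):
    def __init__(self, root, transform=None):
        # Collect all RGB images under the extracted dataset folder.
        self.image_paths = sorted(glob(os.path.join(root, "**", "*.jpg"), recursive=True))
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)

        # Assumed YOLO format: one "class cx cy w h" row per box, normalized to [0, 1].
        label_path = os.path.splitext(img_path)[0] + ".txt"
        boxes = []
        if os.path.exists(label_path):
            with open(label_path) as f:
                boxes = [[float(v) for v in line.split()] for line in f if line.strip()]
        return image, torch.tensor(boxes, dtype=torch.float32)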
The code for (i) the iTeachLabeller app, (ii) the iTeach system, and (iii) the iTeach experiments.
A toolkit for iTeach containing inference code for DH-YOLO (see the inference sketch below).
The code for the Hugging Face space.
A PyTorch dataloader for the IRVLUTD-DoorHandle-Dataset.
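Since DH-YOLO is a YOLO-based detector, a minimal inference sketch using the open-source ultralytics API might look like the following; the checkpoint and image paths are placeholders, and the toolkit above contains the actual inference code.

# Inference sketch with the ultralytics YOLO API; "dh_yolo.pt" and the image
# path are placeholders -- see the DH-YOLO toolkit for the real entry point.
from ultralytics import YOLO

model = YOLO("dh_yolo.pt")            # load a DH-YOLO checkpoint (placeholder path)
results = model("hallway_door.jpg")   # run detection on an RGB image (placeholder path)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls.item())           # class index (e.g., door or handle)
        conf = float(box.conf.item())          # detection confidence
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding box in pixel coordinates
        print(cls_id, round(conf, 2), (round(x1), round(y1), round(x2), round(y2)))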
@misc{padalunkal2024iteach,
  title={iTeach: Interactive Teaching for Robot Perception using Mixed Reality},
  author={Jishnu Jaykumar P and Cole Salvato and Vinaya Bomnale and Jikai Wang and Yu Xiang},
  year={2024},
  eprint={2410.09072},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
This work was supported by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005, the Sony Research Award Program, and the National Science Foundation (NSF) under Grant No. 2346528. We thank Sai Haneesh Allu for his assistance with the real-world experiments.