πŸ“ Abstract

We introduce iTeach, a Mixed Reality (MR) framework to improve robot perception through real-time interactive teaching. By allowing human instructors to dynamically label robot RGB data, iTeach improves both the accuracy and adaptability of robot perception to new scenarios. The framework supports on-the-fly data collection and labeling, enhancing model performance and generalization. Applied to door and handle detection for household tasks, iTeach integrates a HoloLens app with an interactive YOLO model. Furthermore, we introduce the IRVLUTD DoorHandle dataset. DH-YOLO, our efficient detection model, significantly enhances the accuracy and efficiency of door and handle detection, highlighting the potential of MR to make robotic systems more capable and adaptive in real-world environments.

πŸ” Overview



πŸ€– iTeach system overview. (i) A robot views a scene πŸŒ„. (ii) A model (here, DH-YOLO) predicts predefined objects (here, doors/handles) πŸšͺπŸ–οΈ. (iii) The scene image with predictions overlaid (SIP) is sent to the HoloLens POV πŸ₯½. (iv) The user wearing the HoloLens sees the same scene image (SIP) and annotates the correct labels ✍️. (v) The annotated labels, together with the scene images, are sent to a trainable model for fine-tuning βš™οΈ. (vi) Goal 🎯: the predictions improve through interactive teaching.
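To make the loop concrete, here is a minimal Python sketch of steps (i)–(vi). The robot, model, and HoloLens interfaces used below (capture_rgb, predict, show, receive_labels, finetune) are hypothetical placeholders for illustration, not the actual iTeach code.

# Minimal sketch of the interactive teaching loop in steps (i)–(vi).
# All object interfaces below are hypothetical placeholders, not the iTeach API.
def iteach_loop(robot, model, hololens, num_rounds=10):
    buffer = []  # accumulated (image, labels) pairs for fine-tuning
    for _ in range(num_rounds):
        image = robot.capture_rgb()            # (i) the robot views a scene
        preds = model.predict(image)           # (ii) DH-YOLO-style door/handle predictions
        hololens.show(image, preds)            # (iii) SIP sent to the HoloLens POV
        labels = hololens.receive_labels()     # (iv) the instructor annotates correct labels
        buffer.append((image, labels))
        model.finetune(buffer)                 # (v) fine-tune on the annotated data
    return model                               # (vi) predictions improve over rounds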

πŸš€ System Workflow




πŸ› οΈ Detailed workflow of the system. βœ… Correct predictions are outlined in green, and ❌ incorrect predictions are marked in red borders.

🌍 Real World Demo

πŸŽ₯ Demo video showing human interaction and learning




πŸ”¬ Our Experiment Setup πŸ› οΈ

πŸ€– Our experimental setup consists of a Fetch mobile robot, a HoloLens 2 device πŸ₯½, a Lenovo Legion Pro 7 laptop πŸ’» equipped with an NVIDIA GeForce RTX 4090 Laptop GPU (approximately 16 GB of memory), and a human instructor πŸ‘€. The laptop runs Python 3.8.10 and PyTorch 2.4.0 with CUDA 12.1.
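As a quick sanity check that a machine matches this configuration, the following snippet (plain Python/PyTorch calls, nothing iTeach-specific) prints the relevant versions and GPU memory:

import sys
import torch

print("Python:", sys.version.split()[0])             # expect 3.8.10
print("PyTorch:", torch.__version__)                 # expect 2.4.0
print("CUDA:", torch.version.cuda)                   # expect 12.1
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))     # expect an RTX 4090 Laptop GPU
    total = torch.cuda.get_device_properties(0).total_memory
    print("VRAM (GB):", round(total / 1024 ** 3, 1)) # roughly 16 GB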

πŸ₯½ HoloLens 2 App

(i) The user interface flow of our iTeachLabeller app from the startup phase to the main menu.




(ii) Description of the main menu options in the iTeachLabeller app.




✨ We provide a prebuilt app package here to help you get started easily πŸš€, without needing to set up the development environment or build the app yourself πŸ”§.

🌟 Sample Scenarios for Data Collection

(i) Diverse samples from different locations.



(ii) Different samples from the same location.




⚠️ Practical Challenges Encountered During Experiments

βš™οΈ Each of these challenges comes with corresponding benefits that enhance the overall use case of the system. In summary, these challenges are manageable, as demonstrated by our fully functional system βœ….

πŸ†• We introduce a new dataset, collected on the UT Dallas campus, consisting of both indoor 🏒 and outdoor 🌳 scenes.

IRVLUTD-DoorHandle-Dataset πŸ”₯ iTeach-Experiment-Datasets

Once you successfully download and extract the dataset, you should have a folder with the following structure:

IRVLUTD-DoorHandle-Dataset/
β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ train.524.zip
β”‚   β”œβ”€β”€ train.1008.zip
β”‚   └── train.1532.zip
β”œβ”€β”€ test_dataset.256.zip
β”œβ”€β”€ hololens.finetuning.data.100.zip
└── README.md

More information can be found in the README.md. For instructions on using the dataset, please see the PyTorch Dataloader.

Note: Use iTeach-experiment-datasets for experiments mentioned in the paper.
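The PyTorch Dataloader linked above is the authoritative reference for reading the data. Purely as an illustration, here is a minimal Dataset sketch that assumes an extracted split contains image files paired with YOLO-format .txt label files; this layout is an assumption, not a guarantee from the dataset README.

# Illustrative only: assumes images (*.jpg) paired with YOLO-format *.txt labels
# (class cx cy w h, normalized). Check the official PyTorch Dataloader for the
# actual dataset format.
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class DoorHandleDataset(Dataset):
    def __init__(self, root, transform=None):
        self.image_paths = sorted(Path(root).glob("*.jpg"))
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert("RGB")
        boxes = []
        label_path = img_path.with_suffix(".txt")
        if label_path.exists():  # images without labels yield an empty box list
            for line in label_path.read_text().splitlines():
                cls, cx, cy, w, h = map(float, line.split())
                boxes.append((int(cls), cx, cy, w, h))
        if self.transform is not None:
            image = self.transform(image)
        return image, boxes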

Code

iTeach

The code for (i) the iTeachLabeller app, (ii) the iTeach system, and (iii) the iTeach experiments. πŸ”§

iTeach Toolkit

A toolkit for iTeach containing inference code for DH-YOLO.
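As a rough sketch of what running DH-YOLO inference might look like, the snippet below uses placeholder names (the import path, DHYOLODetector, and predict are assumptions, not necessarily the toolkit's documented API); please refer to the iTeach Toolkit repository for the actual interface.

# Hypothetical usage sketch; the import path, DHYOLODetector, and predict are
# placeholder names, not necessarily the documented iTeach Toolkit API.
from iteach_toolkit import DHYOLODetector  # import path is an assumption

detector = DHYOLODetector("path/to/dh_yolo_checkpoint.pt")
image, detections = detector.predict("path/to/door_scene.jpg")
for cls_name, confidence, box in detections:
    print(cls_name, confidence, box)  # e.g., "door" or "handle" with a bounding box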

πŸ€— HuggingFace Demo

The code for the πŸ€— HuggingFace Space demo.

BibTeX

Please cite iTeach if it helps your research πŸ™Œ:
@misc{padalunkal2024iteach,
    title={iTeach: Interactive Teaching for Robot Perception using Mixed Reality},
    author={Jishnu Jaykumar P and Cole Salvato and Vinaya Bomnale and Jikai Wang and Yu Xiang},
    year={2024},
    eprint={2410.09072},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}

Contact

πŸ™ Acknowledgements

This work was supported by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005, the Sony Research Award Program, and the National Science Foundation (NSF) under Grant No. 2346528. We thank Sai Haneesh Allu for his assistance with the real-world experiments. πŸ™Œ