πŸ“ Abstract

We introduce iTeach, a Mixed Reality (MR) framework to improve robot perception through real-time interactive teaching. By allowing human instructors to dynamically label robot RGB data, iTeach improves both the accuracy and adaptability of robot perception to new scenarios. The framework supports on-the-fly data collection and labeling, enhancing model performance and generalization. Applied to door and handle detection for household tasks, iTeach integrates a HoloLens app with an interactive YOLO model. Furthermore, we introduce the IRVLUTD DoorHandle dataset. DH-YOLO, our efficient detection model, significantly enhances the accuracy and efficiency of door and handle detection, highlighting the potential of MR to make robotic systems more capable and adaptive in real-world environments.

πŸ” Overview



πŸ€– iTeach system overview. (i) A robot views a scene πŸŒ„. (ii) A model (here, DH-YOLO) predicts predefined objects (here, doors/handles) πŸšͺπŸ–οΈ. (iii) The scene image with predictions overlaid (SIP) is sent to the HoloLens POV πŸ₯½. (iv) The user wearing the HoloLens sees the same scene image (SIP) and annotates the correct labels ✍️. (v) The annotated labels, together with the scene images, are sent to a trainable model for fine-tuning βš™οΈ. (vi) Goal 🎯: the predictions improve through interactive teaching.
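To make the loop concrete, here is a minimal Python sketch of steps (i)–(vi). The robot, model, and HoloLens interfaces used below (capture_rgb, predict, show, receive_labels, finetune) are hypothetical placeholders for illustration, not the actual iTeach code.

# Minimal sketch of the interactive teaching loop in steps (i)–(vi).
# All object interfaces below are hypothetical placeholders, not the iTeach API.
def iteach_loop(robot, model, hololens, num_rounds=10):
    buffer = []  # accumulated (image, labels) pairs for fine-tuning
    for _ in range(num_rounds):
        image = robot.capture_rgb()            # (i) the robot views a scene
        preds = model.predict(image)           # (ii) DH-YOLO-style door/handle predictions
        hololens.show(image, preds)            # (iii) SIP sent to the HoloLens POV
        labels = hololens.receive_labels()     # (iv) the instructor annotates correct labels
        buffer.append((image, labels))
        model.finetune(buffer)                 # (v) fine-tune on the annotated data
    return model                               # (vi) predictions improve over rounds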

πŸš€ System Workflow




πŸ› οΈ Detailed workflow of the system. βœ… Correct predictions are outlined in green, and ❌ incorrect predictions are marked in red borders.

🌍 Real World Demo

πŸŽ₯ Demo video showing human interaction and learning




πŸ”¬ Our Experiment Setup πŸ› οΈ

πŸ€– Our experimental setup consists of a Fetch mobile robot, a HoloLens 2 device πŸ₯½, a Lenovo Legion Pro 7 laptop πŸ’» equipped with an NVIDIA GeForce RTX 4090 Laptop GPU (approximately 16 GB of memory), and a human instructor πŸ‘€. The laptop runs Python 3.8.10 and PyTorch 2.4.0 with CUDA 12.1.
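As a quick sanity check that a machine matches this configuration, the following snippet (plain Python/PyTorch calls, nothing iTeach-specific) prints the relevant versions and GPU memory:

import sys
import torch

print("Python:", sys.version.split()[0])             # expect 3.8.10
print("PyTorch:", torch.__version__)                 # expect 2.4.0
print("CUDA:", torch.version.cuda)                   # expect 12.1
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))     # expect an RTX 4090 Laptop GPU
    total = torch.cuda.get_device_properties(0).total_memory
    print("VRAM (GB):", round(total / 1024 ** 3, 1)) # roughly 16 GB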

πŸ₯½ HoloLens 2 App

(i) The user interface flow of our iTeachLabeller app from the startup phase to the main menu.




(ii) Description of the main menu options in the iTeachLabeller app.




✨ We provide a prebuilt app package here to help you get started easily πŸš€, without needing to set up the development environment or build the app yourself πŸ”§.

🌟 Sample Scenarios for Data Collection

(i) Diverse samples from different locations.



(ii) Different samples from the same location.




⚠️ Practical Challenges Encountered During Experiments

βš™οΈ Each of these challenges comes with corresponding benefits that enhance the overall use case of the system. In summary, these challenges are manageable, as demonstrated by our fully functional system βœ….

πŸ†• We introduce a new dataset, collected on the UT Dallas campus, consisting of both indoor 🏒 and outdoor 🌳 scenes.

IRVLUTD-DoorHandle-Dataset πŸ”₯ iTeach-Experiment-Datasets

Once you successfully download and extract the dataset, you should have a folder with the following structure:

IRVLUTD-DoorHandle-Dataset/
β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ train.524.zip
β”‚   β”œβ”€β”€ train.1008.zip
β”‚   └── train.1532.zip
β”œβ”€β”€ test_dataset.256.zip
β”œβ”€β”€ hololens.finetuning.data.100.zip
└── README.md

More information can be found in the README.md. For instructions on using the dataset, please see the PyTorch Dataloader.

Note: Use iTeach-experiment-datasets for experiments mentioned in the paper.
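The PyTorch Dataloader linked above is the authoritative reference for reading the data. Purely as an illustration, here is a minimal Dataset sketch that assumes an extracted split contains image files paired with YOLO-format .txt label files; this layout is an assumption, not a guarantee from the dataset README.

# Illustrative only: assumes images (*.jpg) paired with YOLO-format *.txt labels
# (class cx cy w h, normalized). Check the official PyTorch Dataloader for the
# actual dataset format.
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class DoorHandleDataset(Dataset):
    def __init__(self, root, transform=None):
        self.image_paths = sorted(Path(root).glob("*.jpg"))
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert("RGB")
        boxes = []
        label_path = img_path.with_suffix(".txt")
        if label_path.exists():  # images without labels yield an empty box list
            for line in label_path.read_text().splitlines():
                cls, cx, cy, w, h = map(float, line.split())
                boxes.append((int(cls), cx, cy, w, h))
        if self.transform is not None:
            image = self.transform(image)
        return image, boxes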

Code

iTeach

The code for (i) the iTeachLabeller app, (ii) the iTeach system, and (iii) the iTeach experiments. πŸ”§

iTeach Toolkit

A toolkit for iTeach containing inference code for DH-YOLO.
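As a rough sketch of what running DH-YOLO inference might look like, the snippet below uses placeholder names (the import path, DHYOLODetector, and predict are assumptions, not necessarily the toolkit's documented API); please refer to the iTeach Toolkit repository for the actual interface.

# Hypothetical usage sketch; the import path, DHYOLODetector, and predict are
# placeholder names, not necessarily the documented iTeach Toolkit API.
from iteach_toolkit import DHYOLODetector  # import path is an assumption

detector = DHYOLODetector("path/to/dh_yolo_checkpoint.pt")
image, detections = detector.predict("path/to/door_scene.jpg")
for cls_name, confidence, box in detections:
    print(cls_name, confidence, box)  # e.g., "door" or "handle" with a bounding box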

πŸ€— HuggingFace Demo

The code for the πŸ€— HuggingFace Space demo.

BibTeX

Please cite iTeach if it helps your research πŸ™Œ:
@misc{padalunkal2024iteach,
    title={iTeach: Interactive Teaching for Robot Perception using Mixed Reality},
    author={Jishnu Jaykumar P and Cole Salvato and Vinaya Bomnale and Jikai Wang and Yu Xiang},
    year={2024},
    eprint={2410.09072},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}

Contact

πŸ™ Acknowledgements

This work was supported by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005, the Sony Research Award Program, and the National Science Foundation (NSF) under Grant No. 2346528. We thank Sai Haneesh Allu for his assistance with the real-world experiments. πŸ™Œ