HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction

Jikai Wang1      Qifan Zhang1      Yu-Wei Chao2      Bowen Wen2      Xiaohu Guo1      Yu Xiang1
1University of Texas at Dallas, 2NVIDIA

Abstract

We introduce a data capture system and a new dataset named HO-Cap for studying 3D reconstruction and pose tracking of hands and objects in videos. The capture system uses multiple RGB-D cameras and a HoloLens headset for data collection, avoiding the use of expensive 3D scanners or mocap systems. We propose a semi-automatic method for obtaining shape and pose annotations of hands and objects in the collected videos, which significantly reduces the annotation time required compared to manual labeling. With this system, we captured a video dataset of humans using objects to perform different tasks, as well as performing simple pick-and-place actions and handing objects over from one hand to the other; these videos can serve as human demonstrations for embodied AI and robot manipulation research. Our capture setup and annotation framework can be used by the community to reconstruct 3D shapes of objects and human hands and to track their poses in videos.

Data Capture Setup (9 RGB-D Cameras + HoloLens, No Mo-cap)

[Figure: capture hardware setup]
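
As a rough illustration of how frames might be grabbed from a multi-camera rig like this, below is a minimal sketch using the pyk4a wrapper for Azure Kinect cameras. The camera count matches the setup, but the configuration values and synchronization details are assumptions; the actual capture software is not described here.

import pyk4a
from pyk4a import Config, PyK4A

NUM_CAMERAS = 9  # the HO-Cap rig uses 9 RGB-D cameras

# Open and start each Azure Kinect device.
cameras = []
for device_id in range(NUM_CAMERAS):
    k4a = PyK4A(
        Config(
            color_resolution=pyk4a.ColorResolution.RES_720P,
            depth_mode=pyk4a.DepthMode.NFOV_UNBINNED,
            synchronized_images_only=True,
        ),
        device_id=device_id,
    )
    k4a.start()
    cameras.append(k4a)

# Grab one frame per camera: the color image and the depth map
# registered to the color camera view.
frames = []
for k4a in cameras:
    capture = k4a.get_capture()
    frames.append((capture.color, capture.transformed_depth))

for k4a in cameras:
    k4a.stop()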

Object Shape Reconstruction using a Single Azure Camera

[Figure: object shape reconstruction]

[Figure: reconstructed object shapes]
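
For readers new to RGB-D shape reconstruction, the sketch below illustrates the generic technique of fusing posed RGB-D frames into a mesh via TSDF integration with Open3D. This is only an illustration of the general idea, not HO-Cap's reconstruction method; the intrinsics are placeholder values and load_frames is a hypothetical loader yielding (color, depth, camera-to-world pose) tuples.

import numpy as np
import open3d as o3d

# Camera intrinsics (placeholder values for a 640x576 depth camera).
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=640, height=576, fx=505.0, fy=505.0, cx=320.0, cy=288.0)

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.004,   # 4 mm voxels
    sdf_trunc=0.02,       # 2 cm truncation distance
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

for color, depth, cam_to_world in load_frames():  # hypothetical loader
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.geometry.Image(color), o3d.geometry.Image(depth),
        depth_scale=1000.0, depth_trunc=1.0,
        convert_rgb_to_intensity=False)
    # Open3D expects the world-to-camera extrinsic.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(cam_to_world))

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh("object.ply", mesh)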

Semi-automatic Annotation Pipeline for Hand-Object Poses

The only human annotation required is in the first frame: the annotator clicks two prompt points on each object, from which SAM generates an initial segmentation mask, and labels each object with its name to associate it with an object in our database. A minimal sketch of the SAM prompting step follows.
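
The sketch below shows how two clicked points can prompt SAM for an initial object mask, using the publicly released segment-anything package. The checkpoint path, image path, and point coordinates are placeholders.

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM checkpoint (placeholder path) and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB image.
image = cv2.cvtColor(cv2.imread("first_frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Two foreground points clicked on the object (label 1 = foreground).
point_coords = np.array([[420, 310], [455, 345]])
point_labels = np.array([1, 1])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True)
mask = masks[np.argmax(scores)]  # keep the highest-scoring mask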

[Figure: annotation pipeline]

Paper & Document

arXiv: https://arxiv.org/abs/2406.06843

Citing HO-Cap

Please cite HO-Cap if it helps your research:

@misc{wang2024hocap,
  title={HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction}, 
  author={Jikai Wang and Qifan Zhang and Yu-Wei Chao and Bowen Wen and Xiaohu Guo and Yu Xiang},
  year={2024},
  eprint={2406.06843},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Data

HO-Cap is licensed under CC BY 4.0.

The links to purchase the objects are provided in the Object_Info.xlsx file.

We provide two options for downloading the dataset:

  1. Download the data using the Python script provided in the HO-Cap repository.
  2. Download the individual zipped files from Box by clicking the download icon at the top-right corner, and extract them to the "./data" folder manually (a sketch for this option follows the list).
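
For option 2, here is a minimal sketch of downloading and extracting zip files with standard Python tooling; the URLs below are placeholders and should be replaced by the actual links from the Box folder.

import urllib.request
import zipfile
from pathlib import Path

DATA_DIR = Path("./data")
DATA_DIR.mkdir(parents=True, exist_ok=True)

urls = [
    "https://example.com/calibration.zip",  # placeholder URL
    "https://example.com/models.zip",       # placeholder URL
]

for url in urls:
    zip_path = DATA_DIR / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, zip_path)  # download the zip
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(DATA_DIR)               # extract into ./data
    zip_path.unlink()                         # remove the zip afterwards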

Once you successfully download and extract the dataset, you should have a folder with the following structure:

./data
├── calibration
├── models
├── subject_1
│   ├── 20231025_165502
│   ├── ...
├── ...
└── subject_9
    ├── 20231027_123403
    └── ...
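
As a quick sanity check after extraction, the following sketch walks the layout above to enumerate the recording sequences per subject. It assumes the folder structure shown, where sequence folders are named by capture timestamp.

from pathlib import Path

data_dir = Path("./data")

# List each subject folder and its recording sequences.
for subject_dir in sorted(data_dir.glob("subject_*")):
    sequences = sorted(p.name for p in subject_dir.iterdir() if p.is_dir())
    print(f"{subject_dir.name}: {len(sequences)} sequences")
    for seq in sequences:
        print(f"  {seq}")  # e.g. 20231025_165502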

For instructions on using the dataset, please see the HO-Cap repository.

Code

GitHub: HO-Cap-Toolkit
A Python package that provides evaluation and visualization tools for the HO-Cap dataset.

Contact

Send any comments or questions to Jikai Wang: jikai.wang@utdallas.edu.

Acknowledgements

This work was supported in part by the DARPA Perceptually enabled Task Guidance (PTG) Program under contract number HR00112220005 and the Sony Research Award Program.


Last updated on 01-June-2024 | Template borrowed from DexYCB.