SceneReplica: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes

* Equal Contribution. The University of Texas at Dallas

IEEE International Conference on Robotics and Automation (ICRA), 2024

Abstract

We present a new reproducible benchmark for evaluating robot manipulation in the real world, specifically focusing on the task of pick-and-place. Our benchmark uses the YCB objects, a commonly used dataset in the robotics community, to ensure that our results are comparable to other studies. Additionally, the benchmark is designed to be easily reproducible in the real world, making it accessible to researchers and practitioners. We also provide our experimental results and analyses for model-based and model-free 6D robotic grasping on the benchmark, where representative algorithms for object perception, grasp planning, and motion planning are evaluated. We believe that our benchmark will be a valuable tool for advancing the field of robot manipulation. By providing a standardized evaluation framework, researchers can more easily compare different techniques and algorithms, leading to faster progress in developing robot manipulation methods.
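The systems evaluated on the benchmark are modular pick-and-place pipelines that combine an object perception module, a grasp planner, a motion planner, and a controller (see the leaderboard below). The following is a minimal sketch of such a pipeline under assumed, hypothetical component interfaces; the class names, method names, and robot API are illustrative placeholders, not the benchmark's actual code.

# Illustrative sketch of the modular pick-and-place pipeline structure evaluated
# in SceneReplica. All interfaces below are hypothetical placeholders; the real
# experiments use specific methods such as PoseCNN/UCN/MSMFormer for perception,
# GraspIt!/GraspNet/Contact-GraspNet for grasp planning, and OMPL + MoveIt for
# motion planning and control.
from typing import List, Optional


class Perception:
    """Detects objects (model-based: 6D poses; model-free: segmented point clouds)."""
    def detect(self, rgb, depth) -> List[dict]:
        raise NotImplementedError


class GraspPlanner:
    """Proposes ranked 6-DOF grasp poses for a detected object."""
    def plan_grasps(self, detection) -> List[dict]:
        raise NotImplementedError


class MotionPlanner:
    """Plans a collision-free arm trajectory to a grasp pose (e.g., via OMPL)."""
    def plan(self, grasp) -> Optional[object]:
        raise NotImplementedError


def run_scene(perception, grasp_planner, motion_planner, robot, ordering):
    """Attempt to pick and place every object in one scene, in the given order."""
    detections = perception.detect(*robot.get_rgbd())
    for det in ordering(detections):              # e.g., near-to-far or fixed random
        for grasp in grasp_planner.plan_grasps(det):
            traj = motion_planner.plan(grasp)
            if traj is None:
                continue                          # try the next grasp candidate
            robot.execute(traj)
            robot.close_gripper()
            if robot.lift_object():               # counts toward grasping success
                robot.place_at_target()           # counts toward pick-and-place success
            break                                 # one grasp attempt per object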

Scenes

20 scenes in our SceneReplica benchmark with 5 YCB objects in each scene


Scene Replication

The process of replicating a scene in the real world: the reference scene image is overlaid on the live camera image to guide where each object should be placed in the real-world scene.
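One straightforward way to implement such an overlay is to alpha-blend the stored reference image with the live camera stream, so the operator can align each physical object with its counterpart in the reference scene. Below is a minimal OpenCV sketch under that assumption; the image file name and camera index are placeholders, not paths from the released benchmark code.

# Minimal sketch: alpha-blend a stored reference scene image onto the live
# camera feed to guide object placement during scene replication.
# "reference_scene.png" and the camera index are illustrative placeholders.
import cv2

reference = cv2.imread("reference_scene.png")       # saved reference scene image
cap = cv2.VideoCapture(0)                           # live RGB camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    reference_resized = cv2.resize(reference, (frame.shape[1], frame.shape[0]))
    overlay = cv2.addWeighted(frame, 0.6, reference_resized, 0.4, 0.0)
    cv2.imshow("scene replication overlay", overlay)
    if cv2.waitKey(1) & 0xFF == ord("q"):           # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()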

Grasping

Leaderboard

Adding your results: There are two ways to add your results to the leaderboard. (1) Run grasping experiments on the 20 scenes and provide us with videos of these experiments so we can verify your results. (2) Provide the source code of your method and we will run the grasping experiments for you. Please contact the authors if you are interested.
# | Perception | Grasp Planning | Motion Planning | Control | Ordering | Grasping Type | Pick & Place Success | Grasping Success
3 | GDRNPP [9] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Based | 66/100 | 69/100
3 | GDRNPP [9] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Based | 62/100 | 64/100
7 | MSMFormer [8] | Contact-GraspNet [7] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 61/100 | 70/100
5 | UCN [5] | Contact-GraspNet [7] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Free | 60/100 | 63/100
5 | UCN [5] | Contact-GraspNet [7] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 60/100 | 64/100
1 | PoseRBPF [1] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Based | 59/100 | 59/100
1 | PoseRBPF [1] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Based | 58/100 | 64/100
7 | MSMFormer [8] | Contact-GraspNet [7] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Free | 57/100 | 65/100
8 | MSMFormer [8] | Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 56/100 | 59/100
2 | PoseCNN [4] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Based | 48/100 | 50/100
4 | UCN [5] | GraspNet [6] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Free | 43/100 | 46/100
9 | DexNet 2.0 [10] | DexNet 2.0 [10] | OMPL [3] | MoveIt | Algorithmic | Model-Free | 43/100 | 51/100
2 | PoseCNN [4] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Based | 38/100 | 44/100
6 | MSMFormer [8] | GraspNet [6] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Free | 38/100 | 41/100
4 | UCN [5] | GraspNet [6] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 37/100 | 40/100
6 | MSMFormer [8] | GraspNet [6] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 36/100 | 41/100
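Each leaderboard entry aggregates results over the 20 scenes with 5 objects each, i.e., 100 attempts per configuration: grasping success counts objects that were successfully grasped and lifted, and pick-and-place success counts objects that were also placed at the target location. The sketch below illustrates this aggregation; the per-scene log format is a hypothetical example, not the benchmark's actual output format.

# Hypothetical aggregation of per-scene experiment logs into the two
# leaderboard metrics. Each scene log is assumed to record, per object,
# whether the grasp (lift) and the full pick-and-place succeeded.
# The log structure here is illustrative, not SceneReplica's actual format.

def aggregate(scene_logs):
    """scene_logs: list of 20 scenes, each a list of 5 per-object results,
    e.g., {"grasped": True, "placed": False}."""
    grasp_success = sum(obj["grasped"] for scene in scene_logs for obj in scene)
    pick_place_success = sum(obj["placed"] for scene in scene_logs for obj in scene)
    total = sum(len(scene) for scene in scene_logs)   # 20 scenes x 5 objects = 100
    return f"{pick_place_success}/{total} pick-and-place, {grasp_success}/{total} grasping"

# Example with two shortened scene logs:
logs = [
    [{"grasped": True, "placed": True}, {"grasped": True, "placed": False}],
    [{"grasped": False, "placed": False}, {"grasped": True, "placed": True}],
]
print(aggregate(logs))   # -> "2/4 pick-and-place, 3/4 grasping"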

References

Official Code: source code from the authors of the method
SceneReplica Version: our maintained version (upgraded dependencies, added ROS interface, etc.)
  1. X. Deng, A. Mousavian, Y. Xiang, F. Xia, T. Bretl, and D. Fox. PoseRBPF: A Rao-Blackwellized particle filter for 6-D object pose tracking. IEEE Transactions on Robotics, 37(5):1328–1342, 2021. [ Official Code | SceneReplica Version ]
  2. A. T. Miller and P. K. Allen. GraspIt! A versatile simulator for robotic grasping. IEEE Robotics & Automation Magazine, 11(4):110–122, 2004. [ Official Code ]
  3. I. A. Sucan, M. Moll, and L. E. Kavraki. The Open Motion Planning Library. IEEE Robotics & Automation Magazine, 19(4):72–82, 2012. [ Official Code ]
  4. Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199, 2017. [ Official Code | SceneReplica Version ]
  5. Y. Xiang, C. Xie, A. Mousavian, and D. Fox. Learning RGB-D feature embeddings for unseen object instance segmentation. In Conference on Robot Learning (CoRL), pages 461–470. PMLR, 2021. [ Official Code | SceneReplica Version ]
  6. A. Mousavian, C. Eppner, and D. Fox. 6-DOF GraspNet: Variational grasp generation for object manipulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2901–2910, 2019. [ Official Code | SceneReplica Version ]
  7. M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox. Contact-GraspNet: Efficient 6-DoF grasp generation in cluttered scenes. In IEEE International Conference on Robotics and Automation (ICRA), pages 13438–13444, 2021. [ Official Code | SceneReplica Version ]
  8. Y. Lu, Y. Chen, N. Ruozzi, and Y. Xiang. Mean shift mask transformer for unseen object instance segmentation. arXiv preprint arXiv:2211.11679, 2022. [ Official Code ]
  9. G. Wang, F. Manhardt, F. Tombari, and X. Ji. GDR-Net: Geometry-guided direct regression network for monocular 6D object pose estimation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16611–16621, 2021. [ Official Code | SceneReplica Version ]
  10. J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg. Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. arXiv preprint arXiv:1703.09312, 2017. [ Official Code | SceneReplica Version ]

BibTeX

Please cite SceneReplica if it helps your research:
@article{khargonkar2023scenereplica,
  title={SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes}, 
  author={Ninad Khargonkar and Sai Haneesh Allu and Yangxiao Lu and Jishnu Jaykumar P and Balakrishnan Prabhakaran and Yu Xiang},
  journal={arXiv preprint arXiv:2306.15620},
  year={2023}}

Contact

Send any comments or questions to Ninad or Sai:
ninadarun.khargonkar@utdallas.edu | saihaneesh.allu@utdallas.edu

Acknowledgements

This work was supported in part by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005.