SceneReplica: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes

* Equal Contribution. The University of Texas at Dallas

IEEE International Conference on Robotics and Automation (ICRA), 2024

Abstract

We present a new reproducible benchmark for evaluating robot manipulation in the real world, specifically focusing on the task of pick-and-place. Our benchmark uses the YCB objects, a commonly used dataset in the robotics community, to ensure that our results are comparable to other studies. Additionally, the benchmark is designed to be easily reproducible in the real world, making it accessible to researchers and practitioners. We also provide our experimental results and analyses for model-based and model-free 6D robotic grasping on the benchmark, where representative algorithms for object perception, grasp planning, and motion planning are evaluated. We believe that our benchmark will be a valuable tool for advancing the field of robot manipulation. By providing a standardized evaluation framework, researchers can more easily compare different techniques and algorithms, leading to faster progress in developing robot manipulation methods.
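The systems evaluated on the benchmark are modular pick-and-place pipelines that combine an object perception module, a grasp planner, a motion planner, and a controller (see the leaderboard below). The following is a minimal sketch of such a pipeline under assumed, hypothetical component interfaces; the class names, method names, and robot API are illustrative placeholders, not the benchmark's actual code.

# Illustrative sketch of the modular pick-and-place pipeline structure evaluated
# in SceneReplica. All interfaces below are hypothetical placeholders; the real
# experiments use specific methods such as PoseCNN/UCN/MSMFormer for perception,
# GraspIt!/GraspNet/Contact-GraspNet for grasp planning, and OMPL + MoveIt for
# motion planning and control.
from typing import List, Optional


class Perception:
    """Detects objects (model-based: 6D poses; model-free: segmented point clouds)."""
    def detect(self, rgb, depth) -> List[dict]:
        raise NotImplementedError


class GraspPlanner:
    """Proposes ranked 6-DOF grasp poses for a detected object."""
    def plan_grasps(self, detection) -> List[dict]:
        raise NotImplementedError


class MotionPlanner:
    """Plans a collision-free arm trajectory to a grasp pose (e.g., via OMPL)."""
    def plan(self, grasp) -> Optional[object]:
        raise NotImplementedError


def run_scene(perception, grasp_planner, motion_planner, robot, ordering):
    """Attempt to pick and place every object in one scene, in the given order."""
    detections = perception.detect(*robot.get_rgbd())
    for det in ordering(detections):              # e.g., near-to-far or fixed random
        for grasp in grasp_planner.plan_grasps(det):
            traj = motion_planner.plan(grasp)
            if traj is None:
                continue                          # try the next grasp candidate
            robot.execute(traj)
            robot.close_gripper()
            if robot.lift_object():               # counts toward grasping success
                robot.place_at_target()           # counts toward pick-and-place success
            break                                 # one grasp attempt per object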

Scenes

20 scenes in our SceneReplica benchmark with 5 YCB objects in each scene


Scene Replication

The process of replicating a scene in the real world: the reference scene image is overlaid on the live camera image to guide where each object should be placed in the real-world scene.
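One straightforward way to implement such an overlay is to alpha-blend the stored reference image with the live camera stream, so the operator can align each physical object with its counterpart in the reference scene. Below is a minimal OpenCV sketch under that assumption; the image file name and camera index are placeholders, not paths from the released benchmark code.

# Minimal sketch: alpha-blend a stored reference scene image onto the live
# camera feed to guide object placement during scene replication.
# "reference_scene.png" and the camera index are illustrative placeholders.
import cv2

reference = cv2.imread("reference_scene.png")       # saved reference scene image
cap = cv2.VideoCapture(0)                           # live RGB camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    reference_resized = cv2.resize(reference, (frame.shape[1], frame.shape[0]))
    overlay = cv2.addWeighted(frame, 0.6, reference_resized, 0.4, 0.0)
    cv2.imshow("scene replication overlay", overlay)
    if cv2.waitKey(1) & 0xFF == ord("q"):           # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()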

Grasping

Leaderboard

Adding your results: There are two ways to add your results to the leaderboard. (1) Run grasping experiments on the 20 scenes and provide us with videos of these experiments so we can verify your results. (2) Provide the source code of your method and we will run the grasping experiments for you. Please contact the authors if you are interested.
# | Perception | Grasp Planning | Motion Planning | Control | Ordering | Grasping Type | Pick & Place Success | Grasping Success
3 | GDRNPP [9] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Based | 66/100 | 69/100
3 | GDRNPP [9] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Based | 62/100 | 64/100
7 | MSMFormer [8] | Contact-GraspNet [7] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 61/100 | 70/100
5 | UCN [5] | Contact-GraspNet [7] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Free | 60/100 | 63/100
5 | UCN [5] | Contact-GraspNet [7] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 60/100 | 64/100
1 | PoseRBPF [1] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Based | 59/100 | 59/100
1 | PoseRBPF [1] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Based | 58/100 | 64/100
7 | MSMFormer [8] | Contact-GraspNet [7] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Free | 57/100 | 65/100
8 | MSMFormer [8] | Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 56/100 | 59/100
2 | PoseCNN [4] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Based | 48/100 | 50/100
4 | UCN [5] | GraspNet [6] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Free | 43/100 | 46/100
9 | DexNet 2.0 [10] | DexNet 2.0 [10] | OMPL [3] | MoveIt | Algorithmic | Model-Free | 43/100 | 51/100
2 | PoseCNN [4] | GraspIt! [2] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Based | 38/100 | 44/100
6 | MSMFormer [8] | GraspNet [6] + Top-Down | OMPL [3] | MoveIt | Near-to-Far | Model-Free | 38/100 | 41/100
4 | UCN [5] | GraspNet [6] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 37/100 | 40/100
6 | MSMFormer [8] | GraspNet [6] + Top-Down | OMPL [3] | MoveIt | Fixed Random | Model-Free | 36/100 | 41/100
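Each leaderboard entry aggregates results over the 20 scenes with 5 objects each, i.e., 100 attempts per configuration: grasping success counts objects that were successfully grasped and lifted, and pick-and-place success counts objects that were also placed at the target location. The sketch below illustrates this aggregation; the per-scene log format is a hypothetical example, not the benchmark's actual output format.

# Hypothetical aggregation of per-scene experiment logs into the two
# leaderboard metrics. Each scene log is assumed to record, per object,
# whether the grasp (lift) and the full pick-and-place succeeded.
# The log structure here is illustrative, not SceneReplica's actual format.

def aggregate(scene_logs):
    """scene_logs: list of 20 scenes, each a list of 5 per-object results,
    e.g., {"grasped": True, "placed": False}."""
    grasp_success = sum(obj["grasped"] for scene in scene_logs for obj in scene)
    pick_place_success = sum(obj["placed"] for scene in scene_logs for obj in scene)
    total = sum(len(scene) for scene in scene_logs)   # 20 scenes x 5 objects = 100
    return f"{pick_place_success}/{total} pick-and-place, {grasp_success}/{total} grasping"

# Example with two shortened scene logs:
logs = [
    [{"grasped": True, "placed": True}, {"grasped": True, "placed": False}],
    [{"grasped": False, "placed": False}, {"grasped": True, "placed": True}],
]
print(aggregate(logs))   # -> "2/4 pick-and-place, 3/4 grasping"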

References

Official Code: source code from the authors of the method
SceneReplica Version: our maintained version (upgraded dependencies, added ROS interface, etc.)
  1. X. Deng, A. Mousavian, Y. Xiang, F. Xia, T. Bretl, and D. Fox. PoseRBPF: A Rao-Blackwellized particle filter for 6-D object pose tracking. IEEE Transactions on Robotics, 37(5):1328–1342, 2021. [ Official Code | SceneReplica Version ]
  2. A. T. Miller and P. K. Allen. GraspIt! A versatile simulator for robotic grasping. IEEE Robotics & Automation Magazine, 11(4):110–122, 2004. [ Official Code ]
  3. I. A. Sucan, M. Moll, and L. E. Kavraki. The Open Motion Planning Library. IEEE Robotics & Automation Magazine, 19(4):72–82, 2012. [ Official Code ]
  4. Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199, 2017. [ Official Code | SceneReplica Version ]
  5. Y. Xiang, C. Xie, A. Mousavian, and D. Fox. Learning RGB-D feature embeddings for unseen object instance segmentation. In Conference on Robot Learning (CoRL), pages 461–470. PMLR, 2021. [ Official Code | SceneReplica Version ]
  6. A. Mousavian, C. Eppner, and D. Fox. 6-DOF GraspNet: Variational grasp generation for object manipulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2901–2910, 2019. [ Official Code | SceneReplica Version ]
  7. M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox. Contact-GraspNet: Efficient 6-DoF grasp generation in cluttered scenes. In IEEE International Conference on Robotics and Automation (ICRA), pages 13438–13444, 2021. [ Official Code | SceneReplica Version ]
  8. Y. Lu, Y. Chen, N. Ruozzi, and Y. Xiang. Mean shift mask transformer for unseen object instance segmentation. arXiv preprint arXiv:2211.11679, 2022. [ Official Code ]
  9. G. Wang, F. Manhardt, F. Tombari, and X. Ji. GDR-Net: Geometry-guided direct regression network for monocular 6D object pose estimation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16611–16621, 2021. [ Official Code | SceneReplica Version ]
  10. J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg. Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. arXiv preprint arXiv:1703.09312, 2017. [ Official Code | SceneReplica Version ]

BibTeX

Please cite SceneReplica if it helps your research:
@article{khargonkar2023scenereplica,
  title={SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes}, 
  author={Ninad Khargonkar and Sai Haneesh Allu and Yangxiao Lu and Jishnu Jaykumar P and Balakrishnan Prabhakaran and Yu Xiang},
  journal={arXiv preprint arXiv:2306.15620},
  year={2023}}

Contact

Send any comments or questions to Ninad or Sai:
ninadarun.khargonkar@utdallas.edu | saihaneesh.allu@utdallas.edu

Acknowledgements

This work was supported in part by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005.