Mean Shift Mask Transformer for Unseen Object Instance Segmentation

Yangxiao Lu      Yuqiao Chen      Nicholas Ruozzi      Yu Xiang
The University of Texas at Dallas     

IEEE International Conference on Robotics and Automation (ICRA), 2024

Abstract

Segmenting unseen objects from images is a critical perception skill that a robot needs to acquire. In robot manipulation, it enables a robot to grasp and manipulate unseen objects. Mean shift clustering is a widely used method for object segmentation tasks. However, the traditional mean shift clustering algorithm is not easily integrated into an end-to-end neural network training pipeline, which leaves representation learning and the clustering algorithm separated. In this work, we propose the Mean Shift Mask Transformer (MSMFormer), a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm, allowing for the joint training and inference of both the feature extractor and the clustering. Its central component is a hypersphere attention mechanism, which updates object queries on a hypersphere. To demonstrate the effectiveness of our method, we apply MSMFormer to unseen object instance segmentation. Our experiments show that MSMFormer improves over the mean shift clustering baseline that uses deep feature representations, and achieves competitive performance compared to the state-of-the-art methods on unseen object instance segmentation.
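To make the core idea concrete, below is a minimal NumPy sketch of one vMF mean shift iteration of the kind the hypersphere attention mechanism simulates: each query is pulled toward a vMF-kernel-weighted mean of unit-normalized features and re-projected onto the hypersphere. The function name, the concentration parameter `kappa`, and the max-shift for numerical stability are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def vmf_mean_shift_step(queries, features, kappa=20.0):
    """One illustrative vMF mean shift update on the unit hypersphere.

    queries:  (Q, D) array of unit-norm query vectors (cluster centers)
    features: (N, D) array of unit-norm pixel feature vectors
    kappa:    vMF concentration; larger values give sharper attention
    """
    # Cosine similarity between each query and each feature
    sim = queries @ features.T                      # (Q, N)
    # vMF kernel weights, computed as a softmax for numerical stability
    sim = sim - sim.max(axis=1, keepdims=True)
    weights = np.exp(kappa * sim)
    weights /= weights.sum(axis=1, keepdims=True)
    # Weighted mean of features, then re-project onto the hypersphere
    updated = weights @ features                    # (Q, D)
    return updated / np.linalg.norm(updated, axis=1, keepdims=True)
```

Iterating this step moves the queries toward modes of the feature distribution; in MSMFormer the analogous update is performed by attention layers so that gradients flow through the clustering into the feature extractor.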

Appendix

MSMFormer Appendix

The appendix of the Mean Shift Mask Transformer paper.

Code

MSMFormer

The code for Mean Shift Mask Transformer.

BibTeX

Please cite MSMFormer if it helps your research:
@article{lu2022mean,
  title={Mean Shift Mask Transformer for Unseen Object Instance Segmentation},
  author={Lu, Yangxiao and Chen, Yuqiao and Ruozzi, Nicholas and Xiang, Yu},
  journal={arXiv preprint arXiv:2211.11679},
  year={2022}
}

Contact

Send any comments or questions to Yangxiao Lu: yangxiao.lu@utdallas.edu

Acknowledgements

This work was supported in part by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005.