Abstract

Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advances in large vision models, we use Grounding DINO and the Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We compute foreground feature averages of patch embeddings from the DINOv2 ViT backbone and then refine them with a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their feature space and effectively limit overfitting. This enables a straightforward matching strategy and yields significant performance gains. Our framework surpasses current state-of-the-art methods, with improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets. In instance segmentation on the seven core datasets of the BOP challenge, our method outperforms the top RGB methods by 3.6 AP and remains competitive with the best RGB-D method.

NIDS-Net

NIDS-Net is a unified framework for Novel Instance Detection and Segmentation (NIDS).
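The last stage of the framework, embedding matching for instance label assignment, can be illustrated with a minimal NumPy sketch. It is not the released implementation: it assumes cosine similarity between proposal and template embeddings, takes the best-scoring template per instance, and rejects proposals below a threshold. The function names, the `score_threshold` parameter, and the `-1` "no match" label are hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def assign_instance_labels(proposal_embeddings, template_embeddings,
                           template_labels, score_threshold=0.5):
    """Assign each proposal the instance whose templates it resembles most.

    proposal_embeddings: (P, D) array, one row per object proposal.
    template_embeddings: (T, D) array, one row per template image.
    template_labels:     (T,) integer instance id of each template.
    score_threshold:     hypothetical cutoff; proposals below it get label -1.
    """
    proposals = l2_normalize(proposal_embeddings)
    templates = l2_normalize(template_embeddings)
    template_labels = np.asarray(template_labels)

    similarity = proposals @ templates.T              # (P, T) cosine similarities
    instance_ids = np.unique(template_labels)

    # Assumption: an instance's score is the best score among its templates.
    instance_scores = np.stack(
        [similarity[:, template_labels == i].max(axis=1) for i in instance_ids],
        axis=1,
    )                                                 # (P, K)

    best = instance_scores.argmax(axis=1)
    scores = instance_scores[np.arange(len(best)), best]
    labels = np.where(scores >= score_threshold, instance_ids[best], -1)
    return labels, scores
```

With two templates for instance 0 and one for instance 1, a proposal embedding that points in the same direction as an instance's template receives that instance's label, and a proposal that resembles neither is labeled `-1`.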

Foreground Feature Averaging (FFA)

FFA [1] is used to generate initial instance embeddings.
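As a rough illustration of the averaging step, the sketch below pools patch embeddings over a foreground mask. It assumes the patch embeddings (for example from a DINOv2 ViT) have already been computed and that the mask has been resized to the patch grid. The function name and the final L2 normalization are assumptions made for this example, not details taken from the released code.

```python
import numpy as np


def foreground_feature_average(patch_embeddings, foreground_mask):
    """Average the patch embeddings that fall on the object foreground.

    patch_embeddings: (H, W, D) array of per-patch features.
    foreground_mask:  (H, W) boolean array, True where the patch is foreground.
    Returns a (D,) embedding, L2-normalized (the normalization is an assumption).
    """
    patches = np.asarray(patch_embeddings, dtype=np.float64)
    mask = np.asarray(foreground_mask, dtype=bool)

    if patches.shape[:2] != mask.shape:
        raise ValueError("mask must match the patch grid of the embeddings")
    if not mask.any():
        raise ValueError("mask contains no foreground patches")

    # Boolean indexing keeps only foreground patches: result has shape (N, D).
    embedding = patches[mask].mean(axis=0)

    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding


# Toy 2x2 patch grid with 2-dimensional features.
patches = np.zeros((2, 2, 2))
patches[0, 0] = [2.0, 0.0]
patches[0, 1] = [4.0, 8.0]
patches[1, 1] = [100.0, 100.0]          # background patch, ignored by the mask
mask = np.array([[True, True], [False, False]])

instance_embedding = foreground_feature_average(patches, mask)
```

Here the two foreground patches average to `[3, 4]`, so the normalized embedding is `[0.6, 0.8]`, and the large background feature has no effect on the result.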

Detection Examples

Segmentation Examples of BOP Benchmark

Code

NIDS-Net

The code for NIDS-Net.

All Embeddings, Model Weights and Predictions

Detection Data

BOP Segmentation Data

References

  1. K. Kotar, S. Tian, H.-X. Yu, D. Yamins, and J. Wu. Are these the same apple? Comparing images based on object intrinsics. Advances in Neural Information Processing Systems, 36, 2024.

BibTeX

Please cite NIDS-Net if it helps your research:

@misc{lu2024adapting,
title={Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation},
author={Yangxiao Lu and Jishnu Jaykumar P and Yunhui Guo and Nicholas Ruozzi and Yu Xiang},
year={2024},
eprint={2405.17859},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

Contact

Send any comments or questions to Yangxiao Lu: yangxiao.lu@utdallas.edu

Acknowledgements

This work was supported in part by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005.