TR2024-127
Few-shot Transparent Instance Segmentation for Bin Picking
- Cherian, A., Jain, S., Marks, T.K., "Few-shot Transparent Instance Segmentation for Bin Picking", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2024, pp. 5009-5016.
@inproceedings{Cherian2024sep,
  author = {Cherian, Anoop and Jain, Siddarth and Marks, Tim K.},
  title = {Few-shot Transparent Instance Segmentation for Bin Picking},
  booktitle = {2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year = 2024,
  pages = {5009--5016},
  month = sep,
  publisher = {IEEE},
  url = {https://www.merl.com/publications/TR2024-127}
}
Research Areas:
Artificial Intelligence, Computer Vision, Machine Learning, Robotics
Abstract:
In this paper, we consider the problem of segmenting multiple instances of a transparent object from RGB or grayscale camera images in a robotic bin picking setting. Prior methods for solving this task are usually built on the Mask-RCNN framework, but they require large annotated datasets for fine-tuning. Instead, we consider the task in a few-shot setting and present TrInSeg, a data-efficient and robust instance segmentation method for transparent objects based on Mask-RCNN. Our key innovations in TrInSeg are twofold: i) a novel method, dubbed TransMixup, for producing new training images using synthetic transparent object instances created by spatially transforming annotated examples; and ii) a method for scoring the consistency between the predicted segments and rotations of an ideal object template. In our new scoring method, the spatial transformations are produced by an auxiliary neural network, and the scores are then used to filter inconsistent instance predictions. To demonstrate the effectiveness of our method, we present experiments on a new few-shot dataset consisting of seven categories of non-opaque (transparent and translucent) objects, with the categories varying in the size, shape, and degree of transparency of the objects. Our results show that TrInSeg achieves state-of-the-art performance, improving on a fine-tuned Mask-RCNN by more than 14% in mIoU, while requiring very few annotated training samples.
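The abstract gives no implementation details, so the following is only a toy illustration of the filtering idea in (ii), not the authors' method. It assumes binary masks stored as nested lists, uses intersection-over-union as the consistency score, and replaces the auxiliary neural network with a brute-force search over the four right-angle rotations of the template. The names `consistency_score` and `filter_predictions` and the threshold of 0.5 are hypothetical choices made for this sketch.

```python
from typing import List

Mask = List[List[int]]  # binary grid: 1 = object pixel, 0 = background


def crop_to_bbox(mask: Mask) -> Mask:
    """Crop a binary mask to the tight bounding box of its foreground pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return []
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return [row[cols[0]:cols[-1] + 1] for row in mask[rows[0]:rows[-1] + 1]]


def rotate90(mask: Mask) -> Mask:
    """Rotate a binary grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*mask[::-1])]


def iou(a: Mask, b: Mask) -> float:
    """IoU of two cropped masks, aligned at their top-left corners."""
    height = max(len(a), len(b))
    width = max(len(a[0]) if a else 0, len(b[0]) if b else 0)

    def at(m: Mask, r: int, c: int) -> int:
        # Pixels outside a mask count as background.
        return m[r][c] if r < len(m) and c < len(m[0]) else 0

    inter = union = 0
    for r in range(height):
        for c in range(width):
            x, y = at(a, r, c), at(b, r, c)
            inter += 1 if (x and y) else 0
            union += 1 if (x or y) else 0
    return inter / union if union else 0.0


def consistency_score(pred: Mask, template: Mask) -> float:
    """Best IoU between a predicted mask and the four right-angle
    rotations of the template (assumption: stands in for the learned
    transformation described in the abstract)."""
    pred_cropped = crop_to_bbox(pred)
    rotated = crop_to_bbox(template)
    best = 0.0
    for _ in range(4):
        best = max(best, iou(pred_cropped, rotated))
        rotated = rotate90(rotated)
    return best


def filter_predictions(preds: List[Mask], template: Mask,
                       threshold: float = 0.5) -> List[Mask]:
    """Keep only predictions whose score reaches the (arbitrary) threshold."""
    return [p for p in preds if consistency_score(p, template) >= threshold]
```

In this sketch, a predicted mask that matches a rotated copy of the template scores 1.0 wherever it sits in the image, while a merged or oversized blob scores lower and is dropped. A realistic version would need continuous rotations and scale handling, which this sketch deliberately leaves out.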