TR2022-149
Learning Occlusion-Aware Dense Correspondences for Multi-Modal Images
- "Learning Occlusion-Aware Dense Correspondences for Multi-Modal Images", IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), DOI: 10.1109/AVSS56176.2022.9959354, November 2022, pp. 1-8.BibTeX TR2022-149 PDF
- @inproceedings{Shimoya2022nov,
    author = {Shimoya, Ryosuke and Morimoto, Takashi and van Baar, Jeroen and Boufounos, Petros T. and Ma, Yanting and Mansour, Hassan},
    title = {Learning Occlusion-Aware Dense Correspondences for Multi-Modal Images},
    booktitle = {IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)},
    year = 2022,
    pages = {1--8},
    month = nov,
    doi = {10.1109/AVSS56176.2022.9959354},
    isbn = {978-1-6654-6382-9},
    url = {https://www.merl.com/publications/TR2022-149}
  }
Research Areas:
Artificial Intelligence, Computational Sensing, Computer Vision, Signal Processing
Abstract:
We introduce a scalable multi-modal approach to learning dense, i.e., pixel-level, correspondences and occlusion maps between images in a video sequence. Finding dense correspondences and occlusion maps are fundamental problems in computer vision. In this work, we jointly train a deep network to tackle both, with a shared feature extraction stage. We use depth and color images with ground-truth optical flow and occlusion maps to train the network end-to-end. From the multi-modal input, the network learns to estimate occlusion maps, optical flows, and a correspondence embedding that provides a meaningful latent feature space. We evaluate performance on a dataset of images derived from synthetic characters, and perform a thorough ablation study to demonstrate that the proposed components of our architecture combine to achieve the lowest correspondence error. The scalability of our proposed method comes from the ability to incorporate additional modalities, e.g., infrared images.
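
The abstract describes a shared feature extractor over multi-modal input feeding three jointly trained prediction heads (optical flow, occlusion map, correspondence embedding). The following is a minimal PyTorch sketch of that overall structure only; the layer sizes, channel counts, head designs, and loss terms are illustrative assumptions, not the architecture from the paper.

import torch
import torch.nn as nn

class MultiModalCorrespondenceNet(nn.Module):
    """Sketch: shared encoder over RGB + depth pairs, three joint heads.
    All hyperparameters below are placeholder assumptions."""

    def __init__(self, embed_dim: int = 32):
        super().__init__()
        # Shared feature extraction over concatenated modalities:
        # 2 frames x (3 RGB + 1 depth) channels = 8 input channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(8, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Per-pixel optical flow (2 channels: dx, dy).
        self.flow_head = nn.Conv2d(64, 2, 3, padding=1)
        # Per-pixel occlusion logit (sigmoid applied in the loss).
        self.occlusion_head = nn.Conv2d(64, 1, 3, padding=1)
        # Per-pixel correspondence embedding in a latent feature space.
        self.embedding_head = nn.Conv2d(64, embed_dim, 3, padding=1)

    def forward(self, rgb_pair, depth_pair):
        # rgb_pair: (B, 6, H, W); depth_pair: (B, 2, H, W)
        feats = self.encoder(torch.cat([rgb_pair, depth_pair], dim=1))
        return (self.flow_head(feats),
                self.occlusion_head(feats),
                self.embedding_head(feats))

# Joint end-to-end training against ground-truth flow and occlusion maps,
# as the abstract describes (loss weighting here is a placeholder).
net = MultiModalCorrespondenceNet()
rgb = torch.randn(1, 6, 64, 64)    # two stacked RGB frames
depth = torch.randn(1, 2, 64, 64)  # two stacked depth frames
flow, occ, emb = net(rgb, depth)
gt_flow = torch.randn_like(flow)
gt_occ = torch.randint(0, 2, occ.shape).float()
loss = (nn.functional.l1_loss(flow, gt_flow)
        + nn.functional.binary_cross_entropy_with_logits(occ, gt_occ))
loss.backward()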