TR2025-027
Multi-View Radar Detection Transformer with Differentiable Positional Encoding
- "Multi-View Radar Detection Transformer with Differentiable Positional Encoding", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025.
@inproceedings{Yataka2025mar,
  author = {Yataka, Ryoma and Wang, Pu and Boufounos, Petros T. and Takahashi, Ryuhei},
  title = {{Multi-View Radar Detection Transformer with Differentiable Positional Encoding}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2025,
  month = mar,
  url = {https://www.merl.com/publications/TR2025-027}
}
Abstract:
The Radar dEtection TRansformer (RETR) has recently been introduced to fuse multi-view millimeter-wave radar heatmaps by leveraging the detection transformer architecture and a geometric learning framework for indoor radar perception. A notable feature of RETR is its tunable positional encoding (TPE), which allows for adjusting the significance of depth positional embedding across multiple views to promote depth-prioritized feature association. However, the TPE ratio is predetermined rather than optimized during training. In this paper, we propose a differentiable positional encoding (DiPE) scheme for RETR that automatically adjusts the TPE ratio during training, improving performance and avoiding an exhaustive grid search over ratio values. DiPE can be applied with either pre-fixed (e.g., sinusoidal) or learnable positional embeddings, and is achieved by multiplying dual differentiable masks over the depth and angular positional embedding vectors. Comprehensive evaluations on the open MMVR dataset demonstrate that the proposed DiPE not only simplifies the determination of the TPE ratio but also enhances the overall detection performance.
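To make the idea of "dual differentiable masks" more concrete, the following is a minimal sketch in plain Python and not the formulation from the paper. It assumes that the two masks are complementary sigmoid steps over the channel index, that the boundary between them is set by a single scalar (`ratio_logit`, a hypothetical name), and that the masked depth and angular embeddings are summed. The helper names, the `temperature` parameter, and the embedding size are all illustrative assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sinusoidal_embedding(position, dim):
    """Fixed sinusoidal embedding of a scalar position (dim must be even)."""
    emb = []
    for k in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * k / dim))
        emb.append(math.sin(position * freq))
        emb.append(math.cos(position * freq))
    return emb


def soft_masks(ratio_logit, dim, temperature=1.0):
    """Hypothetical dual masks: a smooth step over the channel index.

    ratio = sigmoid(ratio_logit) stands in for the TPE ratio. Channels with
    index below ratio * dim are weighted toward the depth embedding, the
    remaining channels toward the angular embedding. Because the step is
    smooth, the masks vary continuously with ratio_logit, which is what
    would let an autodiff framework learn it (an assumption of this sketch).
    """
    boundary = sigmoid(ratio_logit) * dim
    depth_mask = [sigmoid((boundary - (i + 0.5)) / temperature) for i in range(dim)]
    angle_mask = [1.0 - m for m in depth_mask]
    return depth_mask, angle_mask


def masked_positional_embedding(depth, angle, ratio_logit, dim=8, temperature=1.0):
    """Combine depth and angular embeddings through the two soft masks."""
    pe_depth = sinusoidal_embedding(depth, dim)
    pe_angle = sinusoidal_embedding(angle, dim)
    depth_mask, angle_mask = soft_masks(ratio_logit, dim, temperature)
    return [
        md * d + ma * a
        for md, d, ma, a in zip(depth_mask, pe_depth, angle_mask, pe_angle)
    ]


if __name__ == "__main__":
    # A larger ratio_logit shifts more channels toward the depth embedding.
    for logit in (-2.0, 0.0, 2.0):
        depth_mask, _ = soft_masks(logit, dim=8)
        print(logit, [round(m, 2) for m in depth_mask])
```

This sketch only computes the forward pass. In a training setup, `ratio_logit` would presumably be registered as a learnable parameter in an automatic differentiation framework and updated together with the network weights, which is what would remove the need for a grid search over fixed ratios.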