NEWS MERL's Scene-Aware Interaction Technology Featured in Mitsubishi Electric Corporation Press Release
Date released: July 22, 2020
Where:
Tokyo, Japan
Description:
Mitsubishi Electric Corporation announced that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language.
The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric's proprietary Maisart® compact AI technology to analyze multimodal sensing information and generate context-dependent natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio recorded with microphones, and localization information measured with LiDAR.
Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance. The technology is also expected to be applicable to human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. The technology is based on recent research by MERL's Speech & Audio and Computer Vision groups.
Research Areas:
Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Related Publications
- "Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog", Interspeech, September 2019, pp. 1886-1890.
  @inproceedings{Hori2019sep,
    author = {Hori, Chiori and Cherian, Anoop and Marks, Tim K. and Hori, Takaaki},
    title = {Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog},
    booktitle = {Interspeech},
    year = 2019,
    pages = {1886--1890},
    month = sep,
    publisher = {ISCA},
    url = {https://www.merl.com/publications/TR2019-097}
  }
- "Audio-Visual Scene-Aware Dialog", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), DOI: 10.1109/CVPR.2019.00774, June 2019, pp. 7550-7559.
  @inproceedings{Alamri2019jun,
    author = {Alamri, Huda and Cartillier, Vincent and Das, Abhishek and Wang, Jue and Lee, Stefan and Anderson, Peter and Essa, Irfan and Parikh, Devi and Batra, Dhruv and Cherian, Anoop and Marks, Tim K. and Hori, Chiori},
    title = {Audio-Visual Scene-Aware Dialog},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2019,
    pages = {7550--7559},
    month = jun,
    doi = {10.1109/CVPR.2019.00774},
    url = {https://www.merl.com/publications/TR2019-048}
  }
- "End-to-End Audio Visual Scene-Aware Dialog Using Multimodal Attention-Based Video Features", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2019.8682583, May 2019.
  @inproceedings{Hori2019may2,
    author = {Hori, Chiori and Alamri, Huda and Wang, Jue and Wichern, Gordon and Hori, Takaaki and Cherian, Anoop and Marks, Tim K. and Cartillier, Vincent and Lopes, Raphael and Das, Abhishek and Essa, Irfan and Batra, Dhruv and Parikh, Devi},
    title = {End-to-End Audio Visual Scene-Aware Dialog Using Multimodal Attention-Based Video Features},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2019,
    month = may,
    doi = {10.1109/ICASSP.2019.8682583},
    url = {https://www.merl.com/publications/TR2019-016}
  }
- "Early and Late Integration of Audio Features for Automatic Video Description", IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), DOI: 10.1109/ASRU.2017.8268968, December 2017.
  @inproceedings{Hori2017dec2,
    author = {Hori, Chiori and Hori, Takaaki and Marks, Tim K. and Hershey, John R.},
    title = {Early and Late Integration of Audio Features for Automatic Video Description},
    booktitle = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
    year = 2017,
    month = dec,
    doi = {10.1109/ASRU.2017.8268968},
    url = {https://www.merl.com/publications/TR2017-183}
  }
- "Attention-Based Multimodal Fusion for Video Description", IEEE International Conference on Computer Vision (ICCV), DOI: 10.1109/ICCV.2017.450, October 2017.
  @inproceedings{Hori2017oct,
    author = {Hori, Chiori and Hori, Takaaki and Lee, Teng-Yok and Zhang, Ziming and Harsham, Bret A. and Sumi, Kazuhiko and Marks, Tim K. and Hershey, John R.},
    title = {Attention-Based Multimodal Fusion for Video Description},
    booktitle = {IEEE International Conference on Computer Vision (ICCV)},
    year = 2017,
    month = oct,
    doi = {10.1109/ICCV.2017.450},
    url = {https://www.merl.com/publications/TR2017-156}
  }