Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes cover the areas of deep learning and artificial intelligence for object and action detection, classification and scene understanding, robotic vision and object manipulation, 3D processing and computational geometry, as well as simulation of physical systems to enhance machine learning systems.
-
Researchers
Anoop Cherian
Tim K. Marks
Michael J. Jones
Chiori Hori
Suhas Lohit
Jonathan Le Roux
Hassan Mansour
Matthew Brand
Siddarth Jain
Devesh K. Jha
Moitreya Chatterjee
Radu Corcodel
Kuan-Chuan Peng
Diego Romeres
Pedro Miraldo
Ye Wang
Petros T. Boufounos
Anthony Vetro
Daniel N. Nikovski
Gordon Wichern
Dehong Liu
William S. Yerazunis
Toshiaki Koike-Akino
Arvind Raghunathan
Avishai Weiss
Stefano Di Cairano
François Germain
Abraham P. Vinod
Yanting Ma
Yoshiki Masuyama
Philip V. Orlik
Joshua Rapp
Huifang Sun
Pu (Perry) Wang
Yebin Wang
Jing Liu
Naoko Sawada
Alexander Schperberg
-
Awards
-
AWARD Best Paper - Honorable Mention Award at WACV 2021
Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief: A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches which are obtained from a deep generative network. This is unlike prior approaches where the networks attempt to model image-level distributions and are unable to generalize outside training distributions. The key idea in this paper is that learning patch-level statistics is far easier. As the authors demonstrate, this model can then be used to efficiently solve challenging inverse problems in imaging such as compressive image recovery and inpainting even from very few measurements for diverse natural scenes.
-
AWARD MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
-
AWARD CVPR 2011 Longuet-Higgins Prize
Date: June 25, 2011
Awarded to: Paul A. Viola and Michael J. Jones
Awarded for: "Rapid Object Detection using a Boosted Cascade of Simple Features"
Awarded by: Conference on Computer Vision and Pattern Recognition (CVPR)
MERL Contact: Michael J. Jones
Research Area: Machine Learning
Brief: Awarded to the paper from 10 years earlier with the largest impact on the field: "Rapid Object Detection using a Boosted Cascade of Simple Features", originally published at the Conference on Computer Vision and Pattern Recognition (CVPR 2001).
-
News & Events
-
TALK [MERL Seminar Series 2025] David Lindell presents talk titled "Imaging Dynamic Scenes from Seconds to Picoseconds"
Date & Time: Wednesday, January 29, 2025; 1:00 PM
Speaker: David Lindell, University of Toronto
MERL Host: Joshua Rapp
Research Areas: Computational Sensing, Computer Vision, Signal Processing
Abstract: The observed timescales of the universe span from the exasecond scale (~1e18 seconds) down to the zeptosecond scale (~1e-21 seconds). While specialized imaging systems can capture narrow slices of this temporal spectrum in the ultra-fast regime (e.g., nanoseconds to picoseconds; 1e-9 to 1e-12 s), they cannot simultaneously capture both slow (> 1 second) and ultra-fast events (< 1 nanosecond). Further, ultra-fast imaging systems are conventionally limited to single-viewpoint capture, hindering 3D visualization at ultra-fast timescales. In this talk, I discuss (1) new computational algorithms that turn a single-photon detector into an "ultra-wideband" imaging system that captures events from seconds to picoseconds; and (2) a method for neural rendering using multi-viewpoint, ultra-fast videos captured using single-photon detectors. The latter approach enables rendering videos of propagating light from novel viewpoints, observation of viewpoint-dependent changes in light transport predicted by Einstein, recovery of material properties, and accurate 3D reconstruction from multiply scattered light. Finally, I discuss future directions in ultra-wideband imaging.
-
NEWS MERL Researchers to Present 2 Conference and 11 Workshop Papers at NeurIPS 2024
Date: December 10, 2024 - December 15, 2024
Where: Advances in Neural Information Processing Systems (NeurIPS)
MERL Contacts: Petros T. Boufounos; Matthew Brand; Ankush Chakrabarty; Anoop Cherian; François Germain; Toshiaki Koike-Akino; Christopher R. Laughman; Jonathan Le Roux; Jing Liu; Suhas Lohit; Tim K. Marks; Yoshiki Masuyama; Kieran Parsons; Kuan-Chuan Peng; Diego Romeres; Pu (Perry) Wang; Ye Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio, Human-Computer Interaction, Information Security
Brief: MERL researchers will attend and present the following papers at the 2024 Advances in Neural Information Processing Systems (NeurIPS) Conference and Workshops.
1. "RETR: Multi-View Radar Detection Transformer for Indoor Perception" by Ryoma Yataka (Mitsubishi Electric), Adriano Cardace (Bologna University), Perry Wang (Mitsubishi Electric Research Laboratories), Petros Boufounos (Mitsubishi Electric Research Laboratories), Ryuhei Takahashi (Mitsubishi Electric). Main Conference. https://neurips.cc/virtual/2024/poster/95530
2. "Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads" by Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Joanna Matthiesen (Math Kangaroo USA), Kevin Smith (Massachusetts Institute of Technology), Josh Tenenbaum (Massachusetts Institute of Technology). Main Conference, Datasets and Benchmarks track. https://neurips.cc/virtual/2024/poster/97639
3. "Probabilistic Forecasting for Building Energy Systems: Are Time-Series Foundation Models The Answer?" by Young-Jin Park (Massachusetts Institute of Technology), Jing Liu (Mitsubishi Electric Research Laboratories), François G Germain (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Gordon Wichern (Mitsubishi Electric Research Laboratories), Navid Azizan (Massachusetts Institute of Technology), Christopher R. Laughman (Mitsubishi Electric Research Laboratories), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories). Time Series in the Age of Large Models Workshop.
4. "Forget to Flourish: Leveraging Model-Unlearning on Pretrained Language Models for Privacy Leakage" by Md Rafi Ur Rashid (Penn State University), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Shagufta Mehnaz (Penn State University), Ye Wang (Mitsubishi Electric Research Laboratories). Workshop on Red Teaming GenAI: What Can We Learn from Adversaries?
5. "Spatially-Aware Losses for Enhanced Neural Acoustic Fields" by Christopher Ick (New York University), Gordon Wichern (Mitsubishi Electric Research Laboratories), Yoshiki Masuyama (Mitsubishi Electric Research Laboratories), François G Germain (Mitsubishi Electric Research Laboratories), Jonathan Le Roux (Mitsubishi Electric Research Laboratories). Audio Imagination Workshop.
6. "FV-NeRV: Neural Compression for Free Viewpoint Videos" by Sorachi Kato (Osaka University), Takuya Fujihashi (Osaka University), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Takashi Watanabe (Osaka University). Machine Learning and Compression Workshop.
7. "GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM" by Keshav Bimbraw (Worcester Polytechnic Institute), Ye Wang (Mitsubishi Electric Research Laboratories), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories). AIM-FM: Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond Workshop.
8. "Smoothed Embeddings for Robust Language Models" by Hase Ryo (Mitsubishi Electric), Md Rafi Ur Rashid (Penn State University), Ashley Lewis (Ohio State University), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Kieran Parsons (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories). Safe Generative AI Workshop.
9. "Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation" by Xiangyu Chen (University of Kansas), Ye Wang (Mitsubishi Electric Research Laboratories), Matthew Brand (Mitsubishi Electric Research Laboratories), Pu Wang (Mitsubishi Electric Research Laboratories), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories). Workshop on Adaptive Foundation Models.
10. "Preference-based Multi-Objective Bayesian Optimization with Gradients" by Joshua Hang Sai Ip (University of California Berkeley), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories), Ali Mesbah (University of California Berkeley), Diego Romeres (Mitsubishi Electric Research Laboratories). Workshop on Bayesian Decision-Making and Uncertainty. Lightning talk spotlight.
11. "TR-BEACON: Shedding Light on Efficient Behavior Discovery in High-Dimensions with Trust-Region-based Bayesian Novelty Search" by Wei-Ting Tang (Ohio State University), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories), Joel A. Paulson (Ohio State University). Workshop on Bayesian Decision-Making and Uncertainty.
12. "MEL-PETs Joint-Context Attack for the NeurIPS 2024 LLM Privacy Challenge Red Team Track" by Ye Wang (Mitsubishi Electric Research Laboratories), Tsunato Nakai (Mitsubishi Electric), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Kento Oonishi (Mitsubishi Electric), Takuya Higashi (Mitsubishi Electric). LLM Privacy Challenge. Special Award for Practical Attack.
13. "MEL-PETs Defense for the NeurIPS 2024 LLM Privacy Challenge Blue Team Track" by Jing Liu (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Tsunato Nakai (Mitsubishi Electric), Kento Oonishi (Mitsubishi Electric), Takuya Higashi (Mitsubishi Electric). LLM Privacy Challenge. Won 3rd Place Award.
MERL members also contributed to the organization of the Multimodal Algorithmic Reasoning (MAR) Workshop (https://marworkshop.github.io/neurips24/). Organizers: Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Honglu Zhou (Salesforce Research), Kevin Smith (Massachusetts Institute of Technology), Tim K. Marks (Mitsubishi Electric Research Laboratories), Juan Carlos Niebles (Salesforce AI Research), Petar Veličković (Google DeepMind).
-
Research Highlights
-
PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
Steered Diffusion
Robust Machine Learning
Video Anomaly Detection
MERL Shopping Dataset
Point-Plane SLAM
-
-
Internships
-
CV0079: Internship - Novel View Synthesis of Dynamic Scenes
MERL is looking for a highly motivated intern to work on an original research project in rendering dynamic scenes from novel views. A strong background in 3D computer vision and/or computer graphics is required. Experience with the latest advances in volumetric rendering, such as neural radiance fields (NeRFs) and Gaussian Splatting (GS), is desired. The successful candidate is expected to have published at least one paper in a top-tier computer vision/graphics or machine learning venue, such as CVPR, ECCV, ICCV, SIGGRAPH, 3DV, ICML, ICLR, NeurIPS or AAAI, and possess solid programming skills in Python and popular deep learning frameworks like PyTorch. The candidate will collaborate with MERL researchers to develop algorithms and prepare manuscripts for scientific publications. The position is available for graduate students on a Ph.D. track or those who have recently graduated with a Ph.D. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
Required Specific Experience
- Prior publications in top computer vision/graphics and/or machine learning venues, such as CVPR, ECCV, ICCV, SIGGRAPH, 3DV, ICML, ICLR, NeurIPS or AAAI.
- Experience with the latest novel-view synthesis approaches such as Neural Radiance Fields (NeRFs) or Gaussian Splatting (GS).
- Proficiency in coding (particularly scripting languages like Python) and familiarity with deep learning frameworks, such as PyTorch or TensorFlow.
-
OR0087: Internship - Human-Robot Collaboration with Shared Autonomy
MERL is looking for a highly motivated and qualified intern to contribute to research in human-robot interaction (HRI). The ideal candidate is a Ph.D. student with expertise in robotic manipulation, perception, deep learning, probabilistic modeling, or reinforcement learning. We have several research topics available, including assistive teleoperation, visual scene reconstruction, safety in HRI, shared autonomy, intent recognition, cooperative manipulation, and robot learning. The selected intern will work closely with MERL researchers to develop and implement novel algorithms, conduct experiments, and present research findings. We publish our research at top-tier conferences. Start date is flexible, and the expected duration of the internship is 3-4 months. Interested candidates are encouraged to apply with their updated CV and list of publications.
Required Specific Experience
- Experience with ROS and deep learning frameworks such as PyTorch is essential.
- Strong programming skills in Python and/or C/C++.
- Experience with simulation tools, such as PyBullet, Isaac Lab, or MuJoCo.
- Prior experience in human-robot interaction, perception, or robotic manipulation.
-
CV0051: Internship - Visual-LiDAR fused object detection and recognition
MERL is looking for a self-motivated intern to work on visual-LiDAR fused object detection and recognition using computer vision. Relevant topics include (but are not limited to): open-vocabulary visual-LiDAR object detection and recognition, domain adaptation or generalization in visual-LiDAR object detection, data-efficient methods for visual-LiDAR object detection, and small object detection with visual-LiDAR input. Candidates with experience in LiDAR-based object recognition are strongly preferred. The ideal candidate is a PhD student with a strong background in computer vision and machine learning who has published at least one paper in a top-tier computer vision, machine learning, or artificial intelligence venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI. Proficiency in Python programming and familiarity with at least one deep learning framework are necessary. The intern will collaborate with MERL researchers to develop algorithms and prepare manuscripts for scientific publications. The internship should ideally last at least 3 months, with a flexible start date.
Required Specific Experience
- Experience with Python, PyTorch, and datasets with both images and LiDAR (e.g. the nuScenes dataset).
-
Openings
-
CV0124: Postdoctoral Research Fellow - 3D Computer Vision
-
CI0130: Postdoctoral Research Fellow - Artificial General Intelligence (AGI)
-
Recent Publications
- "Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025. TR2025-034
  @inproceedings{Hori2025mar,
    author = {Hori, Chiori and Kambara, Motonari and Sugiura, Komei and Ota, Kei and Khurana, Sameer and Jain, Siddarth and Corcodel, Radu and Jha, Devesh K. and Romeres, Diego and {Le Roux}, Jonathan},
    title = {{Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2025,
    month = mar,
    url = {https://www.merl.com/publications/TR2025-034}
  }
- "Towards Zero-shot 3D Anomaly Localization", IEEE Winter Conference on Applications of Computer Vision (WACV), February 2025. TR2025-020
  @inproceedings{Wang2025feb2,
    author = {Wang, Yizhou and Peng, Kuan-Chuan and Fu, Raymond},
    title = {{Towards Zero-shot 3D Anomaly Localization}},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = 2025,
    month = feb,
    url = {https://www.merl.com/publications/TR2025-020}
  }
- "Supplementary Material – Towards Zero-shot 3D Anomaly Localization", IEEE Winter Conference on Applications of Computer Vision (WACV), February 2025. TR2025-019
  @inproceedings{Wang2025feb,
    author = {Wang, Yizhou and Peng, Kuan-Chuan and Fu, Raymond},
    title = {{Supplementary Material – Towards Zero-shot 3D Anomaly Localization}},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = 2025,
    month = feb,
    url = {https://www.merl.com/publications/TR2025-019}
  }
- "ComplexVAD: Detecting Interaction Anomalies in Video", IEEE Winter Conference on Applications of Computer Vision (WACV) Workshop, February 2025. TR2025-016
  @inproceedings{Mumcu2025feb,
    author = {Mumcu, Furkan and Jones, Michael J. and Yilmaz, Yasin and Cherian, Anoop},
    title = {{ComplexVAD: Detecting Interaction Anomalies in Video}},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV) Workshop},
    year = 2025,
    month = feb,
    url = {https://www.merl.com/publications/TR2025-016}
  }
- "Rotation-Equivariant Neural Networks for Cloud Removal from Satellite Images", Asilomar Conference on Signals, Systems, and Computers (ACSSC), January 2025. TR2025-009
  @inproceedings{Lohit2025jan,
    author = {Lohit, Suhas and Marks, Tim K.},
    title = {{Rotation-Equivariant Neural Networks for Cloud Removal from Satellite Images}},
    booktitle = {Asilomar Conference on Signals, Systems, and Computers (ACSSC)},
    year = 2025,
    month = jan,
    url = {https://www.merl.com/publications/TR2025-009}
  }
- "SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera", IEEE Winter Conference on Applications of Computer Vision (WACV), December 2024. TR2025-003
  @inproceedings{He2024dec2,
    author = {He, Yuhang and Shin, Sangyun and Cherian, Anoop and Trigoni, Niki and Markham, Andrew},
    title = {{SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera}},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = 2024,
    month = dec,
    url = {https://www.merl.com/publications/TR2025-003}
  }
- "Temporally Grounding Instructional Diagrams in Unconstrained Videos", IEEE Winter Conference on Applications of Computer Vision (WACV), December 2024. TR2025-002
  @inproceedings{Zhang2024dec,
    author = {Zhang, Jiahao and Zhang, Frederic and Rodriguez, Cristian and Ben-Shabat, Itzik and Cherian, Anoop and Gould, Stephen},
    title = {{Temporally Grounding Instructional Diagrams in Unconstrained Videos}},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = 2024,
    month = dec,
    url = {https://www.merl.com/publications/TR2025-002}
  }
- "Evaluating Large Vision-and-Language Models on Children’s Mathematical Olympiads", Advances in Neural Information Processing Systems (NeurIPS), November 2024. TR2024-160
  @inproceedings{Cherian2024nov,
    author = {Cherian, Anoop and Peng, Kuan-Chuan and Lohit, Suhas and Matthiesen, Joanna and Smith, Kevin and Tenenbaum, Joshua B.},
    title = {{Evaluating Large Vision-and-Language Models on Children’s Mathematical Olympiads}},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
    year = 2024,
    month = nov,
    url = {https://www.merl.com/publications/TR2024-160}
  }
-
Software & Data Downloads
-
ComplexVAD Dataset
Gear Extensions of Neural Radiance Fields
Long-Tailed Anomaly Detection Dataset
Pixel-Grounded Prototypical Part Networks
Steered Diffusion
BAyesian Network for adaptive SAmple Consensus
Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
Explainable Video Anomaly Localization
Simple Multimodal Algorithmic Reasoning Task Dataset
Partial Group Convolutional Neural Networks
SOurce-free Cross-modal KnowledgE Transfer
Audio-Visual-Language Embodied Navigation in 3D Environments
3D MOrphable STyleGAN
Instance Segmentation GAN
Audio Visual Scene-Graph Segmentor
Generalized One-class Discriminative Subspaces
Generating Visual Dynamics from Sound and Context
Adversarially-Contrastive Optimal Transport
MotionNet
Street Scene Dataset
FoldingNet++
Landmarks’ Location, Uncertainty, and Visibility Likelihood
Gradient-based Nikaido-Isoda
Circular Maze Environment
Discriminative Subspace Pooling
Kernel Correlation Network
Fast Resampling on Point Clouds via Graphs
FoldingNet
MERL Shopping Dataset
Joint Geodesic Upsampling
Plane Extraction using Agglomerative Clustering
-