Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes cover the areas of deep learning and artificial intelligence for object and action detection, classification and scene understanding, robotic vision and object manipulation, 3D processing and computational geometry, as well as simulation of physical systems to enhance machine learning systems.
Quick Links
-
Researchers
Anoop
Cherian
Tim K.
Marks
Michael J.
Jones
Chiori
Hori
Suhas
Lohit
Jonathan
Le Roux
Hassan
Mansour
Matthew
Brand
Siddarth
Jain
Devesh K.
Jha
Moitreya
Chatterjee
Radu
Corcodel
Kuan-Chuan
Peng
Diego
Romeres
Pedro
Miraldo
Ye
Wang
Petros T.
Boufounos
Anthony
Vetro
Daniel N.
Nikovski
Gordon
Wichern
Dehong
Liu
William S.
Yerazunis
Toshiaki
Koike-Akino
Arvind
Raghunathan
Avishai
Weiss
Stefano
Di Cairano
François
Germain
Abraham P.
Vinod
Yanting
Ma
Yoshiki
Masuyama
Philip V.
Orlik
Joshua
Rapp
Huifang
Sun
Pu
(Perry)
WangYebin
Wang
Jing
Liu
Naoko
Sawada
Alexander
Schperberg
-
Awards
-
AWARD Best Paper - Honorable Mention Award at WACV 2021 Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine LearningBrief- A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches which are obtained from a deep generative network. This is unlike prior approaches where the networks attempt to model image-level distributions and are unable to generalize outside training distributions. The key idea in this paper is that learning patch-level statistics is far easier. As the authors demonstrate, this model can then be used to efficiently solve challenging inverse problems in imaging such as compressive image recovery and inpainting even from very few measurements for diverse natural scenes.
- A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
-
AWARD MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine LearningBrief- MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
-
AWARD CVPR 2011 Longuet-Higgins Prize Date: June 25, 2011
Awarded to: Paul A. Viola and Michael J. Jones
Awarded for: "Rapid Object Detection using a Boosted Cascade of Simple Features"
Awarded by: Conference on Computer Vision and Pattern Recognition (CVPR)
MERL Contact: Michael J. Jones
Research Area: Machine LearningBrief- Paper from 10 years ago with the largest impact on the field: "Rapid Object Detection using a Boosted Cascade of Simple Features", originally published at Conference on Computer Vision and Pattern Recognition (CVPR 2001).
See All Awards for MERL -
-
News & Events
-
TALK [MERL Seminar Series 2025] David Lindell presents talk titled Imaging Dynamic Scenes from Seconds to Picoseconds Date & Time: Wednesday, January 29, 2025; 1:00 PM
Speaker: David Lindell, University of Toronto
MERL Host: Joshua Rapp
Research Areas: Computational Sensing, Computer Vision, Signal ProcessingAbstractThe observed timescales of the universe span from the exasecond scale (~1e18 seconds) down to the zeptosecond scale (~1e-21 seconds). While specialized imaging systems can capture narrow slices of this temporal spectrum in the ultra-fast regime (e.g., nanoseconds to picoseconds; 1e-9 to 1e-12 s), they cannot simultaneously capture both slow (> 1 second) and ultra-fast events (< 1 nanosecond). Further, ultra-fast imaging systems are conventionally limited to single-viewpoint capture, hindering 3D visualization at ultra-fast timescales. In this talk, I discuss (1) new computational algorithms that turn a single-photon detector into an "ultra-wideband" imaging system that captures events from seconds to picoseconds; and (2) a method for neural rendering using multi-viewpoint, ultra-fast videos captured using single-photon detectors. The latter approach enables rendering videos of propagating light from novel viewpoints, observation of viewpoint-dependent changes in light transport predicted by Einstein, recovery of material properties, and accurate 3D reconstruction from multiply scattered light. Finally, I discuss future directions in ultra-wideband imaging.
-
NEWS MERL Researchers to Present 2 Conference and 11 Workshop Papers at NeurIPS 2024 Date: December 10, 2024 - December 15, 2024
Where: Advances in Neural Processing Systems (NeurIPS)
MERL Contacts: Petros T. Boufounos; Matthew Brand; Ankush Chakrabarty; Anoop Cherian; François Germain; Toshiaki Koike-Akino; Christopher R. Laughman; Jonathan Le Roux; Jing Liu; Suhas Lohit; Tim K. Marks; Yoshiki Masuyama; Kieran Parsons; Kuan-Chuan Peng; Diego Romeres; Pu (Perry) Wang; Ye Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio, Human-Computer Interaction, Information SecurityBrief- MERL researchers will attend and present the following papers at the 2024 Advances in Neural Processing Systems (NeurIPS) Conference and Workshops.
1. "RETR: Multi-View Radar Detection Transformer for Indoor Perception" by Ryoma Yataka (Mitsubishi Electric), Adriano Cardace (Bologna University), Perry Wang (Mitsubishi Electric Research Laboratories), Petros Boufounos (Mitsubishi Electric Research Laboratories), Ryuhei Takahashi (Mitsubishi Electric). Main Conference. https://neurips.cc/virtual/2024/poster/95530
2. "Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads" by Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Joanna Matthiesen (Math Kangaroo USA), Kevin Smith (Massachusetts Institute of Technology), Josh Tenenbaum (Massachusetts Institute of Technology). Main Conference, Datasets and Benchmarks track. https://neurips.cc/virtual/2024/poster/97639
3. "Probabilistic Forecasting for Building Energy Systems: Are Time-Series Foundation Models The Answer?" by Young-Jin Park (Massachusetts Institute of Technology), Jing Liu (Mitsubishi Electric Research Laboratories), François G Germain (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Gordon Wichern (Mitsubishi Electric Research Laboratories), Navid Azizan (Massachusetts Institute of Technology), Christopher R. Laughman (Mitsubishi Electric Research Laboratories), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories). Time Series in the Age of Large Models Workshop.
4. "Forget to Flourish: Leveraging Model-Unlearning on Pretrained Language Models for Privacy Leakage" by Md Rafi Ur Rashid (Penn State University), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Shagufta Mehnaz (Penn State University), Ye Wang (Mitsubishi Electric Research Laboratories). Workshop on Red Teaming GenAI: What Can We Learn from Adversaries?
5. "Spatially-Aware Losses for Enhanced Neural Acoustic Fields" by Christopher Ick (New York University), Gordon Wichern (Mitsubishi Electric Research Laboratories), Yoshiki Masuyama (Mitsubishi Electric Research Laboratories), François G Germain (Mitsubishi Electric Research Laboratories), Jonathan Le Roux (Mitsubishi Electric Research Laboratories). Audio Imagination Workshop.
6. "FV-NeRV: Neural Compression for Free Viewpoint Videos" by Sorachi Kato (Osaka University), Takuya Fujihashi (Osaka University), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Takashi Watanabe (Osaka University). Machine Learning and Compression Workshop.
7. "GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM" by Keshav Bimbraw (Worcester Polytechnic Institute), Ye Wang (Mitsubishi Electric Research Laboratories), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories). AIM-FM: Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond Workshop.
8. "Smoothed Embeddings for Robust Language Models" by Hase Ryo (Mitsubishi Electric), Md Rafi Ur Rashid (Penn State University), Ashley Lewis (Ohio State University), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Kieran Parsons (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories). Safe Generative AI Workshop.
9. "Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation" by Xiangyu Chen (University of Kansas), Ye Wang (Mitsubishi Electric Research Laboratories), Matthew Brand (Mitsubishi Electric Research Laboratories), Pu Wang (Mitsubishi Electric Research Laboratories), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories). Workshop on Adaptive Foundation Models.
10. "Preference-based Multi-Objective Bayesian Optimization with Gradients" by Joshua Hang Sai Ip (University of California Berkeley), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories), Ali Mesbah (University of California Berkeley), Diego Romeres (Mitsubishi Electric Research Laboratories). Workshop on Bayesian Decision-Making and Uncertainty. Lightning talk spotlight.
11. "TR-BEACON: Shedding Light on Efficient Behavior Discovery in High-Dimensions with Trust-Region-based Bayesian Novelty Search" by Wei-Ting Tang (Ohio State University), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories), Joel A. Paulson (Ohio State University). Workshop on Bayesian Decision-Making and Uncertainty.
12. "MEL-PETs Joint-Context Attack for the NeurIPS 2024 LLM Privacy Challenge Red Team Track" by Ye Wang (Mitsubishi Electric Research Laboratories), Tsunato Nakai (Mitsubishi Electric), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Kento Oonishi (Mitsubishi Electric), Takuya Higashi (Mitsubishi Electric). LLM Privacy Challenge. Special Award for Practical Attack.
13. "MEL-PETs Defense for the NeurIPS 2024 LLM Privacy Challenge Blue Team Track" by Jing Liu (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Tsunato Nakai (Mitsubishi Electric), Kento Oonishi (Mitsubishi Electric), Takuya Higashi (Mitsubishi Electric). LLM Privacy Challenge. Won 3rd Place Award.
MERL members also contributed to the organization of the Multimodal Algorithmic Reasoning (MAR) Workshop (https://marworkshop.github.io/neurips24/). Organizers: Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Honglu Zhou (Salesforce Research), Kevin Smith (Massachusetts Institute of Technology), Tim K. Marks (Mitsubishi Electric Research Laboratories), Juan Carlos Niebles (Salesforce AI Research), Petar Veličković (Google DeepMind).
- MERL researchers will attend and present the following papers at the 2024 Advances in Neural Processing Systems (NeurIPS) Conference and Workshops.
See All News & Events for Computer Vision -
-
Research Highlights
-
PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering -
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models -
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling -
Steered Diffusion -
Robust Machine Learning -
Video Anomaly Detection -
MERL Shopping Dataset -
Point-Plane SLAM
-
-
Internships
-
CV0056: Internship - "Small" Large Generative Models for Vision and Language
MERL is looking for research interns to conduct research into novel architectures for "small" large generative models. We are currently exploring 0.5 - 2 billion parameter language models, text-to-image models and text-to-video models. Interesting research directions include (a) efficient learning for such models that improves the pareto front of current scaling laws for these sizes, (b) enhancing current transformer-based architectures, and (c) new architectural paradigms beyond transformers such as incorporating explicitly temporal designs. Prior experience with machine learning/computer vision/natural language processing research, and proficiency in building and experimenting with machine learning models using a framework like PyTorch are required. Candidates well into their PhD program with publications in top-tier machine learning, natural language processing or computer vision venues, ideally connected to building generative models, are strongly preferred. Candidates are also expected to collaborate with MERL researchers for preparing manuscripts for scientific publications based on the results obtained during the internship. Duration of the internship is 3 months with a flexible start date.
Required Specific Experience
- Research experience with recent vision and text generative models
- Deep understanding of neural network architectures
- Proficiency in machine learning frameworks like PyTorch
-
ST0096: Internship - Multimodal Tracking and Imaging
MERL is seeking a motivated intern to assist in developing hardware and algorithms for multimodal imaging applications. The project involves integration of radar, camera, and depth sensors in a variety of sensing scenarios. The ideal candidate should have experience with FMCW radar and/or depth sensing, and be fluent in Python and scripting methods. Familiarity with optical tracking of humans and experience with hardware prototyping is desired. Good knowledge of computational imaging and/or radar imaging methods is a plus.
Required Specific Experience
- Experience with Python and Python Deep Learning Frameworks.
- Experience with FMCW radar and/or Depth Sensors.
-
OR0088: Internship - Robot Learning
MERL is looking for a highly motivated and qualified PhD student in the areas of machine learning and robotics, to participate in research on advanced algorithms for learning control of robots and other mechanisms. Solid background and hands-on experience with various machine learning algorithms is expected, and in particular with deep learning algorithms for image processing and object detection. Exposure to deep reinforcement learning and/or learning from demonstration is highly desirable. Familiarity with the use of machine learning algorithms for system identification of mechanical systems would be a plus, along with background in other areas of automatic control. Solid experimental skills and hands-on experience in coding in Python, PyTorch, and OpenCV are required for the position. Some experience with ROS2 and familiarity with classical mechanics and computational physics engines would be helpful, but is not required. The position will provide opportunities for exploring fundamental problems in incremental learning in humans and machines, leading to publishable results. The duration of the internship is 3 to 5 months, with a flexible starting date.
Required Specific Experience
- Python, PyTorch, OpenCV
See All Internships for Computer Vision -
-
Openings
-
CI0130: Postdoctoral Research Fellow - Artificial General Intelligence (AGI)
-
CV0124: Postdoctoral Research Fellow - 3D Computer Vision
See All Openings at MERL -
-
Recent Publications
- "Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025.BibTeX TR2025-034 PDF
- @inproceedings{Hori2025mar,
- author = {Hori, Chiori and Kambara, Motonari and Sugiura, Komei and Ota, Kei and Khurana, Sameer and Jain, Siddarth and Corcodel, Radu and Jha, Devesh K. and Romeres, Diego and {Le Roux}, Jonathan},
- title = {{Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos}},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2025,
- month = mar,
- url = {https://www.merl.com/publications/TR2025-034}
- }
, - "Towards Zero-shot 3D Anomaly Localization", IEEE Winter Conference on Applications of Computer Vision (WACV), February 2025.BibTeX TR2025-020 PDF Presentation
- @inproceedings{Wang2025feb2,
- author = {Wang, Yizhou and Peng, Kuan-Chuan and Fu, Raymond},
- title = {{Towards Zero-shot 3D Anomaly Localization}},
- booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
- year = 2025,
- month = feb,
- url = {https://www.merl.com/publications/TR2025-020}
- }
, - "Supplementary Material – Towards Zero-shot 3D Anomaly Localization", IEEE Winter Conference on Applications of Computer Vision (WACV), February 2025.BibTeX TR2025-019 PDF
- @inproceedings{Wang2025feb,
- author = {Wang, Yizhou and Peng, Kuan-Chuan and Fu, Raymond},
- title = {{Supplementary Material – Towards Zero-shot 3D Anomaly Localization}},
- booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
- year = 2025,
- month = feb,
- url = {https://www.merl.com/publications/TR2025-019}
- }
, - "ComplexVAD: Detecting Interaction Anomalies in Video", IEEE Winter Conference on Applications of Computer Vision (WACV) Workshop, February 2025.BibTeX TR2025-016 PDF
- @inproceedings{Mumcu2025feb,
- author = {Mumcu, Furkan and Jones, Michael J. and Yilmaz, Yasin and Cherian, Anoop},
- title = {{ComplexVAD: Detecting Interaction Anomalies in Video}},
- booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV) Workshop},
- year = 2025,
- month = feb,
- url = {https://www.merl.com/publications/TR2025-016}
- }
, - "Rotation-Equivariant Neural Networks for Cloud Removal from Satellite Images", Asilomar Conference on Signals, Systems, and Computers (ACSSC), January 2025.BibTeX TR2025-009 PDF
- @inproceedings{Lohit2025jan,
- author = {Lohit, Suhas and Marks, Tim K.},
- title = {{Rotation-Equivariant Neural Networks for Cloud Removal from Satellite Images}},
- booktitle = {Asilomar Conference on Signals, Systems, and Computers (ACSSC)},
- year = 2025,
- month = jan,
- url = {https://www.merl.com/publications/TR2025-009}
- }
, - "SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera", IEEE Winter Conference on Applications of Computer Vision (WACV), December 2024.BibTeX TR2025-003 PDF
- @inproceedings{He2024dec2,
- author = {He, Yuhang and Shin, Sangyun and Cherian, Anoop and Trigoni, Niki and Markham, Andrew},
- title = {{SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera}},
- booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
- year = 2024,
- month = dec,
- url = {https://www.merl.com/publications/TR2025-003}
- }
, - "Temporally Grounding Instructional Diagrams in Unconstrained Videos", IEEE Winter Conference on Applications of Computer Vision (WACV), December 2024.BibTeX TR2025-002 PDF
- @inproceedings{Zhang2024dec,
- author = {Zhang, Jiahao and Zhang, Frederic and Rodriguez, Cristian and Ben-Shabat, Itzik and Cherian, Anoop and Gould, Stephen},
- title = {{Temporally Grounding Instructional Diagrams in Unconstrained Videos}},
- booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
- year = 2024,
- month = dec,
- url = {https://www.merl.com/publications/TR2025-002}
- }
, - "Evaluating Large Vision-and-Language Models on Children’s Mathematical Olympiads", Advances in Neural Information Processing Systems (NeurIPS), November 2024.BibTeX TR2024-160 PDF Video Presentation
- @inproceedings{Cherian2024nov,
- author = {Cherian, Anoop and Peng, Kuan-Chuan and Lohit, Suhas and Matthiesen, Joanna and Smith, Kevin and Tenenbaum, Joshua B.},
- title = {{Evaluating Large Vision-and-Language Models on Children’s Mathematical Olympiads}},
- booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
- year = 2024,
- month = nov,
- url = {https://www.merl.com/publications/TR2024-160}
- }
,
- "Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025.
-
Videos
-
Software & Data Downloads
-
ComplexVAD Dataset -
Gear Extensions of Neural Radiance Fields -
Long-Tailed Anomaly Detection Dataset -
Pixel-Grounded Prototypical Part Networks -
Steered Diffusion -
BAyesian Network for adaptive SAmple Consensus -
Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
-
Explainable Video Anomaly Localization -
Simple Multimodal Algorithmic Reasoning Task Dataset -
Partial Group Convolutional Neural Networks -
SOurce-free Cross-modal KnowledgE Transfer -
Audio-Visual-Language Embodied Navigation in 3D Environments -
3D MOrphable STyleGAN -
Instance Segmentation GAN -
Audio Visual Scene-Graph Segmentor -
Generalized One-class Discriminative Subspaces -
Generating Visual Dynamics from Sound and Context -
Adversarially-Contrastive Optimal Transport -
MotionNet -
Street Scene Dataset -
FoldingNet++ -
Landmarks’ Location, Uncertainty, and Visibility Likelihood -
Gradient-based Nikaido-Isoda -
Circular Maze Environment -
Discriminative Subspace Pooling -
Kernel Correlation Network -
Fast Resampling on Point Clouds via Graphs -
FoldingNet -
MERL Shopping Dataset -
Joint Geodesic Upsampling -
Plane Extraction using Agglomerative Clustering
-