Machine Learning

Data-driven approaches to design intelligent algorithms.

MERL has a long history of research activity in machine learning, including the development of various boosting algorithms and contributions to the theory and practice of highly scalable collaborative filtering. Our recent work has focused on deep learning and reinforcement learning, with applications across a wide range of domains including automotive, robotics, factory automation, transportation, and building and home systems.

Researchers
Awards
- AWARD MERL Wins Awards at NeurIPS LLM Privacy Challenge
  Date: December 15, 2024
  Awarded to: Jing Liu, Ye Wang, Toshiaki Koike-Akino, Tsunato Nakai, Kento Oonishi, Takuya Higashi
  MERL Contacts: Toshiaki Koike-Akino; Jing Liu; Ye Wang
  Research Areas: Artificial Intelligence, Machine Learning, Information Security
  Brief
  - The Mitsubishi Electric Privacy Enhancing Technologies (MEL-PETs) team, a collaboration between MERL and Mitsubishi Electric researchers, won awards at the NeurIPS 2024 Large Language Model (LLM) Privacy Challenge. In the Blue Team track of the challenge, we won the 3rd Place Award, and in the Red Team track, we won the Special Award for Practical Attack.
- AWARD University of Padua and MERL team wins the AI Olympics with RealAIGym competition at IROS24
  Date: October 17, 2024
  Awarded to: Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
  MERL Contact: Diego Romeres
  Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, Robotics
  Brief
  - The team formed by the control group at the University of Padua and MERL's Optimization and Robotic team ranked 1st out of the 4 finalist teams that reached the 2nd AI Olympics with RealAIGym competition at IROS 24, which focused on control of under-actuated robots. The team consisted of Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres. The competition was organized by the German Research Center for Artificial Intelligence (DFKI), the Technical University of Darmstadt, and Chalmers University of Technology.
    
    The competition and award ceremony were hosted by the IEEE International Conference on Intelligent Robots and Systems (IROS) on October 17, 2024 in Abu Dhabi, UAE. Diego Romeres presented the team's method, which is based on a model-based reinforcement learning algorithm called MC-PILCO.
- AWARD MERL team wins the Listener Acoustic Personalisation (LAP) 2024 Challenge
  Date: August 29, 2024
  Awarded to: Yoshiki Masuyama, Gordon Wichern, François G. Germain, Christopher Ick, and Jonathan Le Roux
  MERL Contacts: François Germain; Jonathan Le Roux; Gordon Wichern; Yoshiki Masuyama
  Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
  Brief
  - MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama, and also included Gordon Wichern, François Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
    
    The LAP Challenge workshop and award ceremony was hosted by the 32nd European Signal Processing Conference (EUSIPCO 24) on August 29, 2024 in Lyon, France. Yoshiki Masuyama presented the team's method, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", and received the award from Prof. Michele Geronazzo (University of Padova, IT, and Imperial College London, UK), Chair of the Challenge's Organizing Committee.
    
    The LAP Challenge aims to explore open problems in the field of personalized spatial audio, with the first edition focusing on the spatial upsampling and interpolation of head-related transfer functions (HRTFs). HRTFs with dense spatial grids are required for immersive audio experiences, but their recording is time-consuming. Although HRTF spatial upsampling has recently shown remarkable progress with approaches involving neural fields, HRTF estimation accuracy remains limited when upsampling from only a few measured directions, e.g., 3 or 5 measurements. The MERL team tackled this problem by proposing a retrieval-augmented neural field (RANF). RANF retrieves, from a library of subjects, a subject whose HRTFs are close to those of the target subject at the measured directions. The HRTF of the retrieved subject at the target direction is fed into the neural field in addition to the desired sound source direction. The team also developed a neural network architecture that can handle an arbitrary number of retrieved subjects, inspired by a multi-channel processing technique called transform-average-concatenate.
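The retrieval step described above can be illustrated with a minimal sketch. This is not the team's implementation: the function names, the dictionary layout, and the root-mean-square distance over magnitude responses are all assumptions made for illustration, since the text does not specify the distance measure or data format. The neural field itself is omitted; the sketch only shows how subjects might be ranked at the measured directions and how their HRTFs at a query direction could be gathered as extra input.

```python
import math


def spectral_distance(a, b):
    """RMS difference between two magnitude responses (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("responses must have the same number of frequency bins")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def retrieve_subjects(target_measured, library, k=1):
    """Return the k library subject IDs closest to the target.

    target_measured: {direction: [magnitudes]} for the few measured directions.
    library: {subject_id: {direction: [magnitudes]}} with denser grids.
    Both layouts are hypothetical.
    """
    scores = []
    for subject_id, hrtfs in library.items():
        # Compare only at the directions actually measured for the target.
        dists = [spectral_distance(target_measured[d], hrtfs[d])
                 for d in target_measured]
        scores.append((sum(dists) / len(dists), subject_id))
    scores.sort()
    return [subject_id for _, subject_id in scores[:k]]


def retrieved_features(retrieved_ids, library, query_direction):
    """Gather retrieved subjects' HRTFs at the query direction."""
    return [library[s][query_direction] for s in retrieved_ids]


# Toy library: directions are (azimuth, elevation) tuples, two bins each.
library = {
    "s1": {(0, 0): [0.0, 1.0], (90, 0): [2.0, 3.0], (180, 0): [4.0, 5.0]},
    "s2": {(0, 0): [0.5, 1.5], (90, 0): [2.5, 3.5], (180, 0): [9.0, 9.0]},
    "s3": {(0, 0): [5.0, 5.0], (90, 0): [5.0, 5.0], (180, 0): [5.0, 5.0]},
}
# The target subject was measured at only two directions.
target_measured = {(0, 0): [0.4, 1.4], (90, 0): [2.4, 3.4]}

nearest = retrieve_subjects(target_measured, library, k=2)
features = retrieved_features(nearest, library, (180, 0))
print(nearest)
print(features)
```

In this toy setup, `features` holds one response per retrieved subject, so its length grows with `k`; a model consuming it would need to accept a variable number of retrieved subjects, which is the role the text attributes to the transform-average-concatenate-inspired architecture.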
See All Awards for Machine Learning
News & Events
- NEWS Suhas Lohit presents invited talk at Boston Symmetry Day 2025
  Date: March 31, 2025
  Where: Northeastern University, Boston, MA
  MERL Contact: Suhas Lohit
  Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
  Brief
  - MERL researcher Suhas Lohit was an invited speaker at Boston Symmetry Day, held at Northeastern University. Boston Symmetry Day, an annual workshop organized by researchers at MIT and Northeastern, brought together attendees interested in symmetry-informed machine learning and its applications. Suhas' talk, titled “Efficiency for Equivariance, and Efficiency through Equivariance”, discussed recent MERL work showing how to build general and efficient equivariant neural networks, and how equivariance can be utilized in self-supervised learning to yield improved 3D object detection. The abstract and slides can be found in the link below.
- EVENT MERL Contributes to ICASSP 2025
  Date: Sunday, April 6, 2025 - Friday, April 11, 2025
  Location: Hyderabad, India
  MERL Contacts: Wael H. Ali; Petros T. Boufounos; Radu Corcodel; François Germain; Chiori Hori; Siddarth Jain; Devesh K. Jha; Toshiaki Koike-Akino; Jonathan Le Roux; Yanting Ma; Hassan Mansour; Yoshiki Masuyama; Joshua Rapp; Diego Romeres; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
  Research Areas: Artificial Intelligence, Communications, Computational Sensing, Electronic and Photonic Devices, Machine Learning, Robotics, Signal Processing, Speech & Audio
  Brief
  - MERL has made numerous contributions to both the organization and technical program of ICASSP 2025, which is being held in Hyderabad, India from April 6-11, 2025.
    
    Sponsorship
    
    MERL is proud to be a Silver Patron of the conference and will participate in the student job fair on Thursday, April 10. Please join this session to learn more about employment opportunities at MERL, including openings for research scientists, post-docs, and interns.
    
    MERL is pleased to be the sponsor of two IEEE Awards that will be presented at the conference. We congratulate Prof. Björn Erik Ottersten, the recipient of the 2025 IEEE Fourier Award for Signal Processing, and Prof. Shrikanth Narayanan, the recipient of the 2025 IEEE James L. Flanagan Speech and Audio Processing Award. Both awards will be presented in-person at ICASSP by Anthony Vetro, MERL President & CEO.
    
    Technical Program
    
    MERL is presenting 15 papers in the main conference on a wide range of topics including source separation, sound event detection, sound anomaly detection, speaker diarization, music generation, robot action generation from video, indoor airflow imaging, WiFi sensing, Doppler single-photon Lidar, optical coherence tomography, and radar imaging. Another paper on spatial audio will be presented at the Generative Data Augmentation for Real-World Signal Processing Applications (GenDA) Satellite Workshop.
    
    MERL researchers Petros Boufounos and Hassan Mansour will present a tutorial on “Computational Methods in Radar Imaging” on the afternoon of Monday, April 7.
    
    Petros Boufounos will also be giving an industry talk on Thursday, April 10 at 12pm, on “A Physics-Informed Approach to Sensing”.
    
    About ICASSP
    
    ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on research advances and the latest technological developments in signal and information processing. The event attracts more than 4000 participants each year.
See All News & Events for Machine Learning
Research Highlights
Internships
- CV0101: Internship - Multimodal Algorithmic Reasoning
  
  MERL is looking for a self-motivated intern to research problems at the intersection of multimodal large language models and neural algorithmic reasoning. An ideal intern would be a Ph.D. student with a strong background in machine learning and computer vision. The candidate must have prior experience with training multimodal LLMs for solving vision-and-language tasks. Experience in participating in and winning mathematical Olympiads is desired. Publications in theoretical machine learning venues would be a strong plus. The intern is expected to collaborate with researchers in the computer vision team at MERL to develop algorithms and prepare manuscripts for scientific publications.
  Required Specific Experience
  - Experience with training large vision-and-language models
  - Experience with solving mathematical reasoning problems
  - Experience with programming in Python using PyTorch
  - Enrolled in a PhD program
  - Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- CV0075: Internship - Multimodal Embodied AI
  
  MERL is looking for a self-motivated intern to work on problems at the intersection of multimodal large language models and embodied AI in dynamic indoor environments. The ideal candidate would be a PhD student with a strong background in machine learning and computer vision, as demonstrated by top-tier publications. The candidate must have prior experience in designing synthetic scenes (e.g., 3D games) using popular graphics software, embodied AI, large language models, reinforcement learning, and the use of simulators such as Habitat/SoundSpaces. Hands-on experience in using animated 3D human shape models (e.g., SMPL and variants) is desired. The intern is expected to collaborate with researchers in computer vision at MERL to develop algorithms and prepare manuscripts for scientific publications.
  Required Specific Experience
  - Experience in designing 3D interactive scenes
  - Experience with vision-based embodied AI using simulators (implementation on real robotic hardware would be a plus).
  - Experience training large language models on multimodal data
  - Experience with training reinforcement learning algorithms
  - Strong foundations in machine learning and programming
  - Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- CA0129: Internship - LLM-guided Active SLAM for Mobile Robots
  
  MERL is seeking interns passionate about robotics to contribute to the development of an Active Simultaneous Localization and Mapping (Active SLAM) framework guided by Large Language Models (LLMs). The core objective is to achieve autonomous behavior for mobile robots. The methods will be implemented and evaluated in high-performance simulators and (time permitting) on actual robotic platforms, such as legged and wheeled robots. The expectation at the end of the internship is a publication at a top-tier robotics or computer vision conference and/or journal.
  The internship has a flexible start date (Spring/Summer 2025), with a duration of 3-6 months depending on agreed scope and intermediate progress.
  Required Specific Experience
  - Current/Past Enrollment in a PhD Program in Computer Engineering, Computer Science, Electrical Engineering, Mechanical Engineering, or related field
  - Experience with employing and fine-tuning LLMs and/or Visual Language Models (VLMs) for high-level context-aware planning and navigation
  - 2+ years of experience with 3D computer vision (e.g., point clouds, voxels, camera pose estimation) and mapping, filter-based methods (e.g., EKF), and at least some of: motion planning algorithms, factor graphs, control, and optimization
  - Excellent programming skills in Python and/or C/C++, with prior knowledge of ROS2 and high-fidelity simulators such as Gazebo, Isaac Lab, and/or MuJoCo
  Additional Desired Experience
  - Prior experience with implementation and/or development of SLAM algorithms on robotic hardware, including acquisition, processing, and fusion of multimodal sensor data such as proprioceptive and exteroceptive sensors
See All Internships for Machine Learning
Openings
See All Openings at MERL
Recent Publications
- Koike-Akino, T., Tonin, F., Wu, Y., Wu, F.Z., Candogan, L.N., Cevher, V., "Quantum-PEFT: Ultra Parameter-Efficient Fine-Tuning", International Conference on Learning Representations (ICLR), April 2025.
  BibTeX TR2025-051 PDF
  @inproceedings{Koike-Akino2025apr,
    author = {Koike-Akino, Toshiaki and Tonin, Francesco and Wu, Yongtao and Wu, Frank Zhengqing and Candogan, Leyla Naz and Cevher, Volkan},
    title = {{Quantum-PEFT: Ultra Parameter-Efficient Fine-Tuning}},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year = 2025,
    month = apr,
    url = {https://www.merl.com/publications/TR2025-051}
  }
- Tang, H., Ellis, K., Lohit, S., Jones, M.J., Chatterjee, M., "Programmatic Video Prediction Using Large Language Models", International Conference on Learning Representations Workshops (ICLRW), April 2025.
  BibTeX TR2025-049 PDF
  @inproceedings{Tang2025apr,
    author = {Tang, Hao and Ellis, Kevin and Lohit, Suhas and Jones, Michael J. and Chatterjee, Moitreya},
    title = {{Programmatic Video Prediction Using Large Language Models}},
    booktitle = {International Conference on Learning Representations Workshops (ICLRW)},
    year = 2025,
    month = apr,
    url = {https://www.merl.com/publications/TR2025-049}
  }
- Araki, S., Ito, N., Haeb-Umbach, R., Wichern, G., Wang, Z.-Q., Mitsufuji, Y., "30+ Years of Source Separation Research: Achievements and Future Challenges", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.
  BibTeX TR2025-036 PDF
  @inproceedings{Araki2025mar,
    author = {Araki, Shoko and Ito, Nobutaka and Haeb-Umbach, Reinhold and Wichern, Gordon and Wang, Zhong-Qiu and Mitsufuji, Yuki},
    title = {{30+ Years of Source Separation Research: Achievements and Future Challenges}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2025,
    month = mar,
    url = {https://www.merl.com/publications/TR2025-036}
  }
- Ebbers, J., Germain, F.G., Wilkinghoff, K., Wichern, G., Le Roux, J., "No Class Left Behind: A Closer Look at Class Balancing for Audio Tagging", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.
  BibTeX TR2025-037 PDF
  @inproceedings{Ebbers2025mar,
    author = {Ebbers, Janek and Germain, François G and Wilkinghoff, Kevin and Wichern, Gordon and {Le Roux}, Jonathan},
    title = {{No Class Left Behind: A Closer Look at Class Balancing for Audio Tagging}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2025,
    month = mar,
    url = {https://www.merl.com/publications/TR2025-037}
  }
- Gruttadauria, E., Fontaine, M., Le Roux, J., Essid, S., "O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.
  BibTeX TR2025-031 PDF
  @inproceedings{Gruttadauria2025mar,
    author = {Gruttadauria, Elio and Fontaine, Mathieu and {Le Roux}, Jonathan and Essid, Slim},
    title = {{O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2025,
    month = mar,
    url = {https://www.merl.com/publications/TR2025-031}
  }
- Masuyama, Y., Wichern, G., Germain, F.G., Ick, C., Le Roux, J., "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.
  BibTeX TR2025-029 PDF Software
  @inproceedings{Masuyama2025mar,
    author = {Masuyama, Yoshiki and Wichern, Gordon and Germain, François G and Ick, Christopher and {Le Roux}, Jonathan},
    title = {{Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2025,
    month = mar,
    url = {https://www.merl.com/publications/TR2025-029}
  }
- Saijo, K., Ebbers, J., Germain, F.G., Khurana, S., Wichern, G., Le Roux, J., "Leveraging Audio-Only Data for Text-Queried Target Sound Extraction", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.
  BibTeX TR2025-033 PDF
  @inproceedings{Saijo2025mar2,
    author = {Saijo, Kohei and Ebbers, Janek and Germain, François G and Khurana, Sameer and Wichern, Gordon and {Le Roux}, Jonathan},
    title = {{Leveraging Audio-Only Data for Text-Queried Target Sound Extraction}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2025,
    month = mar,
    url = {https://www.merl.com/publications/TR2025-033}
  }
- Saijo, K., Ebbers, J., Germain, F.G., Wichern, G., Le Roux, J., "Task-Aware Unified Source Separation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.
  BibTeX TR2025-032 PDF
  @inproceedings{Saijo2025mar,
    author = {Saijo, Kohei and Ebbers, Janek and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
    title = {{Task-Aware Unified Source Separation}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2025,
    month = mar,
    url = {https://www.merl.com/publications/TR2025-032}
  }
See All Publications for Machine Learning
Videos

[MERL Seminar Series Spring 2025] Red Teaming AI Agents in-the-wild: Revealing Deployment Vulnerabilities

[MERL Seminar Series Spring 2025] The Emergence of Generalizability and Semantic Low-Dim Subspaces in Diffusion Models

[MERL Seminar Series Spring 2025] Amplifying human performance in combinatorial competitive programming

[WACV 2025] Towards Zero-shot 3D Anomaly Localization

Data-driven Spatial Classification using Multi-Arm Bandits for Monitoring with Robot Teams

[NeurIPS 2024] MEL-PETs Defense for the NeurIPS 2024 LLM Privacy Challenge Blue Team Track

[NeurIPS 2024] MEL-PETs Joint-Context Attack for the NeurIPS 2024 LLM Privacy Challenge Red Team Track

[NeurIPS 2024] Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads

[MERL Seminar Series Fall 2024] Audio for Object and Spatial Awareness

[IROS 2024] Few-shot Transparent Instance Segmentation for Bin Picking

[MERL Seminar Series Fall 2024] Tools from cognitive science to understand the behavior of large language models

[ECCV 2024] PS-NEUS: A Probability-guided Sampler for Neural Implicit Surface Rendering

[ECCV 2024] Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection

[CVPR 2024] SIRA: Scalable Inter-frame Relation and Association for Radar Perception

Decentralized, Safe, Multi-agent Motion Planning for Drones Under Uncertainty via Filtered Reinforcement Learning

[MERL Seminar Series Spring 2024] Neural Certificates and LLMs in Large-Scale Autonomy Design

[MERL Seminar Series Spring 2024] Decoding Hidden Worlds: Unprecedented Sensing and Connectivity for Climate, Robotics, & Smart Environments

Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling

[MERL Seminar Series Spring 2024] Are Emergent Abilities of Large Language Models a Mirage?

MERL's Quantum AI Technology

[GLOBECOM 2022/VCC 2023 Tutorial] Post-Deep Learning Era: Emerging Quantum Machine Learning for Sensing and Communications (Session 2)

[GLOBECOM 2022/VCC 2023 Tutorial] Post-Deep Learning Era: Emerging Quantum Machine Learning for Sensing and Communications (Session I)

[MERL Seminar Series Fall 2023] Robust and Physics-informed machine learning for low light imaging

[MERL Seminar Series Fall 2023] Multiplicity in Machine Learning

Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection

[MERL Seminar Series Fall 2023] Visual Programming - A compositional approach to building General Purpose Vision Systems

[MERL Seminar Series Fall 2023] The Confluence of Vision, Language, and Robotics

[MERL Seminar Series Fall 2023] A Process Systems Engineering Perspective on Carbon Capture: Key Challenges and Opportunities

[TRO 2024] Safe Multiagent Motion Planning Under Uncertainty for Drones Using Filtered Reinforcement Learning

Are Deep Neural Networks SMARTer than Second Graders?

[CVPR 2023] EVAL: Explainable Video Anomaly Localization

[MERL Seminar Series Spring 2023] Learning and Dynamical Systems

[MERL Seminar Series Spring 2023] Investigating Multi-Agent Reinforcement Learning for Grid-Interactive Smart Communities using CityLearn

[MERL Seminar Series Spring 2023] Pitfalls and Opportunities in Interpretable Machine Learning

[MERL Seminar Series Spring 2023] Neural Implicit Flow

[MERL Seminar Series Spring 2023] Towards Complex Language in Partially Observed Environments

[MERL Seminar Series Spring 2022] Hybrid robotics and implicit learning

Toshiaki Koike-Akino Gives Seminar Talk at IEEE Boston Photonics

[MERL Seminar Series Spring 2022] RLMPC: An Ideal Combination of Formal Optimal Control and Reinforcement Learning?

[MERL Seminar Series Spring 2022] Self-Supervised Scene Representation Learning

[MERL Seminar Series Spring 2022] Learning Speech Representations with Multimodal Self-Supervision

[MERL Seminar Series Spring 2022] Extreme optics design as a large-scale optimization problem

HealthCam: A system for non-contact monitoring of vital signs

[MERL Seminar Series 2021] Harnessing machine learning to build better Earth system models for climate projection

[MERL Seminar Series 2021] Deep probabilistic regression

[MERL Seminar Series 2021] Learning to See by Moving: Self-supervising 3D scene representations for perception, control, and visual reasoning

Control of Mechanical Systems via Feedback Linearization Based on Black-Box Gaussian Process Models

Application of Deep Learning for Nanophotonic Device Design (Invited)

Towards Human-Level Learning of Complex Physical Puzzles

Tactile-RL for Insertion: Generalization to Objects of Unknown Geometry

Scene-Aware Interaction Technology

Action Detection Using A Deep Recurrent Neural Network

MERL Research on Autonomous Vehicles

Semantic Scene Labeling

Obstacle Detection

Deep Hierarchical Parsing for Semantic Segmentation

Global Local Face Upsampling Network
Software & Data Downloads

MERL is making Machine Learning software and data available to the research community:

Self-Monitored Inference-Time INtervention for Generative Music Transformers (SMITIN)
Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization (ranf-hrtf)
Radar dEtection TRansformer (RETR)
Meta-Learning State Space Models (MetaLIC)
MEL-PETs Defense for LLM Privacy Challenge (melpets-llmpc2024-blue-team)
MEL-PETs Joint-Context Attack for LLM Privacy Challenge (melpets-llmpc2024-red-team)

See All Downloads