News & Events

TALK [MERL Seminar Series 2025] Andy Zou presents talk titled Red Teaming AI Agents in-the-wild: Revealing Deployment Vulnerabilities
Date & Time: Wednesday, March 26, 2025; 1:00 PM
Speaker: Andy Zou, CMU & Gray Swan AI
MERL Host: Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Information Security
Abstract
- This presentation demonstrates how red teaming uncovers critical vulnerabilities in AI agents that challenge assumptions about safe deployment. The talk discusses the risks of integrating AI into real-world applications and recommends practical safeguards to enhance resilience and ensure dependable deployment in high-risk settings.
TALK [MERL Seminar Series 2025] Qing Qu presents talk titled The Emergence of Generalizability and Semantic Low-Dim Subspaces in Diffusion Models
Date & Time: Wednesday, March 5, 2025; 12:00 PM
Speaker: Qing Qu, University of Michigan
MERL Host: Pu (Perry) Wang
Research Areas: Artificial Intelligence, Computational Sensing, Machine Learning, Signal Processing
Abstract
- Recent empirical studies have shown that diffusion models possess a unique reproducibility property, transiting from memorization to generalization as the number of training samples increases. This demonstrates that diffusion models can effectively learn image distributions and generate new samples. Remarkably, these models achieve this even with a small number of training samples, despite the challenge of large image dimensions, effectively circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging two key empirical observations: (i) the low intrinsic dimensionality of image datasets and (ii) the low-rank property of the denoising autoencoder in trained diffusion models. With these setups, we rigorously demonstrate that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem across the training samples. This insight has practical implications for training and controlling diffusion models. Specifically, it enables us to precisely characterize the minimal number of samples necessary for accurately learning the low-rank data support, shedding light on the phase transition from memorization to generalization. Additionally, we empirically establish a correspondence between the subspaces and the semantic representations of image data, which enables one-step, transferrable, efficient image editing. Moreover, our results have profound practical implications for training efficiency and model safety, and they also open up numerous intriguing theoretical questions for future research.
TALK [MERL Seminar Series 2024] Di Shi presents talk titled AI-assisted Power Grid Dispatch and Control: Optimization, Safety, and Real-world Demonstrations
Date & Time: Wednesday, November 20, 2024; 1:00 PM
Speaker: Di Shi, New Mexico State University
MERL Host: Hongbo Sun
Research Areas: Artificial Intelligence, Data Analytics, Optimization
Abstract
- This presentation delves into the challenges and advancements in optimizing power system operations through Grid Mind, an innovative, data-driven framework designed to enhance the integration of renewable energy sources. Utilizing advanced learning algorithms, Grid Mind excels in strategic resource allocation and control, significantly improving efficiency and reliability in power systems with high renewable energy penetration. The transformative potential of this AI-assisted technology is highlighted through real-world applications, demonstrating its effectiveness in addressing the complexities of modern power systems. In addition, critical safety considerations and practical deployment challenges are explored, emphasizing the need for robust, secure, and adaptable solutions. This talk also discusses the capabilities of Grid Mind as a distributed, learning-based system optimized for edge devices, marking a significant advancement toward sustainable, safe, and efficient power system operations in an era dominated by renewable energy.
TALK [MERL Seminar Series 2024] Samuel Clarke presents talk titled Audio for Object and Spatial Awareness
Date & Time: Wednesday, October 30, 2024; 1:00 PM
Speaker: Samuel Clarke, Stanford University
MERL Host: Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & Audio
Abstract
- Acoustic perception is invaluable to humans and robots in understanding objects and events in their environments. These sounds are dependent on properties of the source, the environment, and the receiver. Many humans possess remarkable intuition both to infer key properties of each of these three aspects from a sound and to form expectations of how these different aspects would affect the sound they hear. In order to equip robots and AI agents with similar if not stronger capabilities, our research has taken a two-fold path. First, we collect high-fidelity datasets in both controlled and uncontrolled environments which capture real sounds of objects and rooms. Second, we introduce differentiable physics-based models that can estimate acoustic properties of objects and rooms from minimal amounts of real audio data, then can predict new sounds from these objects and rooms under novel, “unseen” conditions.
TALK [MERL Seminar Series 2024] Zhaojian Li presents talk titled A Multi-Arm Robotic System for Robotic Apple Harvesting
Date & Time: Wednesday, October 2, 2024; 1:00 PM
Speaker: Zhaojian Li, Mivchigan State University
MERL Host: Yebin Wang
Research Areas: Artificial Intelligence, Computer Vision, Control, Robotics
Abstract
- Harvesting labor is the single largest cost in apple production in the U.S. Surging cost and growing shortage of labor has forced the apple industry to seek automated harvesting solutions. Despite considerable progress in recent years, the existing robotic harvesting systems still fall short of performance expectations, lacking robustness and proving inefficient or overly complex for practical commercial deployment. In this talk, I will present the development and evaluation of a new dual-arm robotic apple harvesting system. This work is a result of a continuous collaboration between Michigan State University and U.S. Department of Agriculture.
TALK [MERL Seminar Series 2024] Tom Griffiths presents talk titled Tools from cognitive science to understand the behavior of large language models
Date & Time: Wednesday, September 18, 2024; 1:00 PM
Speaker: Tom Griffiths, Princeton University
Research Areas: Artificial Intelligence, Data Analytics, Machine Learning, Human-Computer Interaction
Abstract
- Large language models have been found to have surprising capabilities, even what have been called “sparks of artificial general intelligence.” However, understanding these models involves some significant challenges: their internal structure is extremely complicated, their training data is often opaque, and getting access to the underlying mechanisms is becoming increasingly difficult. As a consequence, researchers often have to resort to studying these systems based on their behavior. This situation is, of course, one that cognitive scientists are very familiar with — human brains are complicated systems trained on opaque data and typically difficult to study mechanistically. In this talk I will summarize some of the tools of cognitive science that are useful for understanding the behavior of large language models. Specifically, I will talk about how thinking about different levels of analysis (and Bayesian inference) can help us understand some behaviors that don’t seem particularly intelligent, how tasks like similarity judgment can be used to probe internal representations, how axiom violations can reveal interesting mechanisms, and how associations can reveal biases in systems that have been trained to be unbiased.
TALK [MERL Seminar Series 2024] Chuchu Fan presents talk titled Neural Certificates and LLMs in Large-Scale Autonomy Design
Date & Time: Wednesday, May 29, 2024; 12:00 PM
Speaker: Chuchu Fan, MIT
MERL Host: Abraham P. Vinod
Research Areas: Artificial Intelligence, Control, Machine Learning
Abstract
- Learning-enabled control systems have demonstrated impressive empirical performance on challenging control problems in robotics. However, this performance often arrives with the trade-off of diminished transparency and the absence of guarantees regarding the safety and stability of the learned controllers. In recent years, new techniques have emerged to provide these guarantees by learning certificates alongside control policies — these certificates provide concise, data-driven proofs that guarantee the safety and stability of the learned control system. These methods not only allow the user to verify the safety of a learned controller but also provide supervision during training, allowing safety and stability requirements to influence the training process itself. In this talk, we present two exciting updates on neural certificates. In the first work, we explore the use of graph neural networks to learn collision-avoidance certificates that can generalize to unseen and very crowded environments. The second work presents a novel reinforcement learning approach that can produce certificate functions with the policies while addressing the instability issues in the optimization process. Finally, if time permits, I will also talk about my group's recent work using LLM and domain-specific task and motion planners to allow natural language as input for robot planning.
TALK [MERL Seminar Series 2024] Sanmi Koyejo presents talk titled Are Emergent Abilities of Large Language Models a Mirage?
Date & Time: Wednesday, March 20, 2024; 1:00 PM
Speaker: Sanmi Koyejo, Stanford University
MERL Host: Jing Liu
Research Areas: Artificial Intelligence, Machine Learning
Abstract
- Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous predictable changes in model performance. We present our alternative explanation in a simple mathematical model. Via the presented analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
EVENT MERL Contributes to ICASSP 2024
Date: Sunday, April 14, 2024 - Friday, April 19, 2024
Location: Seoul, South Korea
MERL Contacts: Petros T. Boufounos; François Germain; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Hassan Mansour; Kieran Parsons; Joshua Rapp; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computational Sensing, Machine Learning, Robotics, Signal Processing, Speech & Audio
Brief
- MERL has made numerous contributions to both the organization and technical program of ICASSP 2024, which is being held in Seoul, Korea from April 14-19, 2024.
  
  Sponsorship and Awards
  
  MERL is proud to be a Bronze Patron of the conference and will participate in the student job fair on Thursday, April 18. Please join this session to learn more about employment opportunities at MERL, including openings for research scientists, post-docs, and interns.
  
  MERL is pleased to be the sponsor of two IEEE Awards that will be presented at the conference. We congratulate Prof. Stéphane G. Mallat, the recipient of the 2024 IEEE Fourier Award for Signal Processing, and Prof. Keiichi Tokuda, the recipient of the 2024 IEEE James L. Flanagan Speech and Audio Processing Award.
  
  Jonathan Le Roux, MERL Speech and Audio Senior Team Leader, will also be recognized during the Awards Ceremony for his recent elevation to IEEE Fellow.
  
  Technical Program
  
  MERL will present 13 papers in the main conference on a wide range of topics including automated audio captioning, speech separation, audio generative models, speech and sound synthesis, spatial audio reproduction, multimodal indoor monitoring, radar imaging, depth estimation, physics-informed machine learning, and integrated sensing and communications (ISAC). Three workshop papers have also been accepted for presentation on audio-visual speaker diarization, music source separation, and music generative models.
  
  Perry Wang is the co-organizer of the Workshop on Signal Processing and Machine Learning Advances in Automotive Radars (SPLAR), held on Sunday, April 14. It features keynote talks from leaders in both academia and industry, peer-reviewed workshop papers, and lightning talks from ICASSP regular tracks on signal processing and machine learning for automotive radar and, more generally, radar perception.
  
  Gordon Wichern will present an invited keynote talk on analyzing and interpreting audio deep learning models at the Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), held on Monday, April 15. He will also appear in a panel discussion on interpretable audio AI at the workshop.
  
  Perry Wang also co-organizes a two-part special session on Next-Generation Wi-Fi Sensing (SS-L9 and SS-L13) which will be held on Thursday afternoon, April 18. The special session includes papers on PHY-layer oriented signal processing and data-driven deep learning advances, and supports upcoming 802.11bf WLAN Sensing Standardization activities.
  
  Petros Boufounos is participating as a mentor in ICASSP’s Micro-Mentoring Experience Program (MiME).
  
  About ICASSP
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 3000 participants.
TALK [MERL Seminar Series 2024] Melanie Mitchell presents talk titled "The Debate Over 'Understanding' in AI's Large Language Models"
Date & Time: Tuesday, February 13, 2024; 1:00 PM
Speaker: Melanie Mitchell, Santa Fe Institute
MERL Host: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Human-Computer Interaction
Abstract
- I will survey a current, heated debate in the AI research community on whether large pre-trained language models can be said to "understand" language -- and the physical and social situations language encodes -- in any important sense. I will describe arguments that have been made for and against such understanding, and, more generally, will discuss what methods can be used to fairly evaluate understanding and intelligence in AI systems. I will conclude with key questions for the broader sciences of intelligence that have arisen in light of these discussions.
TALK [MERL Seminar Series 2024] Greta Tuckute presents talk titled Computational models of human auditory and language processing
Date & Time: Wednesday, January 31, 2024; 12:00 PM
Speaker: Greta Tuckute, MIT
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Abstract
- Advances in machine learning have led to powerful models for audio and language, proficient in tasks like speech recognition and fluent language generation. Beyond their immense utility in engineering applications, these models offer valuable tools for cognitive science and neuroscience. In this talk, I will demonstrate how these artificial neural network models can be used to understand how the human brain processes language. The first part of the talk will cover how audio neural networks serve as computational accounts for brain activity in the auditory cortex. The second part will focus on the use of large language models, such as those in the GPT family, to non-invasively control brain activity in the human language system.
TALK [MERL Seminar Series 2023] Prof. Flavio Calmon presents talk titled Multiplicity in Machine Learning
Date & Time: Tuesday, November 7, 2023; 12:00 PM
Speaker: Flavio Calmon, Harvard University
MERL Host: Ye Wang
Research Areas: Artificial Intelligence, Machine Learning
Abstract
- This talk reviews the concept of predictive multiplicity in machine learning. Predictive multiplicity arises when different classifiers achieve similar average performance for a specific learning task yet produce conflicting predictions for individual samples. We discuss a metric called “Rashomon Capacity” for quantifying predictive multiplicity in multi-class classification. We also present recent findings on the multiplicity cost of differentially private training methods and group fairness interventions in machine learning.
  
  This talk is based on work published at ICML'20, NeurIPS'22, ACM FAccT'23, and NeurIPS'23.
TALK [MERL Seminar Series 2023] Dr. Tanmay Gupta presents talk titled Visual Programming - A compositional approach to building General Purpose Vision Systems
Date & Time: Tuesday, October 31, 2023; 2:00 PM
Speaker: Tanmay Gupta, Allen Institute for Artificial Intelligence
MERL Host: Moitreya Chatterjee
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract
- Building General Purpose Vision Systems (GPVs) that can perform a huge variety of tasks has been a long-standing goal for the computer vision community. However, end-to-end training of these systems to handle different modalities and tasks has proven to be extremely challenging. In this talk, I will describe a lucrative neuro-symbolic alternative to the common end-to-end learning paradigm called Visual Programming. Visual Programming is a general framework that leverages the code-generation abilities of LLMs, existing neural models, and non-differentiable programs to enable powerful applications. Some of these applications continue to remain elusive for the current generation of end-to-end trained GPVs.
TALK [MERL Seminar Series 2023] Prof. Komei Sugiura presents talk titled The Confluence of Vision, Language, and Robotics
Date & Time: Thursday, September 28, 2023; 12:00 PM
Speaker: Komei Sugiura, Keio University
MERL Host: Chiori Hori
Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & Audio
Abstract
- Recent advances in multimodal models that fuse vision and language are revolutionizing robotics. In this lecture, I will begin by introducing recent multimodal foundational models and their applications in robotics. The second topic of this talk will address our recent work on multimodal language processing in robotics. The shortage of home care workers has become a pressing societal issue, and the use of domestic service robots (DSRs) to assist individuals with disabilities is seen as a possible solution. I will present our work on DSRs that are capable of open-vocabulary mobile manipulation, referring expression comprehension and segmentation models for everyday objects, and future captioning methods for cooking videos and DSRs.
EVENT MERL Contributes to ICASSP 2023
Date: Sunday, June 4, 2023 - Saturday, June 10, 2023
Location: Rhodes Island, Greece
MERL Contacts: Petros T. Boufounos; François Germain; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Suhas Lohit; Yanting Ma; Hassan Mansour; Joshua Rapp; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computational Sensing, Machine Learning, Signal Processing, Speech & Audio
Brief
- MERL has made numerous contributions to both the organization and technical program of ICASSP 2023, which is being held in Rhodes Island, Greece from June 4-10, 2023.
  
  Organization
  
  Petros Boufounos is serving as General Co-Chair of the conference this year, where he has been involved in all aspects of conference planning and execution.
  
  Perry Wang is the organizer of a special session on Radar-Assisted Perception (RAP), which will be held on Wednesday, June 7. The session will feature talks on signal processing and deep learning for radar perception, pose estimation, and mutual interference mitigation with speakers from both academia (Carnegie Mellon University, Virginia Tech, University of Illinois Urbana-Champaign) and industry (Mitsubishi Electric, Bosch, Waveye).
  
  Anthony Vetro is the co-organizer of the Workshop on Signal Processing for Autonomous Systems (SPAS), which will be held on Monday, June 5, and feature invited talks from leaders in both academia and industry on timely topics related to autonomous systems.
  
  Sponsorship
  
  MERL is proud to be a Silver Patron of the conference and will participate in the student job fair on Thursday, June 8. Please join this session to learn more about employment opportunities at MERL, including openings for research scientists, post-docs, and interns.
  
  MERL is pleased to be the sponsor of two IEEE Awards that will be presented at the conference. We congratulate Prof. Rabab Ward, the recipient of the 2023 IEEE Fourier Award for Signal Processing, and Prof. Alexander Waibel, the recipient of the 2023 IEEE James L. Flanagan Speech and Audio Processing Award.
  
  Technical Program
  
  MERL is presenting 13 papers in the main conference on a wide range of topics including source separation and speech enhancement, radar imaging, depth estimation, motor fault detection, time series recovery, and point clouds. One workshop paper has also been accepted for presentation on self-supervised music source separation.
  
  Perry Wang has been invited to give a keynote talk on Wi-Fi sensing and related standards activities at the Workshop on Integrated Sensing and Communications (ISAC), which will be held on Sunday, June 4.
  
  Additionally, Anthony Vetro will present a Perspective Talk on Physics-Grounded Machine Learning, which is scheduled for Thursday, June 8.
  
  About ICASSP
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
TALK [MERL Seminar Series 2023] Prof. Dan Stowell presents talk titled Fine-grained wildlife sound recognition: Towards the accuracy of a naturalist
Date & Time: Tuesday, April 25, 2023; 11:00 AM
Speaker: Dan Stowell, Tilburg University / Naturalis Biodiversity Centre
MERL Host: Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Abstract
- Machine learning can be used to identify animals from their sound. This could be a valuable tool for biodiversity monitoring, and for understanding animal behaviour and communication. But to get there, we need very high accuracy at fine-grained acoustic distinctions across hundreds of categories in diverse conditions. In our group we are studying how to achieve this at continental scale. I will describe aspects of bioacoustic data that challenge even the latest deep learning workflows, and our work to address this. Methods covered include adaptive feature representations, deep embeddings and few-shot learning.
TALK [MERL Seminar Series 2023] Dr. Suraj Srinivas presents talk titled Pitfalls and Opportunities in Interpretable Machine Learning
Date & Time: Tuesday, March 14, 2023; 1:00 PM
Speaker: Suraj Srinivas, Harvard University
MERL Host: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract
- In this talk, I will discuss our recent research on understanding post-hoc interpretability. I will begin by introducing a characterization of post-hoc interpretability methods as local function approximators, and the implications of this viewpoint, including a no-free-lunch theorem for explanations. Next, we shall challenge the assumption that post-hoc explanations provide information about a model's discriminative capabilities p(y|x) and instead demonstrate that many common methods instead rely on a conditional generative model p(x|y). This observation underscores the importance of being cautious when using such methods in practice. Finally, I will propose to resolve this via regularization of model structure, specifically by training low curvature neural networks, resulting in improved model robustness and stable gradients.
EVENT MERL's Virtual Open House 2022
Date & Time: Monday, December 12, 2022; 1:00pm-5:30pm ET
Location: Mitsubishi Electric Research Laboratories (MERL)/Virtual
Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio, Digital Video
Brief
- Join MERL's virtual open house on December 12th, 2022! Featuring a keynote, live sessions, research area booths, and opportunities to interact with our research team. Discover who we are and what we do, and learn about internship and employment opportunities.
TALK [MERL Seminar Series 2022] Prof. Jiajun Wu presents talk titled Understanding the Visual World Through Naturally Supervised Code
Date & Time: Tuesday, November 1, 2022; 1:00 PM
Speaker: Jiajun Wu, Stanford University
MERL Host: Anoop Cherian
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract
- The visual world has its inherent structure: scenes are made of multiple identical objects; different objects may have the same color or material, with a regular layout; each object can be symmetric and have repetitive parts. How can we infer, represent, and use such structure from raw data, without hampering the expressiveness of neural networks? In this talk, I will demonstrate that such structure, or code, can be learned from natural supervision. Here, natural supervision can be from pixels, where neuro-symbolic methods automatically discover repetitive parts and objects for scene synthesis. It can also be from objects, where humans during fabrication introduce priors that can be leveraged by machines to infer regular intrinsics such as texture and material. When solving these problems, structured representations and neural nets play complementary roles: it is more data-efficient to learn with structured representations, and they generalize better to new scenarios with robustly captured high-level information; neural nets effectively extract complex, low-level features from cluttered and noisy visual data.
EVENT SANE 2022 - Speech and Audio in the Northeast
Date: Thursday, October 6, 2022
Location: Kendall Square, Cambridge, MA
MERL Contacts: Anoop Cherian; Jonathan Le Roux
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief
- SANE 2022, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, was held on Thursday October 6, 2022 in Kendall Square, Cambridge, MA.
  
  It was the 9th edition in the SANE series of workshops, which started in 2012 and was held every year alternately in Boston and New York until 2019. Since the first edition, the audience has grown to a record 200 participants and 45 posters in 2019. After a 2-year hiatus due to the pandemic, SANE returned with an in-person gathering of 140 students and researchers.
  
  SANE 2022 featured invited talks by seven leading researchers from the Northeast: Rupal Patel (Northeastern/VocaliD), Wei-Ning Hsu (Meta FAIR), Scott Wisdom (Google), Tara Sainath (Google), Shinji Watanabe (CMU), Anoop Cherian (MERL), and Chuang Gan (UMass Amherst/MIT-IBM Watson AI Lab). It also featured a lively poster session with 29 posters.
  
  SANE 2022 was co-organized by Jonathan Le Roux (MERL), Arnab Ghoshal (Apple), John Hershey (Google), and Shinji Watanabe (CMU). SANE remained a free event thanks to generous sponsorship by Bose, Google, MERL, and Microsoft.
  
  Slides and videos of the talks will be released on the SANE workshop website.
TALK [MERL Seminar Series 2022] Prof. Chuang Gan presents talk titled Learning to Perceive Physical Scenes from Multi-Sensory Data
Date & Time: Tuesday, September 6, 2022; 12:00 PM EDT
Speaker: Chuang Gan, UMass Amherst & MIT-IBM Watson AI Lab
MERL Host: Jonathan Le Roux
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Abstract
- Human sensory perception of the physical world is rich and multimodal and can flexibly integrate input from all five sensory modalities -- vision, touch, smell, hearing, and taste. However, in AI, attention has primarily focused on visual perception. In this talk, I will introduce my efforts in connecting vision with sound, which will allow machine perception systems to see objects and infer physics from multi-sensory data. In the first part of my talk, I will introduce a. self-supervised approach that could learn to parse images and separate the sound sources by watching and listening to unlabeled videos without requiring additional manual supervision. In the second part of my talk, I will show we may further infer the underlying causal structure in 3D environments through visual and auditory observations. This enables agents to seek the sound source of repeating environmental sound (e.g., alarm) or identify what object has fallen, and where, from an intermittent impact sound.
TALK [MERL Seminar Series 2022] Prof. Vincent Sitzmann presents talk titled Self-Supervised Scene Representation Learning
Date & Time: Wednesday, March 30, 2022; 11:00 AM EDT
Speaker: Vincent Sitzmann, MIT
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract
- Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day parallel people’s ability to understand, navigate, and interact with their surroundings. This poses a unique set of challenges that sets neural scene representations apart from conventional representations of 3D scenes: Rendering and processing operations need to be differentiable, and the type of information they encode is unknown a priori, requiring them to be extraordinarily flexible. At the same time, training them without ground-truth 3D supervision is an underdetermined problem, highlighting the need for structure and inductive biases without which models converge to spurious explanations.
  
  I will demonstrate how we can equip neural networks with inductive biases that enables them to learn 3D geometry, appearance, and even semantic information, self-supervised only from posed images. I will show how this approach unlocks the learning of priors, enabling 3D reconstruction from only a single posed 2D image, and how we may extend these representations to other modalities such as sound. I will then discuss recent work on learning the neural rendering operator to make rendering and training fast, and how this speed-up enables us to learn object-centric neural scene representations, learning to decompose 3D scenes into objects, given only images. Finally, I will talk about a recent application of self-supervised scene representation learning in robotic manipulation, where it enables us to learn to manipulate classes of objects in unseen poses from only a handful of human demonstrations.
TALK [MERL Seminar Series 2022] Learning Speech Representations with Multimodal Self-Supervision
Date & Time: Tuesday, March 1, 2022; 1:00 PM EST
Speaker: David Harwath, The University of Texas at Austin
MERL Host: Chiori Hori
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Abstract
- Humans learn spoken language and visual perception at an early age by being immersed in the world around them. Why can't computers do the same? In this talk, I will describe our ongoing work to develop methodologies for grounding continuous speech signals at the raw waveform level to natural image scenes. I will first present self-supervised models capable of discovering discrete, hierarchical structure (words and sub-word units) in the speech signal. Instead of conventional annotations, these models learn from correspondences between speech sounds and visual patterns such as objects and textures. Next, I will demonstrate how these discrete units can be used as a drop-in replacement for text transcriptions in an image captioning system, enabling us to directly synthesize spoken descriptions of images without the need for text as an intermediate representation. Finally, I will describe our latest work on Transformer-based models of visually-grounded speech. These models significantly outperform the prior state of the art on semantic speech-to-image retrieval tasks, and also learn representations that are useful for a multitude of other speech processing tasks.
EVENT Prof. Melanie Zeilinger of ETH to give keynote at MERL's Virtual Open House
Date & Time: Thursday, December 9, 2021; 1:00pm - 5:30pm EST
Location: Virtual Event
Speaker: Prof. Melanie Zeilinger, ETH
Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio, Digital Video, Human-Computer Interaction, Information Security
Brief
- MERL is excited to announce the second keynote speaker for our Virtual Open House 2021:
  Prof. Melanie Zeilinger from ETH .
  
  Our virtual open house will take place on December 9, 2021, 1:00pm - 5:30pm (EST).
  
  Join us to learn more about who we are, what we do, and discuss our internship and employment opportunities. Prof. Zeilinger's talk is scheduled for 3:15pm - 3:45pm (EST).
  
  Registration: https://mailchi.mp/merl/merlvoh2021
  
  Keynote Title: Control Meets Learning - On Performance, Safety and User Interaction
  
  Abstract: With increasing sensing and communication capabilities, physical systems today are becoming one of the largest generators of data, making learning a central component of autonomous control systems. While this paradigm shift offers tremendous opportunities to address new levels of system complexity, variability and user interaction, it also raises fundamental questions of learning in a closed-loop dynamical control system. In this talk, I will present some of our recent results showing how even safety-critical systems can leverage the potential of data. I will first briefly present concepts for using learning for automatic controller design and for a new safety framework that can equip any learning-based controller with safety guarantees. The second part will then discuss how expert and user information can be utilized to optimize system performance, where I will particularly highlight an approach developed together with MERL for personalizing the motion planning in autonomous driving to the individual driving style of a passenger.
EVENT Prof. Ashok Veeraraghavan of Rice University to give keynote at MERL's Virtual Open House
Date & Time: Thursday, December 9, 2021; 1:00pm - 5:30pm EST
Location: Virtual Event
Speaker: Prof. Ashok Veeraraghavan, Rice University
Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio, Digital Video, Human-Computer Interaction, Information Security
Brief
- MERL is excited to announce the first keynote speaker for our Virtual Open House 2021:
  Prof. Ashok Veeraraghavan from Rice University.
  
  Our virtual open house will take place on December 9, 2021, 1:00pm - 5:30pm (EST).
  
  Join us to learn more about who we are, what we do, and discuss our internship and employment opportunities. Prof. Veeraraghavan's talk is scheduled for 1:15pm - 1:45pm (EST).
  
  Registration: https://mailchi.mp/merl/merlvoh2021
  
  Keynote Title: Computational Imaging: Beyond the limits imposed by lenses.
  
  Abstract: The lens has long been a central element of cameras, since its early use in the mid-nineteenth century by Niepce, Talbot, and Daguerre. The role of the lens, from the Daguerrotype to modern digital cameras, is to refract light to achieve a one-to-one mapping between a point in the scene and a point on the sensor. This effect enables the sensor to compute a particular two-dimensional (2D) integral of the incident 4D light-field. We propose a radical departure from this practice and the many limitations it imposes. In the talk we focus on two inter-related research projects that attempt to go beyond lens-based imaging.
  
  First, we discuss our lab’s recent efforts to build flat, extremely thin imaging devices by replacing the lens in a conventional camera with an amplitude mask and computational reconstruction algorithms. These lensless cameras, called FlatCams can be less than a millimeter in thickness and enable applications where size, weight, thickness or cost are the driving factors. Second, we discuss high-resolution, long-distance imaging using Fourier Ptychography, where the need for a large aperture aberration corrected lens is replaced by a camera array and associated phase retrieval algorithms resulting again in order of magnitude reductions in size, weight and cost. Finally, I will spend a few minutes discussing how the wholistic computational imaging approach can be used to create ultra-high-resolution wavefront sensors.