- Date: May 12, 2019 - May 17, 2019
Where: Brighton, UK
MERL Contacts: Petros T. Boufounos; Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Tim K. Marks; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief - MERL researchers will be presenting 16 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Brighton, UK from May 12-17, 2019. Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and parameter estimation. MERL is also a sponsor of the conference and will be participating in the student career luncheon; please join us at the lunch to learn about our internship program and career opportunities.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
-
- Date: February 13, 2019
Where: Tokyo, Japan
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Area: Speech & Audio
Brief - Mitsubishi Electric Corporation announced that it has developed the world's first technology capable of highly accurate multilingual speech recognition without being informed which language is being spoken. The novel technology, Seamless Speech Recognition, incorporates Mitsubishi Electric's proprietary Maisart compact AI technology and is built on a single system that can simultaneously identify and understand spoken languages. In tests involving 5 languages, the system achieved recognition with over 90 percent accuracy, without being informed which language was being spoken. When incorporating 5 more languages with lower resources, accuracy remained above 80 percent. The technology can also understand multiple people speaking either the same or different languages simultaneously. A live demonstration involving a multilingual airport guidance system took place on February 13 in Tokyo, Japan. It was widely covered by the Japanese media, with reports by all six main Japanese TV stations and multiple articles in print and online newspapers, including in Japan's top newspaper, Asahi Shimbun. The technology is based on recent research by MERL's Speech and Audio team.
Link:
Mitsubishi Electric Corporation Press Release
Media Coverage:
NHK, News (Japanese)
NHK World, News (English), video report (starting at 4'38")
TV Asahi, ANN news (Japanese)
Nippon TV, News24 (Japanese)
Fuji TV, Prime News Alpha (Japanese)
TV Tokyo, World Business Satellite (Japanese)
TV Tokyo, Morning Satellite (Japanese)
TBS, News, N Studio (Japanese)
The Asahi Shimbun (Japanese)
The Nikkei Shimbun (Japanese)
Nikkei xTech (Japanese)
Response (Japanese).
-
- Date: Thursday, October 18, 2018
Location: Google, Cambridge, MA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - SANE 2018, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 18, 2018 at Google, in Cambridge, MA. MERL is one of the organizers and sponsors of the workshop.
It is the 7th edition in the SANE series of workshops, which started at MERL in 2012. Since the first edition, the audience has steadily grown, with a record 180 participants in 2017.
SANE 2018 will feature invited talks by leading researchers from the Northeast, as well as from the international community. It will also feature a lively poster session, open to both students and researchers.
-
- Date: June 25, 2018 - August 3, 2018
Where: Johns Hopkins University, Baltimore, MD
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL Speech & Audio Team researcher Takaaki Hori led a team of 27 senior researchers and Ph.D. students from different organizations around the world, working on "Multi-lingual End-to-End Speech Recognition for Incomplete Data" as part of the Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The JSALT workshop is a renowned 6-week hands-on workshop held yearly since 1995. This year, the workshop was held at Johns Hopkins University in Baltimore from June 25 to August 3, 2018. Takaaki's team developed new methods for end-to-end Automatic Speech Recognition (ASR) with a focus on low-resource languages with limited labelled data.
End-to-end ASR can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. Some end-to-end systems have recently achieved performance comparable to or better than conventional systems in several tasks. However, the current model training algorithms basically require paired data, i.e., speech data and the corresponding transcription. Sufficient amount of such complete data is usually unavailable for minor languages, and creating such data sets is very expensive and time consuming.
The goal of Takaaki's team project was to expand the applicability of end-to-end models to multilingual ASR, and to develop new technology that would make it possible to build highly accurate systems even for low-resource languages without a large amount of paired data. Some major accomplishments of the team include building multi-lingual end-to-end ASR systems for 17 languages, developing novel architectures and training methods for end-to-end ASR, building end-to-end ASR-TTS (Text-to-speech) chain for unpaired data training, and developing ESPnet, an open-source end-to-end speech processing toolkit. Three papers stemming from the team's work have already been accepted to the 2018 IEEE Spoken Language Technology Workshop (SLT), with several more to be submitted to upcoming conferences.
-
- Date: April 17, 2018
Awarded to: Zhong-Qiu Wang
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - Former MERL intern Zhong-Qiu Wang (Ph.D. Candidate at Ohio State University) has received a Best Student Paper Award at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) for the paper "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation" by Zhong-Qiu Wang, Jonathan Le Roux, and John Hershey. The paper presents work performed during Zhong-Qiu's internship at MERL in the summer 2017, extending MERL's pioneering Deep Clustering framework for speech separation to a multi-channel setup. The award was received on behalf on Zhong-Qiu by MERL researcher and co-author Jonathan Le Roux during the conference, held in Calgary April 15-20.
-
- Date: April 15, 2018 - April 20, 2018
Where: Calgary, AB
MERL Contacts: Petros T. Boufounos; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Pu (Perry) Wang
Research Areas: Computational Sensing, Digital Video, Speech & Audio
Brief - MERL researchers are presenting 9 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Calgary from April 15-20, 2018. Topics to be presented include recent advances in speech recognition, audio processing, and computational sensing. MERL is also a sponsor of the conference.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
-
- Date & Time: Tuesday, March 6, 2018; 12:00 PM
Speaker: Scott Wisdom, Affectiva
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
Abstract - Recurrent neural networks (RNNs) are effective, data-driven models for sequential data, such as audio and speech signals. However, like many deep networks, RNNs are essentially black boxes; though they are effective, their weights and architecture are not directly interpretable by practitioners. A major component of my dissertation research is explaining the success of RNNs and constructing new RNN architectures through the process of "deep unfolding," which can construct and explain deep network architectures using an equivalence to inference in statistical models. Deep unfolding yields principled initializations for training deep networks, provides insight into their effectiveness, and assists with interpretation of what these networks learn.
In particular, I will show how RNNs with rectified linear units and residual connections are a particular deep unfolding of a sequential version of the iterative shrinkage-thresholding algorithm (ISTA), a simple and classic algorithm for solving L1-regularized least-squares. This equivalence allows interpretation of state-of-the-art unitary RNNs (uRNNs) as an unfolded sparse coding algorithm. I will also describe a new type of RNN architecture called deep recurrent nonnegative matrix factorization (DR-NMF). DR-NMF is an unfolding of a sparse NMF model of nonnegative spectrograms for audio source separation. Both of these networks outperform conventional LSTM networks while also providing interpretability for practitioners.
-
- Date: February 5, 2018
Where: National Public Radio (NPR)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL's speech separation technology was featured in NPR's All Things Considered, as part of an episode of All Tech Considered on artificial intelligence, "Can Computers Learn Like Humans?". An example separating the overlapped speech of two of the show's hosts was played on the air.
The technology is based on a proprietary deep learning method called Deep Clustering. It is the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It is a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations.
A live demonstration was featured in Mitsubishi Electric Corporation's Annual R&D Open House last year, and was also covered in international media at the time.
(Photo credit: Sam Rowe for NPR)
Link:
"Can Computers Learn Like Humans?" (NPR, All Things Considered)
MERL Deep Clustering Demo.
-
- Date: December 16, 2017 - December 20, 2017
Where: Okinawa, Japan
MERL Contacts: Chiori Hori; Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL presented three papers at the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), which was held in Okinawa, Japan from December 16-20, 2017. ASRU is the premier speech workshop, bringing together researchers from academia and industry in an intimate and collegial setting. More than 270 people attended the event this year, a record number. MERL's Speech and Audio Team was a key part of the organization of the workshop, with John Hershey serving as General Chair, Chiori Hori as Sponsorship Chair, and Jonathan Le Roux as Demonstration Chair. Two of the papers by MERL were selected among the 10 finalists for the best paper award. Mitsubishi Electric and MERL were also Platinum sponsors of the conference, with MERL awarding the MERL Best Student Paper Award.
-
- Date: May 24, 2017
Where: Tokyo, Japan
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - Mitsubishi Electric Corporation announced that it has created the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It's a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations. In tests, the simultaneous speeches of two and three people were separated with up to 90 and 80 percent accuracy, respectively. The novel technology, which was realized with Mitsubishi Electric's proprietary "Deep Clustering" method based on artificial intelligence (AI), is expected to contribute to more intelligible voice communications and more accurate automatic speech recognition. A characteristic feature of this approach is its versatility, in the sense that voices can be separated regardless of their language or the gender of the speakers. A live speech separation demonstration that took place on May 24 in Tokyo, Japan, was widely covered by the Japanese media, with reports by three of the main Japanese TV stations and multiple articles in print and online newspapers. The technology is based on recent research by MERL's Speech and Audio team.
Links:
Mitsubishi Electric Corporation Press Release
MERL Deep Clustering Demo
Media Coverage:
Fuji TV, News, "Minna no Mirai" (Japanese)
The Nikkei (Japanese)
Nikkei Technology Online (Japanese)
Sankei Biz (Japanese)
EE Times Japan (Japanese)
ITpro (Japanese)
Nikkan Sports (Japanese)
Nikkan Kogyo Shimbun (Japanese)
Dempa Shimbun (Japanese)
Il Sole 24 Ore (Italian)
IEEE Spectrum (English).
-
- Date: March 5, 2017 - March 9, 2017
Where: New Orleans
MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Anthony Vetro; Ye Wang
Research Areas: Computer Vision, Computational Sensing, Digital Video, Information Security, Speech & Audio
Brief - MERL researchers will presented 10 papers at the upcoming IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), to be held in New Orleans from March 5-9, 2017. Topics to be presented include recent advances in speech recognition and audio processing; graph signal processing; computational imaging; and privacy-preserving data analysis.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
-
- Date: Tuesday, December 13, 2016
Location: 2016 IEEE Spoken Language Technology Workshop, San Diego, California
Speaker: John Hershey, MERL
MERL Contact: Jonathan Le Roux
Research Areas: Machine Learning, Speech & Audio
Brief - MERL researcher John Hershey presents an invited tutorial at the 2016 IEEE Workshop on Spoken Language Technology, in San Diego, California. The topic, "developing novel deep neural network architectures from probabilistic models" stems from MERL work with collaborators Jonathan Le Roux and Shinji Watanabe, on a principled framework that seeks to improve our understanding of deep neural networks, and draws inspiration for new types of deep network from the arsenal of principles and tools developed over the years for conventional probabilistic models. The tutorial covers a range of parallel ideas in the literature that have formed a recent trend, as well as their application to speech and language.
-
- Date: Friday, October 21, 2016
Location: MIT, McGovern Institute for Brain Research, Cambridge, MA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - SANE 2016, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Friday October 21, 2016 at MIT's Brain and Cognitive Sciences Department, at the McGovern Institute for Brain Research, in Cambridge, MA.
It is a follow-up to SANE 2012 (Mitsubishi Electric Research Labs - MERL), SANE 2013 (Columbia University), SANE 2014 (MIT CSAIL), and SANE 2015 (Google NY). Since the first edition, the audience has steadily grown, gathering 140 researchers and students in 2015.
SANE 2016 will feature invited talks by leading researchers: Juan P. Bello (NYU), William T. Freeman (MIT/Google), Nima Mesgarani (Columbia University), DAn Ellis (Google), Shinji Watanabe (MERL), Josh McDermott (MIT), and Jesse Engel (Google). It will also feature a lively poster session during lunch time, open to both students and researchers.
SANE 2016 is organized by Jonathan Le Roux (MERL), Josh McDermott (MIT), Jim Glass (MIT), and John R. Hershey (MERL).
-
- Date: September 8, 2016
Where: Interspeech 2016, San Francisco, CA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL Speech and Audio Team researchers Shinji Watanabe and Jonathan Le Roux presented two tutorials on September 8 at the Interspeech 2016 conference, held in San Francisco, CA. Shinji collaborated with Marc Delcroix (NTT Communication Science Laboratories, Japan) to deliver a three-hour lecture on "Recent Advances in Distant Speech Recognition", drawing upon their experience organizing and participating in six different recent robust speech processing challenges. Jonathan teamed with Emmanuel Vincent (Inria, France) and Hakan Erdogan (Sabanci University, Microsoft Research) to give an in-depth tour of the latest advances in "Learning-based Approaches to Speech Enhancement And Separation". This collaboration stemmed from extensive stays at MERL by Emmanuel and Hakan, Emmanuel as a summer visitor, and Hakan as a MERL visiting research scientist for over a year while on sabbatical.
Both tutorials were sold out, each attracting more than 100 researchers and students in related fields, and received high praise from audience members.
-
- Date: March 20, 2016 - March 25, 2016
Where: Shanghai, China
MERL Contacts: Petros T. Boufounos; Chiori Hori; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Anthony Vetro
Research Areas: Computational Sensing, Digital Video, Speech & Audio, Communications, Signal Processing
Brief - MERL researchers have presented 12 papers at the recent IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which was held in Shanghai, China from March 20-25, 2016. ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing, with more than 1200 papers presented and over 2000 participants.
-
- Date: March 4, 2016
Where: Johns Hopkins Center for Language and Speech Processing
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL researcher and speech team leader, John Hershey, was invited by the Center for Language and Speech Processing at Johns Hopkins University to give a talk on MERL's breakthrough audio separation work, known as "Deep Clustering". The talk was entitled "Speech Separation by Deep Clustering: Towards Intelligent Audio Analysis and Understanding," and was given on March 4, 2016.
This is work conducted by MERL researchers John Hershey, Jonathan Le Roux, and Shinji Watanabe, and MERL interns, Zhuo Chen of Columbia University, and Yusef Isik of Sabanci University.
-
- Date: December 15, 2015
Awarded to: John R. Hershey, Takaaki Hori, Jonathan Le Roux and Shinji Watanabe
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - The results of the third 'CHiME' Speech Separation and Recognition Challenge were publicly announced on December 15 at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015) held in Scottsdale, Arizona, USA. MERL's Speech and Audio Team, in collaboration with SRI, ranked 2nd out of 26 teams from Europe, Asia and the US. The task this year was to recognize speech recorded using a tablet in real environments such as cafes, buses, or busy streets. Due to the high levels of noise and the distance from the speaker's mouth to the microphones, this is very challenging task, where the baseline system only achieved 33.4% word error rate. The MERL/SRI system featured state-of-the-art techniques including multi-channel front-end, noise-robust feature extraction, and deep learning for speech enhancement, acoustic modeling, and language modeling, leading to a dramatic 73% reduction in word error rate, down to 9.1%. The core of the system has since been released as a new official challenge baseline for the community to use.
-
- Date: Thursday, October 22, 2015
Location: Google, New York City, NY
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - SANE 2015, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 22, 2015 at Google, in New York City, NY.
It is a follow-up to SANE 2012, held at Mitsubishi Electric Research Labs (MERL), SANE 2013, held at Columbia University, and SANE 2014, held at MIT, which each gathered 70 to 90 researchers and students.
SANE 2015 will feature invited talks by leading researchers from the Northeast, as well as from the international community: Rohit Prasad (Amazon), Michael Mandel (Brooklyn College, CUNY), Ron Weiss (Google), John Hershey (MERL), Pablo Sprechmann (NYU), Tuomas Virtanen (Tampere University of Technology), and Paris Smaragdis (UIUC). It will also feature a lively poster session during lunch time, open to both students and researchers.
SANE 2015 is organized by Jonathan Le Roux (MERL), Hank Liao (Google), Andrew Senior (Google), and John R. Hershey (MERL).
-
- Date: April 19, 2015 - April 24, 2015
Where: IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP)
MERL Contacts: Anthony Vetro; Hassan Mansour; Petros T. Boufounos; Jonathan Le Roux Brief - Multimedia Group researchers have presented 8 papers at the recent IEEE International Conference on Acoustics, Speech & Signal Processing, which was held in Brisbane, Australia from April 19-24, 2015.
-
- Date: March 9, 2015
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - Recent research on speech enhancement by MERL's Speech and Audio team was highlighted in "Cars That Think", IEEE Spectrum's blog on smart technologies for cars. IEEE Spectrum is the flagship publication of the Institute of Electrical and Electronics Engineers (IEEE), the world's largest association of technical professionals with more than 400,000 members.
-
- Date: February 17, 2015
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - Mitsubishi Electric Corporation announced that it has developed breakthrough noise-suppression technology that significantly improves the quality of hands-free voice communication in noisy conditions, such as making a voice call via a car navigation system. Speech clarity is improved by removing 96% of surrounding sounds, including rapidly changing noise from turn signals or wipers, which are difficult to suppress using conventional methods. The technology is based on recent research on speech enhancement by MERL's Speech and Audio team. .
-
- Date: Thursday, October 23, 2014
Location: Mitsubishi Electric Research Laboratories (MERL)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - SANE 2014, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 23, 2014 at MIT, in Cambridge, MA. It is a follow-up to SANE 2012, held at Mitsubishi Electric Research Labs (MERL), and SANE 2013, held at Columbia University, which each gathered around 70 researchers and students. SANE 2014 will feature invited talks by leading researchers from the Northeast as well as Europe: Najim Dehak (MIT), Hakan Erdogan (MERL/Sabanci University), Gael Richard (Telecom ParisTech), George Saon (IBM Research), Andrew Senior (Google Research), Stavros Tsakalidis (BBN - Raytheon), and David Wingate (Lyric). It will also feature a lively poster session during lunch time, open to both students and researchers. SANE 2014 is organized by Jonathan Le Roux (MERL), Jim Glass (MIT), and John R. Hershey (MERL).
-
- Date: March 11, 2014
Awarded to: Yuuki Tachioka
Awarded for: "Effectiveness of discriminative approaches for speech recognition under noisy environments on the 2nd CHiME Challenge"
Awarded by: Acoustical Society of Japan (ASJ)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - MELCO researcher Yuuki Tachioka received the Awaya Prize Young Researcher Award from the Acoustical Society of Japan (ASJ) for "effectiveness of discriminative approaches for speech recognition under noisy environments on the 2nd CHiME Challenge", which was based on joint work with MERL Speech & Audio team researchers Shinji Watanabe, Jonathan Le Roux and John R. Hershey.
-
- Date: January 1, 2014
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - Jonathan Le Roux, Shinji Watanabe and John R. Hershey have been elected for 3-year terms to Technical Committees of the IEEE Signal Processing Society. Jonathan has been elected to the IEEE Audio and Acoustic Signal Processing Technical Committee (AASP-TC), and Shinji and John to the Speech and Language Processing Technical Committee (SL-TC). Members of the Speech & Audio team now together hold four TC positions, as John also serves on the AASP-TC.
-
- Date: February 10, 2014
MERL Contacts: Jonathan Le Roux; Daniel N. Nikovski; Anthony Vetro Brief - Mitsubishi Electric Corporation demonstrated an ultra-simple HMI for in-car device operation using algorithms developed by MERL to predict user actions and destinations.
-