TR2005-137

Latent Variable Decomposition of Spectrograms for Single Channel Speaker Separation


Abstract:

In this paper we present an algorithm for separating multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modelled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker specific and are learnt from training signals for each speaker. The distributions representing magnitude spectral vectors for the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e., the spectrum, of each speaker is then reconstructed from this decomposition. Experimental results show that the proposed method is very effective at separating mixed signals.
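The model described above can be illustrated with a small sketch. Each frame t is treated as a distribution over frequency bins, P_t(f) = Σ_z P_t(z) P(f|z), where the multinomials P(f|z) are speaker specific and the weights P_t(z) change from frame to frame. The sketch assumes the standard expectation-maximization (EM) updates for this kind of latent-variable model. The function names (`em_step`, `learn_speaker_model`, `separate`), the component and iteration counts, the toy data, and the rule that splits the mixture magnitude in proportion to each speaker's share of the model are all illustrative assumptions. They are not the report's exact procedure.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero

def em_step(V, Pf_z, Pz_t, update_bases=True):
    """One EM iteration for P_t(f) = sum_z P_t(z) P(f|z).
    V: (F, T) magnitude spectrogram; each column is read as a histogram of bin draws.
    Pf_z: (F, Z) multinomials P(f|z), columns sum to 1.
    Pz_t: (Z, T) per-frame mixture weights P_t(z), columns sum to 1."""
    R = Pf_z @ Pz_t            # model distribution P_t(f), shape (F, T)
    ratio = V / (R + EPS)      # V(f,t) / P_t(f), used to form V-weighted posteriors P_t(z|f)
    # P_t(z) is proportional to sum_f V(f,t) P_t(z|f)
    new_Pz_t = Pz_t * (Pf_z.T @ ratio)
    new_Pz_t /= new_Pz_t.sum(axis=0, keepdims=True) + EPS
    if update_bases:
        # P(f|z) is proportional to sum_t V(f,t) P_t(z|f), using the same E-step posteriors
        new_Pf_z = Pf_z * (ratio @ Pz_t.T)
        new_Pf_z /= new_Pf_z.sum(axis=0, keepdims=True) + EPS
    else:
        new_Pf_z = Pf_z        # multinomials held fixed (used when decomposing a mixture)
    return new_Pf_z, new_Pz_t

def learn_speaker_model(V, n_components, n_iter=200, seed=0):
    """Learn speaker-specific multinomials P(f|z) from one speaker's training spectrogram."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pf_z = rng.random((F, n_components))
    Pf_z /= Pf_z.sum(axis=0, keepdims=True)
    Pz_t = np.full((n_components, T), 1.0 / n_components)
    for _ in range(n_iter):
        Pf_z, Pz_t = em_step(V, Pf_z, Pz_t, update_bases=True)
    return Pf_z

def separate(V_mix, speaker_models, n_iter=200):
    """Decompose a mixture on the fixed multinomials of all speakers and
    return one magnitude spectrogram estimate per speaker."""
    Pf_z = np.concatenate(speaker_models, axis=1)
    Z, T = Pf_z.shape[1], V_mix.shape[1]
    Pz_t = np.full((Z, T), 1.0 / Z)
    for _ in range(n_iter):
        _, Pz_t = em_step(V_mix, Pf_z, Pz_t, update_bases=False)
    total = Pf_z @ Pz_t + EPS
    estimates, start = [], 0
    for model in speaker_models:
        stop = start + model.shape[1]
        part = Pf_z[:, start:stop] @ Pz_t[start:stop]  # this speaker's share of P_t(f)
        # Assumption: split the observed magnitude in proportion to each speaker's share.
        estimates.append(V_mix * part / total)
        start = stop
    return estimates

# Toy check with hypothetical data: two "speakers" occupying disjoint halves of 16 bins.
rng = np.random.default_rng(1)
F, T = 16, 40
train_a = np.zeros((F, T)); train_a[:8] = rng.uniform(0.1, 1.0, (8, T))
train_b = np.zeros((F, T)); train_b[8:] = rng.uniform(0.1, 1.0, (8, T))
models = [learn_speaker_model(train_a, 4), learn_speaker_model(train_b, 4, seed=1)]
true_a = np.zeros((F, T)); true_a[:8] = rng.uniform(0.1, 1.0, (8, T))
true_b = np.zeros((F, T)); true_b[8:] = rng.uniform(0.1, 1.0, (8, T))
est_a, est_b = separate(true_a + true_b, models)
```

The two toy speakers occupy non-overlapping bins, so the proportional split recovers each source almost exactly. With real speech, the learned multinomials overlap, and the estimates are approximations whose quality depends on how well each speaker's training data is represented.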

 

  • Related News & Events

    •  NEWS    WASPAA 2005: 3 publications by Petros T. Boufounos, Ajay Divakaran and Paris Smaragdis
      Date: October 16, 2005
      Where: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
      MERL Contact: Petros T. Boufounos
      • The papers "Latent Variable Decomposition of Spectrograms for Single Channel Speaker Separation" by Raj, B. and Smaragdis, P.; "Learning Source Trajectories Using Wrapped-Phase Hidden Markov Models" by Smaragdis, P. and Boufounos, P.; and "Audio Analysis for Surveillance Applications" by Radhakrishnan, R., Divakaran, A. and Smaragdis, P. were presented at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).