TR2005-078

Layered Dynamic Mixture Model for Pattern Discovery in Asynchronous Multi-Modal Streams


    •  Xie, L., Kennedy, L., Chang, S.-F., Divakaran, A., Sun, H., Lin, C.-Y., "Layered Dynamic Mixture Model for Pattern Discovery in Asynchronous Multi-Modal Streams", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2005, vol. 2, pp. 1053-1056.
      @inproceedings{Xie2005mar,
        author = {Xie, L. and Kennedy, L. and Chang, S.-F. and Divakaran, A. and Sun, H. and Lin, C.-Y.},
        title = {Layered Dynamic Mixture Model for Pattern Discovery in Asynchronous Multi-Modal Streams},
        booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
        year = 2005,
        volume = 2,
        pages = {1053--1056},
        month = mar,
        issn = {1520-6149},
        url = {https://www.merl.com/publications/TR2005-078}
      }
Abstract:

We propose a layered dynamic mixture model for asynchronous multi-modal fusion for unsupervised pattern discovery in video. The lower layer of the model uses generative temporal structures such as a hierarchical hidden Markov model to convert the audio-visual streams into mid-level labels; it also models the correlations in text with probabilistic latent semantic analysis. The upper layer fuses the statistical evidence across diverse modalities with a flexible meta-mixture model that assumes loose temporal correspondence. Evaluation on a large news database shows that multi-modal clusters correspond better to news topics than audio-visual clusters alone; novel analysis techniques suggest that meaningful clusters occur when the model's predictions of salient features concur with those shown in the story clusters.
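
The upper-layer fusion idea can be illustrated with a minimal sketch. This is not the paper's model: it is a plain mixture of multinomials fitted with EM, used here as a stand-in for the meta-mixture. It assumes the lower layer has already produced mid-level labels, and it approximates "loose temporal correspondence" by simply counting each modality's labels anywhere inside a story's time window. The function name `fit_meta_mixture` and all parameters are hypothetical.

```python
import numpy as np

def fit_meta_mixture(counts, n_clusters, n_iter=50, seed=0, eps=1e-9):
    """Cluster stories from per-modality label counts (illustrative only).

    counts: list with one (n_stories, n_labels_m) array per modality.
            Row i holds how often each mid-level label of that modality
            occurs within story i's time window (an assumed stand-in for
            loose temporal correspondence).
    Returns soft cluster assignments, cluster priors, and the label
    distribution of every cluster for every modality.
    """
    rng = np.random.default_rng(seed)
    n_stories = counts[0].shape[0]
    # Random soft initialisation of cluster responsibilities.
    resp = rng.dirichlet(np.ones(n_clusters), size=n_stories)
    for _ in range(n_iter):
        # M-step: re-estimate cluster priors and label distributions.
        prior = resp.mean(axis=0)
        emis = []
        for c in counts:
            e = resp.T @ c + eps                      # (n_clusters, n_labels_m)
            emis.append(e / e.sum(axis=1, keepdims=True))
        # E-step: modalities are treated as conditionally independent
        # given the cluster, so their log-likelihoods add up.
        log_post = np.log(prior + eps)[None, :]
        for c, e in zip(counts, emis):
            log_post = log_post + c @ np.log(e).T
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, prior, emis
```

Calling `fit_meta_mixture([av_counts, text_counts], n_clusters=K)` with one count matrix for audio-visual labels and one for text labels returns a soft story-to-cluster assignment; passing only `[av_counts]` gives an audio-visual-only baseline for comparison.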

 

  • Related News & Events

    •  NEWS    ICASSP 2005: 4 publications by Anthony Vetro, Ajay Divakaran, Huifang Sun and others
      Date: March 18, 2005
      Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
      MERL Contacts: Anthony Vetro; Huifang Sun
      Brief
      • The papers "Fast Adaptive Fuzzy Post-Filtering for Coding Artifacts Removal in Interlaced Video" by Nie, Y., Kong, H.-S., Vetro, A. and Barner, K., "Video Coding Using 3-D Dual-Tree Discrete Wavelet Transform" by Wang, B., Wang, Y., Selesnick, I. and Vetro, A., "A Companding Front End for Noise-Robust Automatic Speech Recognition" by Guinness, J., Raj, B., Schmidt-Nielsen, B., Turicchia, L. and Sarpeshkar, R. and "Layered Dynamic Mixture Model for Pattern Discovery in Asynchronous Multi-Modal Streams" by Xie, L., Kennedy, L., Chang, S.-F., Divakaran, A., Sun, H. and Lin, C.-Y. were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).