TR2025-031

O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization


    •  Gruttadauria, E., Fontaine, M., Le Roux, J., Essid, S., "O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025.
      BibTeX TR2025-031 PDF
      • @inproceedings{Gruttadauria2025mar,
      • author = {Gruttadauria, Elio and Fontaine, Mathieu and {Le Roux}, Jonathan and Essid, Slim},
      • title = {{O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization}},
      • booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
      • year = 2025,
      • month = mar,
      • url = {https://www.merl.com/publications/TR2025-031}
      • }
  • MERL Contact:
  • Research Areas:

    Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

We introduce O-EENC-SD: an end-to-end online speaker diarization system based on EEND-EDA, featuring a novel RNN-based stitching mechanism for online prediction. In particular, we develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study. Our system provides key advantages over existing methods: a hyperparameter-free solution compared to unsupervised clus- tering approaches, and a more efficient alternative to current online end- to-end methods, which are computationally costly. We demonstrate that O-EENC-SD is competitive with the state of the art in the two-speaker conversational telephone speech domain, as tested on the CallHome dataset. Our results show that O-EENC-SD provides a great trade-off between DER and complexity, even when working on independent chunks with no overlap, making the system extremely efficient.