Generation or Replication: Auscultating Audio Latent Diffusion Models
Audio examples from ICASSP 2024 submission
MERL Researchers: Gordon Wichern, François Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux (Speech & Audio).
Search MERL publications by keyword: Speech & Audio, acoustic similarity, audio synthesis
Identified partially replicated training examples from the full TANGO model.
For each generated example, we show the top match found in the training set under each of the two similarity methods explored in our paper, CLAP and mel. While the generated sounds are not identical to the training data, they share striking similarities in features such as event onsets, which appear to be replicated from the training data.
| Generated Sample Prompt | Top Training Data Match: mel Caption | Top Training Data Match: CLAP Caption |
|---|---|---|
| Something zooms by before exploding in the distance | Explosions occur multiple times | Multiple explosions |
| A man speaks and an audience gives applause | Person is speaking and people are cheering | Excitement and applause for a male speaker |
| A motor is accelerating and then slows, then accelerates again | A train moves then a horn is triggered and a bell rings | A racing vehicle engine revving up before accelerating and driving by |
| A person sneezing | A loud burp is made | A child sneezes |
| A person is snoring while sleeping | A person snoring | A person snores loudly nearby several times |
| A door opens then closes followed by thunder | Thunder sounds loudly nearby | A click followed by a loud, long bang |
| Silence followed by breathing, a sneeze then sniffling | A person sneezes loudly nearby | An adult female sneeze three times and sniffs |
| The gentle drone of a fan blows with an echo as a toilet flushes | A toilet is flushed | A toilet is flushed and the water gurgles loudly |
| A power tool is in use | High pitched drilling | Drill spinning rapidly and then getting stuck and stopping |
| A person snoring and breathing heavily | A person snoring | A person snoring |
| A man speaks, and then a toilet flushes, followed by the man continuing to speak | A winged insect is buzzing around | A man speaking followed by a toilet flushing |
| A man speaking followed by metal rattling then a motorcycle engine starting up and running idle | Birds chirp and something squeaks while leaves rustle | A man talks followed by a motorcycle engine starting |
| Silence followed by a man speaking and then a toilet flushing | A shifting sound accompanies a knock, followed by a toilet flushing | A man speaking followed by a toilet flushing |
| A heavy rain falls | Flushing of a toilet as bells ring | Thunder clap and rain |
| Two snaps occur | A machine runs and then a loud burst of air pops | A small gunshot rings |
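The "top match" lookup behind the examples above can be illustrated with a minimal sketch. It assumes each clip has already been reduced to a fixed-length feature vector (for instance a CLAP embedding or a flattened mel representation) and that matches are ranked by cosine similarity. Both are assumptions made for illustration. The exact features and scoring are described in the paper, and the names `cosine_similarity`, `top_training_match`, and the toy vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_training_match(generated, training):
    """Return (clip_id, score) of the training clip most similar to `generated`.

    `training` maps a hypothetical clip identifier to its feature vector.
    """
    best_id, best_score = None, -math.inf
    for clip_id, features in training.items():
        score = cosine_similarity(generated, features)
        if score > best_score:
            best_id, best_score = clip_id, score
    return best_id, best_score


# Toy, made-up 3-dimensional features (real embeddings are much larger).
training_features = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [0.6, 0.8, 0.0],
}
generated_features = [0.5, 0.9, 0.1]

match, score = top_training_match(generated_features, training_features)
print(match, round(score, 3))
```

Running the same lookup once with CLAP-style features and once with mel-style features would give the two "top match" columns shown for each generated sample, which is why the two methods can disagree on the same prompt.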
Identified duplicates in AudioCaps training set
Full list: [audiocaps_duplicates.csv]
Selected examples - cluster 0
Selected examples - cluster 2
Selected examples - cluster 53
Selected examples - cluster 54
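This page does not say how the duplicate clusters were formed. One plausible way to obtain such clusters, sketched here purely as an illustration, is to link every pair of training clips whose similarity exceeds a threshold and take the connected components. The function name `cluster_duplicates`, the threshold of 0.95, and the toy scores are all hypothetical and are not taken from the paper or from `audiocaps_duplicates.csv`.

```python
def cluster_duplicates(num_items, scored_pairs, threshold):
    """Group item indices linked by pairs scoring at or above `threshold`.

    `scored_pairs` is an iterable of (i, j, score) tuples. Returns a list
    of clusters (lists of indices) that contain more than one item.
    """
    parent = list(range(num_items))  # union-find forest, one root per item

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, score in scored_pairs:
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two groups

    groups = {}
    for i in range(num_items):
        groups.setdefault(find(i), []).append(i)
    # Singletons are not duplicates, so drop them.
    return [members for members in groups.values() if len(members) > 1]


# Toy, made-up similarity scores between six hypothetical clips.
pairs = [(0, 1, 0.99), (1, 2, 0.97), (3, 4, 0.40), (4, 5, 0.98)]
clusters = cluster_duplicates(6, pairs, threshold=0.95)
print(clusters)
```

Connected components let a chain of near-identical clips end up in one cluster even when the first and last clip were never compared directly, at the cost of occasionally merging groups through a single borderline pair.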
MERL Publications
- "Generation or Replication: Auscultating Audio Latent Diffusion Models", arXiv, October 2023.

  @article{Bralios2023oct,
    author = {Bralios, Dimitrios and Wichern, Gordon and Germain, François G and Pan, Zexu and Khurana, Sameer and Hori, Chiori and Le Roux, Jonathan},
    title = {Generation or Replication: Auscultating Audio Latent Diffusion Models},
    journal = {arXiv},
    year = 2023,
    month = oct,
    url = {https://arxiv.org/abs/2310.10604}
  }