LocalVQE: real-time AEC + noise suppression + dereverb

LocalVQE is a ~1.3 M-parameter open-source model that cleans up a microphone signal on a voice call. It cancels echo (the remote participant's voice played through your speakers and picked up again by the mic), suppresses background noise, and removes reverberation, all in a single causal pass on CPU.

Provide two inputs:

  • Mic: the raw microphone recording (what the far end would hear without any processing).
  • Far-end reference: the audio being played out of your speakers. For a pure noise-suppression test (no speaker playback), upload a silent clip or leave this input empty.
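
If you would rather upload explicit silence than leave the far-end input empty, a silent clip can be generated with the Python standard library. This is a minimal sketch, not part of LocalVQE: the helper name `write_silent_wav` is hypothetical, and the 16 kHz mono 16-bit format is an assumption, so match the sample rate and duration of your own mic recording.

```python
import wave


def write_silent_wav(path, duration_s, sample_rate=16000):
    """Write a mono 16-bit PCM WAV file that contains only zeros.

    sample_rate=16000 is an assumed default, not a documented
    LocalVQE requirement.
    """
    num_frames = int(round(duration_s * sample_rate))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_frames)
    return num_frames
```

Calling `write_silent_wav("silence.wav", 10.0)` would produce a 10-second silent reference to pair with a 10-second mic clip.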

Try the bundled examples first. All five come from the ICASSP 2022 AEC Challenge blind set: heavy and light near-end noise (near-end single-talk, NE-ST, mixed with DNS5 background at 5 dB and 20 dB SNR), a clean far-end single-talk clip, a far-end clip with some near-end overlap (mislabelled in the source corpus, but a useful test of AEC and near-end preservation together), and a double-talk clip.
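
To build a similar noisy test clip from your own recordings, speech and noise can be mixed at a target SNR by scaling the noise before adding it. The sketch below shows the standard power-ratio calculation. It is not the script used to prepare the bundled examples, and the function name `mix_at_snr` is hypothetical.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so the mixture has the requested SNR.

    SNR is defined here over the whole clip as
    10 * log10(mean(speech^2) / mean(scaled_noise^2)).
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")

    # Solve speech_power / (gain^2 * noise_power) = 10^(snr_db / 10) for gain.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

With `snr_db=5` the noise sits only 5 dB below the speech (the heavy-noise case), while `snr_db=20` leaves the speech mostly clean.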

Weights: LocalAI-io/LocalVQE · Code: github.com/localai-org/LocalVQE

Post-process the enhanced output: silence any 10 ms frame whose RMS falls below the threshold. This cleans up the quiet residual you'd hear during far-end-only stretches, but it will also mute genuinely quiet speech that falls below the threshold.
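
The gate described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the demo's actual implementation: audio is float in [-1, 1], the threshold is read as dB relative to full scale, the sample rate defaults to 16 kHz, and the function name `gate_quiet_frames` is hypothetical.

```python
import numpy as np


def gate_quiet_frames(audio, sample_rate=16000, threshold_db=-50.0, frame_ms=10):
    """Zero out every frame whose RMS level is below threshold_db.

    Assumes float audio in [-1, 1] and a threshold in dB relative
    to full scale; both are assumptions, not documented behaviour.
    """
    audio = np.asarray(audio, dtype=np.float64)
    frame_len = int(sample_rate * frame_ms / 1000)
    out = audio.copy()
    for start in range(0, len(audio), frame_len):
        frame = audio[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        # Floor the RMS so an all-zero frame does not hit log10(0).
        rms_db = 20.0 * np.log10(max(rms, 1e-12))
        if rms_db < threshold_db:
            out[start:start + frame_len] = 0.0
    return out
```

This is a hard gate with no attack or release smoothing, so a threshold set too high will clip the quiet onsets and tails of words; that is the trade-off the description warns about.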

Threshold slider range: -70 to -20.
Examples, top to bottom:

  • Near-end + heavy noise (5 dB SNR, pure NS).
  • Near-end + light noise (20 dB SNR, NS preserving clean speech).
  • Far-end single-talk (pure AEC).
  • Far-end with brief near-end overlap (AEC while preserving NE).
  • Double-talk (AEC while near-end is also talking).

Example columns: Mic (microphone recording) · Far-end reference (speaker playback)

Loaded: hf:LocalAI-io/LocalVQE/localvqe-v1-1.3M.pt · sha256 499d7cadfe939c2f… · 1,290,453 params