FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
Abstract
FocalCodec-Stream is a hybrid neural audio codec that delivers high-quality, low-latency speech compression suitable for real-time applications.
Neural audio codecs are a fundamental component of modern generative audio pipelines. Although recent codecs achieve strong low-bitrate reconstruction and provide powerful representations for downstream tasks, most are non-streamable, limiting their use in real-time applications. We present FocalCodec-Stream, a hybrid codec based on focal modulation that compresses speech into a single binary codebook at 0.55-0.80 kbps with a theoretical latency of 80 ms. Our approach combines multi-stage causal distillation of WavLM with targeted architectural improvements, including a lightweight refiner module that enhances quality under latency constraints. Experiments show that FocalCodec-Stream outperforms existing streamable codecs at comparable bitrates, while preserving both semantic and acoustic information. The result is a favorable trade-off between reconstruction quality, downstream task performance, latency, and efficiency. Code and checkpoints will be released at https://github.com/lucadellalib/focalcodec.
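Since the codec uses a single binary codebook, each frame reduces to a fixed-length bit vector, and the bitrate follows directly as bits per frame times frame rate. The sketch below illustrates this idea with sign-based binary quantization and a straight-through estimator; the bit width (13) and frame rate (50 Hz) are hypothetical values chosen so the arithmetic lands inside the reported 0.55-0.80 kbps range, not figures taken from the paper.

```python
import torch

def binary_quantize(z: torch.Tensor) -> torch.Tensor:
    """Sign-quantize each latent dimension to {-1, +1}.

    A straight-through estimator lets gradients reach the encoder
    even though sign() itself is non-differentiable.
    """
    q = torch.sign(z)
    q = torch.where(q == 0, torch.ones_like(q), q)  # resolve exact zeros to +1
    return z + (q - z).detach()  # forward pass uses q, backward pass is identity

# Bitrate of a single binary codebook: one bit per latent dimension per frame.
bits_per_frame = 13   # hypothetical latent width (2**13 = 8192 implicit codes)
frame_rate_hz = 50.0  # hypothetical frame rate
print(bits_per_frame * frame_rate_hz / 1000, "kbps")  # 0.65 kbps
```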
Community
The following papers were recommended by the Semantic Scholar API:
- NanoCodec: Towards High-Quality Ultra Fast Speech LLM Inference (2025)
- MSR-Codec: A Low-Bitrate Multi-Stream Residual Codec for High-Fidelity Speech Generation with Information Disentanglement (2025)
- A High-Quality and Low-Complexity Streamable Neural Speech Codec with Knowledge Distillation (2025)
- SecoustiCodec: Cross-Modal Aligned Streaming Single-Codebook Speech Codec (2025)
- TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling (2025)
- FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs (2025)
- DarkStream: Real-Time Speech Anonymization with Low Latency (2025)
Models citing this paper: 6
Datasets citing this paper: 0
Spaces citing this paper: 0