Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? Paper • 2402.12025 • Published Feb 19, 2024 • 2
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection Paper • 2406.06097 • Published Jun 10, 2024 • 2
SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation Paper • 2406.14177 • Published Jun 20, 2024 • 1
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Paper • 2410.01036 • Published Oct 1, 2024 • 16
What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study Paper • 2410.00545 • Published Oct 1, 2024 • 5
How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? Paper • 2412.18495 • Published Dec 24, 2024 • 9
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not Paper • 2409.17044 • Published Sep 25, 2024 • 3
NUTSHELL: A Dataset for Abstract Generation from Scientific Talks Paper • 2502.16942 • Published Feb 24, 2025 • 1
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks Paper • 2507.19634 • Published Jul 25, 2025 • 9
Cross-Attention is Half Explanation in Speech-to-Text Models Paper • 2509.18010 • Published Sep 2025 • 5
Better Late Than Never: Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation Paper • 2509.17349 • Published Sep 2025 • 2
KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025 Paper • 2505.13036 • Published May 19, 2025
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations Paper • 2411.17666 • Published Nov 26, 2024
Early-Exit and Instant Confidence Translation Quality Estimation Paper • 2502.14429 • Published Feb 20, 2025 • 4
Are Generative Models Underconfident? An Embarrassingly Simple Quality Estimation Approach Paper • 2502.11115 • Published Feb 16, 2025
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models Paper • 2411.18152 • Published Nov 27, 2024 • 1
Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation Paper • 2405.04327 • Published May 7, 2024
Audio-driven Talking Face Generation with Stabilized Synchronization Loss Paper • 2307.09368 • Published Jul 18, 2023
Contrastive Learning for Task-Independent SpeechLLM-Pretraining Paper • 2412.15712 • Published Dec 20, 2024
Quality Estimation with k-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation Paper • 2404.18031 • Published Apr 27, 2024
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 Paper • 2406.16777 • Published Jun 24, 2024
Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach Paper • 2409.09009 • Published Sep 13, 2024
COMET-poly: Machine Translation Metric Grounded in Other Candidates Paper • 2508.18549 • Published Aug 25, 2025
PIER: A Novel Metric for Evaluating What Matters in Code-Switching Paper • 2501.09512 • Published Jan 16, 2025
Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages Paper • 2408.02290 • Published Aug 5, 2024
Continuously Learning New Words in Automatic Speech Recognition Paper • 2401.04482 • Published Jan 9, 2024
Weight Factorization and Centralization for Continual Learning in Speech Recognition Paper • 2506.16574 • Published Jun 19, 2025
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion Paper • 2506.04013 • Published Jun 4, 2025
Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement Paper • 2506.16580 • Published Jun 19, 2025