Audio Processing - 2026-03

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-03-31 | FLEURS-Kobani: Extending the FLEURS Dataset for Northern Kurdish | Daban Q. Jaff et.al. | 2603.29892 | translate | read | null |
| 2026-03-31 | LLM Probe: Evaluating LLMs for Low-Resource Languages | Hailay Kidu Teklehaymanot et.al. | 2603.29517 | translate | read | null |
| 2026-03-31 | Spoken Digit Recognition and Speaker Classification by Nonlinear Interfered Spin Wave-Based Physical Reservoir Computing | Sota Hikasa et.al. | 2603.29311 | translate | read | null |
| 2026-03-31 | Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition | Lukuang Dong et.al. | 2603.29217 | translate | read | null |
| 2026-03-31 | From Natural Alignment to Conditional Controllability in Multimodal Dialogue | Zeyu Jin et.al. | 2603.29162 | translate | read | null |
| 2026-03-30 | EBuddy: a workflow orchestrator for industrial human-machine collaboration | Michele Banfi et.al. | 2603.28579 | translate | read | null |
| 2026-03-30 | Voice-Controlled Scratch for Children with (Motor) Disabilities | Elias Goller et.al. | 2603.28246 | translate | read | null |
| 2026-03-30 | Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models | Luigi Curini et.al. | 2603.28103 | translate | read | null |
| 2026-03-30 | On the Role of Encoder Depth: Pruning Whisper and LoRA Fine-Tuning in SLAM-ASR | Ganesh Pavan Kartikeya Bharadwaj Kolluri et.al. | 2603.27981 | translate | read | null |
| 2026-03-25 | POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan | Marta Moscati et.al. | 2603.24569 | translate | read | null |
| 2026-03-25 | A Sociolinguistic Analysis of Automatic Speech Recognition Bias in Newcastle English | Dana Serditova et.al. | 2603.24549 | translate | read | null |
| 2026-03-25 | What and When to Learn: CURriculum Ranking Loss for Large-Scale Speaker Verification | Massa Baali et.al. | 2603.24432 | translate | read | null |
| 2026-03-25 | When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools | Xingming Li et.al. | 2603.24389 | translate | read | null |
| 2026-03-25 | Bridging Biological Hearing and Neuromorphic Computing: End-to-End Time-Domain Audio Signal Processing with Reservoir Computing | Rinku Sebastian et.al. | 2603.24283 | translate | read | null |
| 2026-03-25 | How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools for Romanian | Teodora Răgman et.al. | 2603.24116 | translate | read | null |
| 2026-03-25 | From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs | Xiaoyong Guo et.al. | 2603.24034 | translate | read | null |
| 2026-03-24 | Echoes: A semantically-aligned music deepfake detection dataset | Octavian Pascu et.al. | 2603.23667 | translate | read | null |
| 2026-03-24 | Ethio-ASR: Joint Multilingual Speech Recognition and Language Identification for Ethiopian Languages | Badr M. Abdullah et.al. | 2603.23654 | translate | read | null |
| 2026-03-24 | Evaluating a Multi-Agent Voice-Enabled Smart Speaker for Care Homes: A Safety-Focused Framework | Zeinab Dehghani et.al. | 2603.23625 | translate | read | null |
| 2026-03-24 | MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates | Zikang Huang et.al. | 2603.23048 | translate | read | null |
| 2026-03-24 | When AVSR Meets Video Conferencing: Dataset, Degradation, and the Hidden Mechanism Behind Performance Collapse | Yihuan Huang et.al. | 2603.22915 | translate | read | null |
| 2026-03-24 | Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics | Naohiro Tawara et.al. | 2603.22709 | translate | read | null |
| 2026-03-24 | MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation | Di Zhu et.al. | 2603.22677 | translate | read | null |
| 2026-03-23 | Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks | Matías Pizarro et.al. | 2603.22590 | translate | read | null |
| 2026-03-23 | SelfTTS: cross-speaker style transfer through explicit embedding disentanglement and self-refinement using self-augmentation | Lucas H. Ueda et.al. | 2603.22252 | translate | read | null |
| 2026-03-23 | SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding | Haroun Elleuch et.al. | 2603.21940 | translate | read | null |
| 2026-03-23 | Ara-Best-RQ: Multi Dialectal Arabic SSL | Haroun Elleuch et.al. | 2603.21900 | translate | read | null |
| 2026-03-23 | Cascade-Free Mandarin Visual Speech Recognition via Semantic-Guided Cross-Representation Alignment | Lei Yang et.al. | 2603.21808 | translate | read | null |
| 2026-03-23 | RESPOND: Responsive Engagement Strategy for Predictive Orchestration and Dialogue | Meng-Chen Lee et.al. | 2603.21682 | translate | read | null |
| 2026-03-22 | HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit | Khushiyant et.al. | 2603.21316 | translate | read | null |
| 2026-03-22 | Fusing Memory and Attention: A study on LSTM, Transformer and Hybrid Architectures for Symbolic Music Generation | Soudeep Ghoshal et.al. | 2603.21282 | translate | read | null |
| 2026-03-22 | SqueezeComposer: Temporal Speed-up is A Simple Trick for Long-form Music Composing | Jianyi Chen et.al. | 2603.21073 | translate | read | null |
| 2026-03-20 | Audio Avatar Fingerprinting: An Approach for Authorized Use of Voice Cloning in the Era of Synthetic Audio | Candice R. Gerstner et.al. | 2603.20165 | translate | read | null |
| 2026-03-20 | Demonstration of Adapt4Me: An Uncertainty-Aware Authoring Environment for Personalizing Automatic Speech Recognition to Non-normative Speech | Niclas Pokel et.al. | 2603.20112 | translate | read | null |
| 2026-03-20 | LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families | Jianan Chen et.al. | 2603.20042 | translate | read | null |
| 2026-03-20 | Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech? | Lokesh Kumar et.al. | 2603.19831 | translate | read | null |
| 2026-03-20 | Borderless Long Speech Synthesis | Xingchen Song et.al. | 2603.19798 | translate | read | null |
| 2026-03-19 | Enhancing Multi-Corpus Training in SSL-Based Anti-Spoofing Models: Domain-Invariant Feature Extraction | Anh-Tuan Dao et.al. | 2603.18657 | translate | read | null |
| 2026-03-18 | Impact of automatic speech recognition quality on Alzheimer’s disease detection from spontaneous speech: a reproducible benchmark study with lexical modeling and statistical validation | Himadri Samanta et.al. | 2603.18239 | translate | read | null |
| 2026-03-18 | Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition | Yuxiang Mei et.al. | 2603.17558 | translate | read | null |
| 2026-03-17 | Over-the-air White-box Attack on the Wav2Vec Speech Recognition Neural Network | Protopopov Alexey et.al. | 2603.16972 | translate | read | null |
| 2026-03-17 | On the Emotion Understanding of Synthesized Speech | Yuan Ge et.al. | 2603.16483 | translate | read | null |
| 2026-03-17 | RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery | Abhishek Kumar et.al. | 2603.16411 | translate | read | null |
| 2026-03-17 | Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus | Martina Simonotti et.al. | 2603.16258 | translate | read | null |
| 2026-03-17 | Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR | Quy-Anh Dang et.al. | 2603.16184 | translate | read | null |
| 2026-03-16 | Lost in Transcription: Subtitle Errors in Automatic Speech Recognition Reduce Speaker and Content Evaluations | Kowe Kadoma et.al. | 2603.15807 | translate | read | null |
| 2026-03-16 | SEA-Vision: A Multilingual Benchmark for Comprehensive Document and Scene Text Understanding in Southeast Asia | Pengfei Yue et.al. | 2603.15409 | translate | read | null |
| 2026-03-16 | Tagarela - A Portuguese speech dataset from podcasts | Frederico Santos de Oliveira et.al. | 2603.15326 | translate | read | null |
| 2026-03-16 | Two-Stage Adaptation for Non-Normative Speech Recognition: Revisiting Speaker-Independent Initialization for Personalization | Shan Jiang et.al. | 2603.15261 | translate | read | null |
| 2026-03-16 | LLMs and Speech: Integration vs. Combination | Robin Schmitt et.al. | 2603.15045 | translate | read | null |
| 2026-03-16 | PhonemeDF: A Synthetic Speech Dataset for Audio Deepfake Detection and Naturalness Evaluation | Vamshi Nallaguntla et.al. | 2603.15037 | translate | read | null |
| 2026-03-16 | Vietnamese Automatic Speech Recognition: A Revisit | Thi Vu et.al. | 2603.14779 | translate | read | null |
| 2026-03-16 | Investigating the Impact of Speech Enhancement on Audio Deepfake Detection in Noisy Environments | Anacin et.al. | 2603.14767 | translate | read | null |
| 2026-03-15 | Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations | Deok-Hyeon Cho et.al. | 2603.14432 | translate | read | null |
| 2026-03-15 | CodecMOS-Accent: A MOS Benchmark of Resynthesized and TTS Speech from Neural Codecs Across English Accents | Wen-Chin Huang et.al. | 2603.14328 | translate | read | null |
| 2026-03-12 | Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition | Umberto Cappellazzo et.al. | 2603.12046 | translate | read | null |
| 2026-03-12 | ReDimNet2: Scaling Speaker Verification via Time-Pooled Dimension Reshaping | Ivan Yakovlev et.al. | 2603.11841 | translate | read | null |
| 2026-03-12 | Causal Prosody Mediation for Text-to-Speech:Counterfactual Training of Duration, Pitch, and Energy in FastSpeech2 | Suvendu Sekhar Mohanty et.al. | 2603.11683 | translate | read | null |
| 2026-03-12 | RAF: Relativistic Adversarial Feedback For Universal Speech Synthesis | Yongjoon Lee et.al. | 2603.11678 | translate | read | null |
| 2026-03-11 | Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data | Hillary Mutisya et.al. | 2603.11378 | translate | read | null |
| 2026-03-11 | Duration Aware Scheduling for ASR Serving Under Workload Drift | Darshan Makwana et.al. | 2603.11273 | translate | read | null |
| 2026-03-11 | Huntington Disease Automatic Speech Recognition with Biomarker Supervision | Charles L. Wang et.al. | 2603.11168 | translate | read | null |
| 2026-03-11 | Uni-ASR: Unified LLM-Based Architecture for Non-Streaming and Streaming Automatic Speech Recognition | Yinfeng Xia et.al. | 2603.11123 | translate | read | null |
| 2026-03-11 | V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation | Yan-Bo Lin et.al. | 2603.11042 | translate | read | null |
| 2026-03-11 | Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation | Thomas Thebaud et.al. | 2603.10827 | translate | read | null |
| 2026-03-11 | Probabilistic Verification of Voice Anti-Spoofing Models | Evgeny Kushnir et.al. | 2603.10713 | translate | read | null |
| 2026-03-11 | AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow | Duojia Li et.al. | 2603.10701 | translate | read | null |
| 2026-03-11 | Distilling LLM Semantic Priors into Encoder-Only Multi-Talker ASR with Talker-Count Routing | Hao Shi et.al. | 2603.10587 | translate | read | null |
| 2026-03-11 | FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System | Kaituo Xu et.al. | 2603.10420 | translate | read | null |
| 2026-03-11 | NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction | Jun Rekimoto et.al. | 2603.10324 | translate | read | null |
| 2026-03-10 | SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases | Laya Iyer et.al. | 2603.09853 | translate | read | null |
| 2026-03-10 | A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition | Dimme de Groot et.al. | 2603.09725 | translate | read | null |
| 2026-03-10 | Emotion-Aware Prefix: Towards Explicit Emotion Control in Voice Conversion Models | Haoyuan Yang et.al. | 2603.09120 | translate | read | null |
| 2026-03-10 | Trade-offs Between Capacity and Robustness in Neural Audio Codecs for Adversarially Robust Speech Recognition | Jordan Prescott et.al. | 2603.09034 | translate | read | null |
| 2026-03-09 | Universal Speech Content Factorization | Henry Li Xinyuan et.al. | 2603.08977 | translate | read | null |
| 2026-03-09 | NLE: Non-autoregressive LLM-based ASR by Transcript Editing | Avihu Dekel et.al. | 2603.08397 | translate | read | null |
| 2026-03-09 | Bootstrapping Audiovisual Speech Recognition in Zero-AV-Resource Scenarios with Synthetic Visual Data | Pol Buitrago et.al. | 2603.08249 | translate | read | null |
| 2026-03-09 | Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks | Pol Buitrago et.al. | 2603.08231 | translate | read | null |
| 2026-03-09 | Ramsa: A Large Sociolinguistically Rich Emirati Arabic Speech Corpus for ASR and TTS | Rania Al-Sabbagh et.al. | 2603.08125 | translate | read | null |
| 2026-03-09 | Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge | Ze Li et.al. | 2603.08092 | translate | read | null |
| 2026-03-09 | Designing a Generative AI-Assisted Music Psychotherapy Tool for Deaf and Hard-of-Hearing Individuals | Youjin Choi et.al. | 2603.07963 | translate | read | null |
| 2026-03-08 | Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR | Rishikesh Kumar Sharma et.al. | 2603.07554 | translate | read | null |
| 2026-03-08 | Bolbosh: Script-Aware Flow Matching for Kashmiri Text-to-Speech | Tajamul Ashraf et.al. | 2603.07513 | translate | read | null |
| 2026-03-07 | Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning | Wenjie Tian et.al. | 2603.07263 | translate | read | null |
| 2026-03-07 | The Talking Robot: Distortion-Robust Acoustic Models for Robot-Robot Communication | Hanlong Li et.al. | 2603.07072 | translate | read | null |
| 2026-03-06 | Speak in Context: Multilingual ASR with Speech Context Alignment via Contrastive Learning | Yuchen Zhang et.al. | 2603.06505 | translate | read | null |
| 2026-03-06 | Continual Adaptation for Pacific Indigenous Speech Recognition | Yang Xiao et.al. | 2603.06310 | translate | read | null |
| 2026-03-06 | Whisper-CD: Accurate Long-Form Speech Recognition using Multi-Negative Contrastive Decoding | Hoseong Ahn et.al. | 2603.06193 | translate | read | null |
| 2026-03-06 | Is it Me? Toward Self-Extension to AI Avatars in Virtual Reality | Jieying Zhang et.al. | 2603.06030 | translate | read | null |
| 2026-03-06 | How Well Do Current Speech Deepfake Detection Methods Generalize to the Real World? | Daixian Li et.al. | 2603.05852 | translate | read | null |
| 2026-03-06 | Which Data Matter? Embedding-Based Data Selection for Speech Recognition | Zakaria Aldeneh et.al. | 2603.05819 | translate | read | null |
| 2026-03-06 | Activation Steering for Accent Adaptation in Speech Foundation Models | Jinuo Sun et.al. | 2603.05813 | translate | read | null |
| 2026-03-05 | Koopman Regularized Deep Speech Disentanglement for Speaker Verification | Nikos Chazaridis et.al. | 2603.05577 | translate | read | null |
| 2026-03-05 | Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection | Junchuan Zhao et.al. | 2603.05373 | translate | read | null |
| 2026-03-05 | PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration | Mohammad Javad Ranjbar Kalahroodi et.al. | 2603.05314 | translate | read | null |
| 2026-03-05 | Visual-Informed Speech Enhancement Using Attention-Based Beamforming | Chihyun Liu et.al. | 2603.05270 | translate | read | null |
| 2026-03-05 | Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography | Ting-Hui Cheng et.al. | 2603.05267 | translate | read | null |
| 2026-03-05 | Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards | Linghan Fang et.al. | 2603.05231 | translate | read | null |
| 2026-03-05 | Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition | Mengze Hong et.al. | 2603.04945 | translate | read | null |
| 2026-03-05 | Spectral dynamics reservoir computing for high-speed hardware-efficient neuromorphic processing | Jiaxuan Chen et.al. | 2603.04901 | translate | read | null |
| 2026-03-05 | WhisperAlign: Word-Boundary-Aware ASR and WhisperX-Anchored Pyannote Diarization for Long-Form Bengali Speech | Aurchi Chowdhury et.al. | 2603.04809 | translate | read | null |
| 2026-03-05 | When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper | Akif Islam et.al. | 2603.04710 | translate | read | null |
| 2026-03-04 | ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis | Youngwon Choi et.al. | 2603.04219 | translate | read | null |
| 2026-03-04 | Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement | Fei Su et.al. | 2603.03811 | translate | read | null |
| 2026-03-03 | An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization | Epshita Jahan et.al. | 2603.03158 | translate | read | null |
| 2026-03-03 | Speech recognition assisted by large language models to command software orally – Application to an augmented and virtual reality web app for immersive molecular graphics | Fabio Cortes Rodriguez et.al. | 2603.02901 | translate | read | null |
| 2026-03-03 | SilentWear: an Ultra-Low Power Wearable System for EMG-based Silent Speech Recognition | Giusy Spacone et.al. | 2603.02847 | translate | read | null |
| 2026-03-03 | Benchmarking Speech Systems for Frontline Health Conversations: The DISPLACE-M Challenge | Dhanya E et.al. | 2603.02813 | translate | read | null |
| 2026-03-02 | ViTex: Visual Texture Control for Multi-Track Symbolic Music Generation via Discrete Diffusion Models | Xiaoyu Yi et.al. | 2603.01984 | translate | read | null |
| 2026-03-02 | VietSuperSpeech: A Large-Scale Vietnamese Conversational Speech Dataset for ASR Fine-Tuning in Chatbot, Customer Support, and Call Center Applications | Loan Do et.al. | 2603.01894 | translate | read | null |
| 2026-03-02 | More Data, Fewer Diacritics: Scaling Arabic TTS | Ahmed Musleh et.al. | 2603.01622 | translate | read | null |
| 2026-03-02 | The USTC-NERCSLIP Systems for the CHiME-9 MCoRec Challenge | Ya Jiang et.al. | 2603.01415 | translate | read | null |
| 2026-03-02 | End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation | Minghui Wu et.al. | 2603.01382 | translate | read | null |
| 2026-03-02 | DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement | Minghui Wu et.al. | 2603.01369 | translate | read | null |
| 2026-03-01 | VoxKnesset: A Large-Scale Longitudinal Hebrew Speech Dataset for Aging Speaker Modeling | Yanir Marmor et.al. | 2603.01270 | translate | read | null |
| 2026-03-01 | SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation | Hongrui Wang et.al. | 2603.01101 | translate | read | null |
| 2026-03-01 | Using Songs to Improve Kazakh Automatic Speech Recognition | Rustem Yeshpanov et.al. | 2603.00961 | translate | read | null |
| 2026-03-01 | Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages | Kaushal Santosh Bhogale et.al. | 2603.00941 | translate | read | null |

(<a href=../Audio_Processing.md>back to Audio Processing</a>)