Audio Processing - 2024-08

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-08-30 | Advancing Multi-talker ASR Performance with Large Language Models | Mohan Shi et al. | 2408.17431 | translate | read | null |
| 2024-08-30 | AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | Kirill Borodin et al. | 2408.17352 | translate | read | null |
| 2024-08-30 | Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Zhen Ye et al. | 2408.17175 | translate | read | link |
| 2024-08-30 | Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings | Shota Horiguchi et al. | 2408.17142 | translate | read | null |
| 2024-08-30 | Generative Modeling Perspective for Control and Reasoning in Robotics | Takuma Yoneda et al. | 2408.17041 | translate | read | null |
| 2024-08-30 | Utilizing Speaker Profiles for Impersonation Audio Detection | Hao Gu et al. | 2408.17009 | translate | read | null |
| 2024-08-30 | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Zhifei Xie et al. | 2408.16725 | translate | read | link |
| 2024-08-29 | CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions | Laurin Wagner et al. | 2408.16589 | translate | read | link |
| 2024-08-29 | Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing | Qianhui Liu et al. | 2408.16564 | translate | read | null |
| 2024-08-29 | RAVE for Speech: Efficient Voice Conversion at High Sampling Rates | Anders R. Bargum et al. | 2408.16546 | translate | read | null |
| 2024-08-29 | Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis | Zehai Tu et al. | 2408.16373 | translate | read | null |
| 2024-08-29 | Measuring the Accuracy of Automatic Speech Recognition Solutions | Korbinian Kuhn et al. | 2408.16287 | translate | read | link |
| 2024-08-29 | Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation | Lun Wang et al. | 2408.16204 | translate | read | null |
| 2024-08-29 | Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction | Yuka Ko et al. | 2408.16180 | translate | read | null |
| 2024-08-28 | Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group’s Approach for ASVspoof5 Challenge | Oğuzhan Kurnaz et al. | 2408.15877 | translate | read | null |
| 2024-08-28 | VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling | Yixuan Zhou et al. | 2408.15676 | translate | read | link |
| 2024-08-28 | Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications | Korbinian Kuhn et al. | 2408.15616 | translate | read | link |
| 2024-08-28 | Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models | Yiyang Zhao et al. | 2408.15585 | translate | read | null |
| 2024-08-28 | EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models | Wenhan Yao et al. | 2408.15508 | translate | read | null |
| 2024-08-27 | Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement | Longshen Ou et al. | 2408.15176 | translate | read | null |
| 2024-08-27 | Speech Recognition Transformers: Topological-lingualism Perspective | Shruti Singh et al. | 2408.14991 | translate | read | null |
| 2024-08-27 | Literary and Colloquial Dialect Identification for Tamil using Acoustic Features | M. Nanmalar et al. | 2408.14887 | translate | read | null |
| 2024-08-27 | The VoxCeleb Speaker Recognition Challenge: A Retrospective | Jaesung Huh et al. | 2408.14886 | translate | read | null |
| 2024-08-27 | MaskCycleGAN-based Whisper to Normal Speech Conversion | K. Rohith Gupta et al. | 2408.14797 | translate | read | null |
| 2024-08-26 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues | Kuluhan Binici et al. | 2408.14418 | translate | read | null |
| 2024-08-26 | Self-supervised Speech Representations Still Struggle with African American Vernacular English | Kalvin Chang et al. | 2408.14262 | translate | read | link |
| 2024-08-26 | Automatic recognition and detection of aphasic natural speech | Mara Barberis et al. | 2408.14082 | translate | read | null |
| 2024-08-26 | Research Advances and New Paradigms for Biology-inspired Spiking Neural Networks | Tianyu Zheng et al. | 2408.13996 | translate | read | null |
| 2024-08-26 | Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard | Wonjune Kang et al. | 2408.13970 | translate | read | null |
| 2024-08-25 | Literary and Colloquial Tamil Dialect Identification | M. Nanmalar et al. | 2408.13739 | translate | read | null |
| 2024-08-24 | Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification | Aditya Dawn et al. | 2408.13644 | translate | read | null |
| 2024-08-24 | As Biased as You Measure: Methodological Pitfalls of Bias Evaluations in Speaker Verification Research | Wiebke Hutiri et al. | 2408.13614 | translate | read | null |
| 2024-08-24 | SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description | Zeyu Jin et al. | 2408.13608 | translate | read | link |
| 2024-08-23 | Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples | Zhenyu Wang et al. | 2408.13341 | translate | read | null |
| 2024-08-23 | Which Prosodic Features Matter Most for Pragmatics? | Nigel G. Ward et al. | 2408.13240 | translate | read | null |
| 2024-08-23 | NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks | He Huang et al. | 2408.13106 | translate | read | null |
| 2024-08-23 | Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models | Adnan Haider et al. | 2408.13008 | translate | read | null |
| 2024-08-22 | Towards measuring fairness in speech recognition: Fair-Speech dataset | Irina-Elena Veliche et al. | 2408.12734 | translate | read | null |
| 2024-08-22 | WhisperMask: A Noise Suppressive Mask-Type Microphone for Whisper Speech | Hirotaka Hiraki et al. | 2408.12500 | translate | read | null |
| 2024-08-22 | Positional Description for Numerical Normalization | Deepanshu Gupta et al. | 2408.12430 | translate | read | null |
| 2024-08-22 | LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation | Shihao Chen et al. | 2408.12354 | translate | read | null |
| 2024-08-22 | Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features | Shaoxiang Dang et al. | 2408.12279 | translate | read | null |
| 2024-08-21 | The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al | Nicolad Garneau et al. | 2408.11940 | translate | read | null |
| 2024-08-21 | Approaching Deep Learning through the Spectral Dynamics of Weights | David Yunis et al. | 2408.11804 | translate | read | link |
| 2024-08-22 | A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification | Xujiang Xing et al. | 2408.11562 | translate | read | null |
| 2024-08-21 | Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech | Anastasia Avdeeva et al. | 2408.11528 | translate | read | null |
| 2024-08-21 | Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers | Prashant Serai et al. | 2408.11258 | translate | read | null |
| 2024-08-20 | BUT Systems and Analyses for the ASVspoof 5 Challenge | Johan Rohdin et al. | 2408.11152 | translate | read | null |
| 2024-08-20 | AI-Based IVR | Gassyrbek Kosherbay et al. | 2408.10549 | translate | read | null |
| 2024-08-20 | XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition | Xucheng Wan et al. | 2408.10524 | translate | read | null |
| 2024-08-19 | ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection and SASV Systems for the ASVspoof5 Challenge | Juan M. Martín-Doñas et al. | 2408.10361 | translate | read | null |
| 2024-08-19 | Hear Your Face: Face-based voice conversion with F0 estimation | Jaejun Lee et al. | 2408.09802 | translate | read | null |
| 2024-08-19 | Unsupervised Composable Representations for Audio | Giovanni Bindi et al. | 2408.09792 | translate | read | null |
| 2024-08-19 | Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts | Jiaqing Liu et al. | 2408.09688 | translate | read | null |
| 2024-08-18 | A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition | Yangze Li et al. | 2408.09491 | translate | read | null |
| 2024-08-17 | Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model | Massimiliano Todisco et al. | 2408.09300 | translate | read | null |
| 2024-08-17 | Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Samuele Cornell et al. | 2408.09215 | translate | read | null |
| 2024-08-14 | Supervised and Unsupervised Alignments for Spoofing Behavioral Biometrics | Thomas Thebaud et al. | 2408.08918 | translate | read | null |
| 2024-08-16 | ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale | Xin Wang et al. | 2408.08739 | translate | read | null |
| 2024-08-15 | Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words | Kento Nozawa et al. | 2408.08027 | translate | read | null |
| 2024-08-14 | SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition | Mohamed Osman et al. | 2408.07851 | translate | read | link |
| 2024-08-14 | WavLM model ensemble for audio deepfake detection | David Combei et al. | 2408.07414 | translate | read | null |
| 2024-08-14 | DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement | Tao Sun et al. | 2408.07388 | translate | read | null |
| 2024-08-13 | Play Me Something Icy: Practical Challenges, Explainability and the Semantic Gap in Generative AI Music | Jesse Allison et al. | 2408.07224 | translate | read | null |
| 2024-08-13 | VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders | Yubing Cao et al. | 2408.06906 | translate | read | null |
| 2024-08-13 | SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis | Osamu Take et al. | 2408.06858 | translate | read | link |
| 2024-08-13 | PRESENT: Zero-Shot Text-to-Prosody Control | Perry Lam et al. | 2408.06827 | translate | read | link |
| 2024-08-13 | Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation | Matthias Bartolo et al. | 2408.06804 | translate | read | link |
| 2024-08-12 | Cross-Lingual Conversational Speech Summarization with Large Language Models | Max Nelson et al. | 2408.06484 | translate | read | null |
| 2024-08-12 | Audio Enhancement for Computer Audition – An Iterative Training Paradigm Using Sample Importance | Manuel Milling et al. | 2408.06264 | translate | read | null |
| 2024-08-12 | Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning | Wonjun Lee et al. | 2408.06043 | translate | read | null |
| 2024-08-12 | Controlling Surprisal in Music Generation via Information Content Curve Matching | Mathias Rose Bjare et al. | 2408.06022 | translate | read | link |
| 2024-08-11 | LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition | Eunseop Yoon et al. | 2408.05769 | translate | read | null |
| 2024-08-11 | VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing | Chunyu Qiang et al. | 2408.05758 | translate | read | null |
| 2024-08-10 | Improving Whisper’s Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text | Jinpeng Li et al. | 2408.05554 | translate | read | null |
| 2024-08-09 | MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Junhao Xu et al. | 2408.05101 | translate | read | null |
| 2024-08-09 | TEAdapter: Supply abundant guidance for controllable text-to-music generation | Jialing Zou et al. | 2408.04865 | translate | read | null |
| 2024-08-08 | MulliVC: Multi-lingual Voice Conversion With Cycle Consistency | Jiawei Huang et al. | 2408.04708 | translate | read | null |
| 2024-08-08 | NeuralMultiling: A Novel Neural Architecture Search for Smartphone based Multilingual Speaker Verification | Aravinda Reddy PN et al. | 2408.04362 | translate | read | null |
| 2024-08-08 | HydraFormer: One Encoder For All Subsampling Rates | Yaoxun Xu et al. | 2408.04325 | translate | read | link |
| 2024-08-08 | Preserving spoken content in voice anonymisation with character-level vocoder conditioning | Michele Panariello et al. | 2408.04306 | translate | read | null |
| 2024-08-08 | wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech | Khai Le-Duc et al. | 2408.04174 | translate | read | null |
| 2024-08-07 | Speaker Adaptation for Quantised End-to-End ASR Models | Qiuming Zhao et al. | 2408.03979 | translate | read | null |
| 2024-08-06 | Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training | Hawraz A. Ahmad et al. | 2408.03887 | translate | read | null |
| 2024-08-07 | Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation | Karn N. Watcharasupat et al. | 2408.03588 | translate | read | null |
| 2024-08-06 | ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval | Ruixiang Zhao et al. | 2408.02978 | translate | read | null |
| 2024-08-06 | Self-Supervised Learning for Multi-Channel Neural Transducer | Atsushi Kojima et al. | 2408.02945 | translate | read | null |
| 2024-08-05 | Automatic Voice Identification after Speech Resynthesis using PPG | Thibault Gaudier et al. | 2408.02712 | translate | read | null |
| 2024-08-05 | Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition | Jaeyoung Kim et al. | 2408.02582 | translate | read | null |
| 2024-08-05 | The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024 | He Wang et al. | 2408.02369 | translate | read | null |
| 2024-08-05 | StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion | Zhichao Wang et al. | 2408.02178 | translate | read | null |
| 2024-08-04 | Why Perturbing Symbolic Music is Necessary: Fitting the Distribution of Never-used Notes through a Joint Probabilistic Diffusion Model | Shipei Liu et al. | 2408.01950 | translate | read | null |
| 2024-08-03 | ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Peng Cheng et al. | 2408.01808 | translate | read | null |
| 2024-08-03 | Generating High-quality Symbolic Music Using Fine-grained Discriminators | Zhedong Zhang et al. | 2408.01696 | translate | read | null |
| 2024-08-02 | EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody | Coen Schoof et al. | 2408.01178 | translate | read | null |
| 2024-08-01 | Expressive MIDI-format Piano Performance Generation | Jingwei Liu et al. | 2408.00900 | translate | read | null |
| 2024-08-01 | SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Yichen Lu et al. | 2408.00624 | translate | read | null |
| 2024-08-01 | Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation | Xinhan Di et al. | 2408.00284 | translate | read | null |
| 2024-08-01 | Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation | Kohei Matsuura et al. | 2408.00205 | translate | read | null |
| 2024-08-01 | Generative Expressive Conversational Speech Synthesis | Rui Liu et al. | 2407.21491 | translate | read | null |

(<a href=../Audio_Processing.md>back to Audio Processing</a>)