Audio Processing - 2024-04

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-04-30 | Who is Authentic Speaker | Qiang Huang et.al. | 2405.00248 | translate | read | null |
| 2024-04-30 | ConFides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration | Sunwoo Ha et.al. | 2405.00223 | translate | read | null |
| 2024-04-30 | Expressivity and Speech Synthesis | Andreas Triantafyllopoulos et.al. | 2404.19363 | translate | read | null |
| 2024-04-30 | Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation | Eyal Liron Dolev et.al. | 2404.19310 | translate | read | null |
| 2024-04-30 | EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization | Jianzong Wang et.al. | 2404.19214 | translate | read | null |
| 2024-04-30 | EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning | Ziqi Liang et.al. | 2404.19212 | translate | read | null |
| 2024-04-29 | Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification | Artem Abzaliev et.al. | 2404.18739 | translate | read | null |
| 2024-04-29 | MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis | Xiang Li et.al. | 2404.18398 | translate | read | link |
| 2024-04-30 | ComposerX: Multi-Agent Symbolic Music Composition with LLMs | Qixin Deng et.al. | 2404.18081 | translate | read | link |
| 2024-04-27 | A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness | Oubaida Chouchane et.al. | 2404.17810 | translate | read | null |
| 2024-04-26 | An RFP dataset for Real, Fake, and Partially fake audio detection | Abdulazeez AlAli et.al. | 2404.17721 | translate | read | null |
| 2024-04-26 | A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification | Rémi Uro et.al. | 2404.17552 | translate | read | null |
| 2024-04-26 | Child Speech Recognition in Human-Robot Interaction: Problem Solved? | Ruben Janssens et.al. | 2404.17394 | translate | read | null |
| 2024-04-26 | Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks | Mingrui He et.al. | 2404.17280 | translate | read | null |
| 2024-04-29 | COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations | Ruben Ciranni et.al. | 2404.16969 | translate | read | null |
| 2024-04-26 | Automatic Speech Recognition System-Independent Word Error Rate Estimation | Chanho Park et.al. | 2404.16743 | translate | read | null |
| 2024-04-25 | Developing Acoustic Models for Automatic Speech Recognition in Swedish | Giampiero Salvi et.al. | 2404.16547 | translate | read | null |
| 2024-04-25 | U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF | Xingchen Song et.al. | 2404.16407 | translate | read | null |
| 2024-04-24 | Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges | Badri Narayana Patro et.al. | 2404.16112 | translate | read | link |
| 2024-04-24 | Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning | Zuheng Kang et.al. | 2404.15704 | translate | read | null |
| 2024-04-24 | HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts | Xinlei Niu et.al. | 2404.15637 | translate | read | null |
| 2024-04-23 | Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information | Chihiro Taguchi et.al. | 2404.15501 | translate | read | link |
| 2024-04-23 | Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations | Theo Lepage et.al. | 2404.14913 | translate | read | null |
| 2024-04-23 | Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance | Tsubasa Ochiai et.al. | 2404.14860 | translate | read | null |
| 2024-04-25 | FlashSpeech: Efficient Zero-Shot Speech Synthesis | Zhen Ye et.al. | 2404.14700 | translate | read | null |
| 2024-04-22 | Assessment of Sign Language-Based versus Touch-Based Input for Deaf Users Interacting with Intelligent Personal Assistants | Nina Tran et.al. | 2404.14605 | translate | read | null |
| 2024-04-22 | Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks | Alexandre Bittar et.al. | 2404.14024 | translate | read | null |
| 2024-04-23 | Retrieval-Augmented Audio Deepfake Detection | Zuheng Kang et.al. | 2404.13892 | translate | read | null |
| 2024-04-23 | Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications | Charith Chandra Sai Balne et.al. | 2404.13506 | translate | read | null |
| 2024-04-20 | Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan | Zeinali Hossein et.al. | 2404.13428 | translate | read | null |
| 2024-04-20 | Semantically Corrected Amharic Automatic Speech Recognition | Samuael Adnew et.al. | 2404.13362 | translate | read | link |
| 2024-04-20 | Music Consistency Models | Zhengcong Fei et.al. | 2404.13358 | translate | read | null |
| 2024-04-20 | Track Role Prediction of Single-Instrumental Sequences | Changheon Han et.al. | 2404.13286 | translate | read | null |
| 2024-04-19 | Learn2Talk: 3D Talking Face Learns from 2D Talking Face | Yixiang Zhuang et.al. | 2404.12888 | translate | read | null |
| 2024-04-19 | Efficient infusion of self-supervised representations in Automatic Speech Recognition | Darshan Prabhu et.al. | 2404.12628 | translate | read | null |
| 2024-04-18 | TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches | Rong Wang et.al. | 2404.12077 | translate | read | null |
| 2024-04-18 | Large Language Models: From Notes to Musical Form | Lilac Atassi et.al. | 2404.11976 | translate | read | null |
| 2024-04-17 | Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation | Ye Bai et.al. | 2404.11275 | translate | read | null |
| 2024-04-16 | Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training | Pavel Denisov et.al. | 2404.10922 | translate | read | link |
| 2024-04-16 | Long-form music generation with latent diffusion | Zach Evans et.al. | 2404.10301 | translate | read | null |
| 2024-04-16 | Anatomy of Industrial Scale Multilingual ASR | Francis McCann Ramirez et.al. | 2404.09841 | translate | read | null |
| 2024-04-15 | Resilience of Large Language Models for Noisy Instructions | Bin Wang et.al. | 2404.09754 | translate | read | null |
| 2024-04-16 | Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment | Zhiqing Hong et.al. | 2404.09313 | translate | read | null |
| 2024-04-12 | Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task | Hassan Ali et.al. | 2404.08424 | translate | read | null |
| 2024-04-12 | ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa’ikhana | Monica Romero et.al. | 2404.08368 | translate | read | null |
| 2024-04-10 | An inclusive review on deep learning techniques and their scope in handwriting recognition | Sukhdeep Singh et.al. | 2404.08011 | translate | read | null |
| 2024-04-12 | An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution | Tien-Hong Lo et.al. | 2404.07575 | translate | read | null |
| 2024-04-12 | Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping | Kevin Zhang et.al. | 2404.07341 | translate | read | null |
| 2024-04-12 | Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Xincan Feng et.al. | 2404.06714 | translate | read | link |
| 2024-04-10 | MuPT: A Generative Symbolic Music Pretrained Transformer | Xingwei Qu et.al. | 2404.06393 | translate | read | null |
| 2024-04-10 | The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge | Yiwei Guo et.al. | 2404.06079 | translate | read | null |
| 2024-04-06 | A Novel Bi-LSTM And Transformer Architecture For Generating Tabla Music | Roopa Mayya et.al. | 2404.05765 | translate | read | null |
| 2024-04-08 | VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain | Khai Le-Duc et.al. | 2404.05659 | translate | read | link |
| 2024-04-07 | Gull: A Generative Multifunctional Audio Codec | Yi Luo et.al. | 2404.04947 | translate | read | null |
| 2024-04-07 | Safeguarding Voice Privacy: Harnessing Near-Ultrasonic Interference To Protect Against Unauthorized Audio Recording | Forrest McKee et.al. | 2404.04769 | translate | read | null |
| 2024-04-06 | HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Yingting Li et.al. | 2404.04645 | translate | read | link |
| 2024-04-05 | The NES Video-Music Database: A Dataset of Symbolic Video Game Music Paired with Gameplay Videos | Igor Cardoso et.al. | 2404.04420 | translate | read | null |
| 2024-04-04 | Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition | Hainan Xu et.al. | 2404.04295 | translate | read | null |
| 2024-04-05 | Open vocabulary keyword spotting through transfer learning from speech synthesis | Kesavaraj V et.al. | 2404.03914 | translate | read | null |
| 2024-04-06 | RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | Detai Xin et.al. | 2404.03204 | translate | read | null |
| 2024-04-03 | Mai Ho’omāuna i ka ‘Ai: Language Models Improve Automatic Speech Recognition in Hawaiian | Kaavya Chaparala et.al. | 2404.03073 | translate | read | null |
| 2024-04-03 | PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders | Yu Pan et.al. | 2404.02702 | translate | read | null |
| 2024-04-03 | Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation | Yejin Jeon et.al. | 2404.02592 | translate | read | null |
| 2024-04-03 | CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models | Zaid Sheikh et.al. | 2404.02408 | translate | read | link |
| 2024-04-02 | BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition | Alexandros Haliassos et.al. | 2404.02098 | translate | read | link |
| 2024-04-02 | Noise Masking Attacks and Defenses for Pretrained Speech Models | Matthew Jagielski et.al. | 2404.02052 | translate | read | null |
| 2024-04-02 | Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal | Elodie Gauthier et.al. | 2404.01991 | translate | read | link |
| 2024-04-05 | Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials | Ali Akram et.al. | 2404.01981 | translate | read | null |
| 2024-04-02 | Transfer Learning from Whisper for Microscopic Intelligibility Prediction | Paul Best et.al. | 2404.01737 | translate | read | null |
| 2024-04-01 | KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Adal Abilbekov et.al. | 2404.01033 | translate | read | null |
| 2024-04-01 | Voice Conversion Augmentation for Speaker Recognition on Defective Datasets | Ruijie Tao et.al. | 2404.00863 | translate | read | null |
| 2024-04-01 | Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling | Injune Hwang et.al. | 2404.00856 | translate | read | null |

(<a href="../Audio_Processing.md">back to Audio Processing</a>)