Audio Processing - 2024-04 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF (arXiv ID)	Translate	Read	Code
2024-04-30	Who is Authentic Speaker	Qiang Huang et.al.	2405.00248	translate	read	null
2024-04-30	ConFides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration	Sunwoo Ha et.al.	2405.00223	translate	read	null
2024-04-30	Expressivity and Speech Synthesis	Andreas Triantafyllopoulos et.al.	2404.19363	translate	read	null
2024-04-30	Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation	Eyal Liron Dolev et.al.	2404.19310	translate	read	null
2024-04-30	EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization	Jianzong Wang et.al.	2404.19214	translate	read	null
2024-04-30	EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning	Ziqi Liang et.al.	2404.19212	translate	read	null
2024-04-29	Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification	Artem Abzaliev et.al.	2404.18739	translate	read	null
2024-04-29	MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis	Xiang Li et.al.	2404.18398	translate	read	link
2024-04-30	ComposerX: Multi-Agent Symbolic Music Composition with LLMs	Qixin Deng et.al.	2404.18081	translate	read	link
2024-04-27	A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness	Oubaida Chouchane et.al.	2404.17810	translate	read	null
2024-04-26	An RFP dataset for Real, Fake, and Partially fake audio detection	Abdulazeez AlAli et.al.	2404.17721	translate	read	null
2024-04-26	A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification	Rémi Uro et.al.	2404.17552	translate	read	null
2024-04-26	Child Speech Recognition in Human-Robot Interaction: Problem Solved?	Ruben Janssens et.al.	2404.17394	translate	read	null
2024-04-26	Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks	Mingrui He et.al.	2404.17280	translate	read	null
2024-04-29	COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations	Ruben Ciranni et.al.	2404.16969	translate	read	null
2024-04-26	Automatic Speech Recognition System-Independent Word Error Rate Estimation	Chanho Park et.al.	2404.16743	translate	read	null
2024-04-25	Developing Acoustic Models for Automatic Speech Recognition in Swedish	Giampiero Salvi et.al.	2404.16547	translate	read	null
2024-04-25	U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF	Xingchen Song et.al.	2404.16407	translate	read	null
2024-04-24	Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges	Badri Narayana Patro et.al.	2404.16112	translate	read	link
2024-04-24	Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning	Zuheng Kang et.al.	2404.15704	translate	read	null
2024-04-24	HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts	Xinlei Niu et.al.	2404.15637	translate	read	null
2024-04-23	Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information	Chihiro Taguchi et.al.	2404.15501	translate	read	link
2024-04-23	Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations	Theo Lepage et.al.	2404.14913	translate	read	null
2024-04-23	Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance	Tsubasa Ochiai et.al.	2404.14860	translate	read	null
2024-04-25	FlashSpeech: Efficient Zero-Shot Speech Synthesis	Zhen Ye et.al.	2404.14700	translate	read	null
2024-04-22	Assessment of Sign Language-Based versus Touch-Based Input for Deaf Users Interacting with Intelligent Personal Assistants	Nina Tran et.al.	2404.14605	translate	read	null
2024-04-22	Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks	Alexandre Bittar et.al.	2404.14024	translate	read	null
2024-04-23	Retrieval-Augmented Audio Deepfake Detection	Zuheng Kang et.al.	2404.13892	translate	read	null
2024-04-23	Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications	Charith Chandra Sai Balne et.al.	2404.13506	translate	read	null
2024-04-20	Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan	Zeinali Hossein et.al.	2404.13428	translate	read	null
2024-04-20	Semantically Corrected Amharic Automatic Speech Recognition	Samuael Adnew et.al.	2404.13362	translate	read	link
2024-04-20	Music Consistency Models	Zhengcong Fei et.al.	2404.13358	translate	read	null
2024-04-20	Track Role Prediction of Single-Instrumental Sequences	Changheon Han et.al.	2404.13286	translate	read	null
2024-04-19	Learn2Talk: 3D Talking Face Learns from 2D Talking Face	Yixiang Zhuang et.al.	2404.12888	translate	read	null
2024-04-19	Efficient infusion of self-supervised representations in Automatic Speech Recognition	Darshan Prabhu et.al.	2404.12628	translate	read	null
2024-04-18	TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches	Rong Wang et.al.	2404.12077	translate	read	null
2024-04-18	Large Language Models: From Notes to Musical Form	Lilac Atassi et.al.	2404.11976	translate	read	null
2024-04-17	Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation	Ye Bai et.al.	2404.11275	translate	read	null
2024-04-16	Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training	Pavel Denisov et.al.	2404.10922	translate	read	link
2024-04-16	Long-form music generation with latent diffusion	Zach Evans et.al.	2404.10301	translate	read	null
2024-04-16	Anatomy of Industrial Scale Multilingual ASR	Francis McCann Ramirez et.al.	2404.09841	translate	read	null
2024-04-15	Resilience of Large Language Models for Noisy Instructions	Bin Wang et.al.	2404.09754	translate	read	null
2024-04-16	Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment	Zhiqing Hong et.al.	2404.09313	translate	read	null
2024-04-12	Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task	Hassan Ali et.al.	2404.08424	translate	read	null
2024-04-12	ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa’ikhana	Monica Romero et.al.	2404.08368	translate	read	null
2024-04-10	An inclusive review on deep learning techniques and their scope in handwriting recognition	Sukhdeep Singh et.al.	2404.08011	translate	read	null
2024-04-12	An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution	Tien-Hong Lo et.al.	2404.07575	translate	read	null
2024-04-12	Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping	Kevin Zhang et.al.	2404.07341	translate	read	null
2024-04-12	Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness	Xincan Feng et.al.	2404.06714	translate	read	link
2024-04-10	MuPT: A Generative Symbolic Music Pretrained Transformer	Xingwei Qu et.al.	2404.06393	translate	read	null
2024-04-10	The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge	Yiwei Guo et.al.	2404.06079	translate	read	null
2024-04-06	A Novel Bi-LSTM And Transformer Architecture For Generating Tabla Music	Roopa Mayya et.al.	2404.05765	translate	read	null
2024-04-08	VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain	Khai Le-Duc et.al.	2404.05659	translate	read	link
2024-04-07	Gull: A Generative Multifunctional Audio Codec	Yi Luo et.al.	2404.04947	translate	read	null
2024-04-07	Safeguarding Voice Privacy: Harnessing Near-Ultrasonic Interference To Protect Against Unauthorized Audio Recording	Forrest McKee et.al.	2404.04769	translate	read	null
2024-04-06	HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks	Yingting Li et.al.	2404.04645	translate	read	link
2024-04-05	The NES Video-Music Database: A Dataset of Symbolic Video Game Music Paired with Gameplay Videos	Igor Cardoso et.al.	2404.04420	translate	read	null
2024-04-04	Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition	Hainan Xu et.al.	2404.04295	translate	read	null
2024-04-05	Open vocabulary keyword spotting through transfer learning from speech synthesis	Kesavaraj V et.al.	2404.03914	translate	read	null
2024-04-06	RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis	Detai Xin et.al.	2404.03204	translate	read	null
2024-04-03	Mai Ho’omāuna i ka ‘Ai: Language Models Improve Automatic Speech Recognition in Hawaiian	Kaavya Chaparala et.al.	2404.03073	translate	read	null
2024-04-03	PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders	Yu Pan et.al.	2404.02702	translate	read	null
2024-04-03	Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation	Yejin Jeon et.al.	2404.02592	translate	read	null
2024-04-03	CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models	Zaid Sheikh et.al.	2404.02408	translate	read	link
2024-04-02	BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition	Alexandros Haliassos et.al.	2404.02098	translate	read	link
2024-04-02	Noise Masking Attacks and Defenses for Pretrained Speech Models	Matthew Jagielski et.al.	2404.02052	translate	read	null
2024-04-02	Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal	Elodie Gauthier et.al.	2404.01991	translate	read	link
2024-04-05	Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials	Ali Akram et.al.	2404.01981	translate	read	null
2024-04-02	Transfer Learning from Whisper for Microscopic Intelligibility Prediction	Paul Best et.al.	2404.01737	translate	read	null
2024-04-01	KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis	Adal Abilbekov et.al.	2404.01033	translate	read	null
2024-04-01	Voice Conversion Augmentation for Speaker Recognition on Defective Datasets	Ruijie Tao et.al.	2404.00863	translate	read	null
2024-04-01	Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling	Injune Hwang et.al.	2404.00856	translate	read	null