Audio Processing - 2024-06 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Translate	Read	Code
2024-06-30	An Attribute Interpolation Method in Speech Synthesis by Model Merging	Masato Murata et.al.	2407.00766	translate	read	null
2024-06-30	Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations	Salah Zaiem et.al.	2407.00756	translate	read	null
2024-06-30	FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis	Yinlin Guo et.al.	2407.00753	translate	read	null
2024-06-29	When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration	Philipp Allgeuer et.al.	2407.00518	translate	read	null
2024-06-28	SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR	Qiuming Zhao et.al.	2406.19706	translate	read	null
2024-06-28	Less is More: Accurate Speech Recognition & Translation without Web-Scale Data	Krishna C. Puvvada et.al.	2406.19674	translate	read	null
2024-06-27	Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects	Orevaoghene Ahia et.al.	2406.19564	translate	read	null
2024-06-27	Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment	Rotem Rousso et.al.	2406.19363	translate	read	null
2024-06-27	Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems	Zheng Fang et.al.	2406.19311	translate	read	null
2024-06-27	Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models	Borodin Kirill Nikolayevich et.al.	2406.19243	translate	read	null
2024-06-27	DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability	Hyun Joon Park et.al.	2406.19135	translate	read	link
2024-06-27	Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over	Atsunori Ogawa et.al.	2406.18972	translate	read	null
2024-06-27	Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network	Yehoshua Dissen et.al.	2406.18928	translate	read	null
2024-06-27	Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study	Peikun Chen et.al.	2406.18862	translate	read	null
2024-06-26	A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems	Karn N. Watcharasupat et.al.	2406.18747	translate	read	link
2024-06-26	Dynamic Data Pruning for Automatic Speech Recognition	Qiao Xiao et.al.	2406.18373	translate	read	null
2024-06-26	MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research	Song Li et.al.	2406.18301	translate	read	null
2024-06-26	Automatic Speech Recognition for Hindi	Anish Saha et.al.	2406.18135	translate	read	null
2024-06-26	ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs	Ahmed Heakl et.al.	2406.18120	translate	read	link
2024-06-26	SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR	Shuaishuai Ye et.al.	2406.18021	translate	read	null
2024-06-25	Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment	Paarth Neekhara et.al.	2406.17957	translate	read	null
2024-06-25	Sequential Editing for Lifelong Training of Speech Recognition Models	Devang Kulshreshtha et.al.	2406.17935	translate	read	null
2024-06-25	FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data	Dancheng Liu et.al.	2406.17926	translate	read	link
2024-06-25	Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals	Kentaro Seki et.al.	2406.17722	translate	read	null
2024-06-25	Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model	Jiawen Huang et.al.	2406.17618	translate	read	link
2024-06-25	MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization	Adriana Fernandez-Lopez et.al.	2406.17614	translate	read	null
2024-06-25	High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model	Joun Yeop Lee et.al.	2406.17310	translate	read	null
2024-06-25	A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR	Van Tung Pham et.al.	2406.17272	translate	read	null
2024-06-25	Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation	Yingting Li et.al.	2406.17257	translate	read	null
2024-06-24	Investigating Confidence Estimation Measures for Speaker Diarization	Anurag Chowdhury et.al.	2406.17124	translate	read	null
2024-06-24	Exploring the Capability of Mamba in Speech Applications	Koichi Miyazaki et.al.	2406.16808	translate	read	null
2024-06-24	Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024	Sai Koneru et.al.	2406.16777	translate	read	null
2024-06-25	Towards Zero-Shot Text-To-Speech for Arabic Dialects	Khai Duy Doan et.al.	2406.16751	translate	read	null
2024-06-24	One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection	Hyun Myung Kim et.al.	2406.16716	translate	read	null
2024-06-24	RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging	Mingyang Zhang et.al.	2406.16326	translate	read	null
2024-06-24	DreamVoice: Text-Guided Voice Conversion	Jiarui Hai et.al.	2406.16314	translate	read	null
2024-06-23	Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss	Muhammad Shakeel et.al.	2406.16120	translate	read	null
2024-06-23	Decoder-only Architecture for Streaming End-to-end Speech Recognition	Emiru Tsunoo et.al.	2406.16107	translate	read	null
2024-06-22	Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment	Heejin Do et.al.	2406.15723	translate	read	null
2024-06-21	PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics	Amir Nassereldine et.al.	2406.15668	translate	read	null
2024-06-21	Perception of Phonological Assimilation by Neural Speech Recognition Models	Charlotte Pouw et.al.	2406.15265	translate	read	null
2024-06-21	InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions	Yu Nakagome et.al.	2406.14890	translate	read	null
2024-06-20	An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks	Varsha Suresh et.al.	2406.14747	translate	read	null
2024-06-21	DASB – Discrete Audio and Speech Benchmark	Pooneh Mousavi et.al.	2406.14294	translate	read	null
2024-06-20	Intelligent Interface: Enhancing Lecture Engagement with Didactic Activity Summaries	Anna Wróblewska et.al.	2406.14266	translate	read	null
2024-06-19	Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control	Alexander Blatt et.al.	2406.13842	translate	read	null
2024-06-19	ManWav: The First Manchu ASR Model	Jean Seo et.al.	2406.13502	translate	read	null
2024-06-19	Children’s Speech Recognition through Discrete Token Enhancement	Vrunda N. Sukhadia et.al.	2406.13431	translate	read	null
2024-06-19	CEC: A Noisy Label Detection Method for Speaker Recognition	Yao Shen et.al.	2406.13268	translate	read	null
2024-06-18	Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech	Cheol Jun Cho et.al.	2406.12998	translate	read	null
2024-06-18	Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition	Kuan-Chen Wang et.al.	2406.12699	translate	read	null
2024-06-18	Transcribe, Align and Segment: Creating speech datasets for low-resource languages	Taras Sereda et.al.	2406.12674	translate	read	null
2024-06-18	Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech	Adrien Pupier et.al.	2406.12621	translate	read	null
2024-06-18	Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting	Yosuke Kashiwagi et.al.	2406.12611	translate	read	null
2024-06-18	Unsupervised Online Continual Learning for Automatic Speech Recognition	Steven Vander Eeckt et.al.	2406.12503	translate	read	null
2024-06-18	Performant ASR Models for Medical Entities in Accented Speech	Tejumade Afonja et.al.	2406.12387	translate	read	null
2024-06-18	Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model	Hayato Futami et.al.	2406.12317	translate	read	null
2024-06-18	JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning	Boyu Chen et.al.	2406.12292	translate	read	null
2024-06-18	SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization	Young Jin Ahn et.al.	2406.12233	translate	read	null
2024-06-18	A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis	Guoqiang Hu et.al.	2406.12164	translate	read	null
2024-06-17	1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis	Sewade Ogun et.al.	2406.11727	translate	read	null
2024-06-17	GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement	Yifan Yang et.al.	2406.11546	translate	read	link
2024-06-17	Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9	Do Hyun Lee et.al.	2406.11248	translate	read	null
2024-06-17	Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision	Yafeng Chen et.al.	2406.11169	translate	read	null
2024-06-16	Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech	Guan-Ting Lin et.al.	2406.11064	translate	read	null
2024-06-16	NAST: Noise Aware Speech Tokenization for Speech Language Models	Shoval Messica et.al.	2406.11037	translate	read	link
2024-06-16	Large Language Models for Dysfluency Detection in Stuttered Speech	Dominik Wagner et.al.	2406.11025	translate	read	null
2024-06-16	Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models	Dominik Wagner et.al.	2406.11022	translate	read	null
2024-06-16	Optimized Speculative Sampling for GPU Hardware Accelerators	Dominik Wagner et.al.	2406.11016	translate	read	null
2024-06-16	CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving	Bhavani Shankar et.al.	2406.10993	translate	read	null
2024-06-14	Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation	Dena Mujtaba et.al.	2406.10177	translate	read	null
2024-06-14	On the Evaluation of Speech Foundation Models for Spoken Language Understanding	Siddhant Arora et.al.	2406.10083	translate	read	null
2024-06-14	Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation	Andrew Rouditchenko et.al.	2406.10082	translate	read	link
2024-06-14	Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection	Haoyu Wang et.al.	2406.10052	translate	read	link
2024-06-14	ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR	Vishwanath Pratap Singh et.al.	2406.09999	translate	read	null
2024-06-14	An efficient text augmentation approach for contextualized Mandarin speech recognition	Naijun Zheng et.al.	2406.09950	translate	read	null
2024-06-14	Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition	Yicong Jiang et.al.	2406.09873	translate	read	null
2024-06-14	MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model	Jiatong Shi et.al.	2406.09869	translate	read	null
2024-06-14	Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy	Linhan Ma et.al.	2406.09844	translate	read	null
2024-06-14	Low algorithmic delay implementation of convolutional beamformer for online joint source separation and dereverberation	Kaien Mo et.al.	2406.09821	translate	read	null
2024-06-13	Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech	Martina Valente et.al.	2406.09290	translate	read	null
2024-06-13	Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t	Chihiro Taguchi et.al.	2406.09202	translate	read	null
2024-06-13	LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks	Amit Meghanani et.al.	2406.09153	translate	read	null
2024-06-13	ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis	Dehua Tao et.al.	2406.08989	translate	read	null
2024-06-13	Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition	William Ravenscroft et.al.	2406.08914	translate	read	null
2024-06-13	AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers	Emil Biju et.al.	2406.08904	translate	read	null
2024-06-13	A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed	Ziyang Zhuang et.al.	2406.08835	translate	read	null
2024-06-13	Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems	Zhengyang Chen et.al.	2406.08812	translate	read	null
2024-06-12	ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets	Jiatong Shi et.al.	2406.08641	translate	read	null
2024-06-12	Emotion Manipulation Through Music – A Deep Learning Interactive Visual Approach	Adel N. Abdalla et.al.	2406.08623	translate	read	null
2024-06-12	SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models	Chun Yin et.al.	2406.08445	translate	read	null
2024-06-12	TokSing: Singing Voice Synthesis based on Discrete Tokens	Yuning Wu et.al.	2406.08416	translate	read	null
2024-06-12	Neural Blind Source Separation and Diarization for Distant Speech Recognition	Yoshiaki Bando et.al.	2406.08396	translate	read	null
2024-06-12	Towards Unsupervised Speech Recognition Without Pronunciation Models	Junrui Ni et.al.	2406.08380	translate	read	null
2024-06-12	Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques	Yuanchao Li et.al.	2406.08353	translate	read	link
2024-06-12	Refining Self-Supervised Learnt Speech Representation using Brain Activations	Hengyu Li et.al.	2406.08266	translate	read	null
2024-06-12	Transformer-based Model for ASR N-Best Rescoring and Rewriting	Iwen E. Kang et.al.	2406.08207	translate	read	null
2024-06-12	FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter	Yuanjun Lv et.al.	2406.08196	translate	read	link
2024-06-12	Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data	Yuma Shirahata et.al.	2406.08111	translate	read	null
2024-06-12	Can Large Language Models Understand Spatial Audio?	Changli Tang et.al.	2406.07914	translate	read	null
2024-06-11	Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?	Qingkai Fang et.al.	2406.07289	translate	read	null
2024-06-11	Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment	Takuto Igarashi et.al.	2406.07280	translate	read	null
2024-06-11	AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection	Rong Gong et.al.	2406.07256	translate	read	null
2024-06-11	SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark	Yuki Saito et.al.	2406.07254	translate	read	null
2024-06-11	CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems	Haibin Wu et.al.	2406.07237	translate	read	null
2024-06-11	MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms	Seung-bin Kim et.al.	2406.07103	translate	read	link
2024-06-11	Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter	Andrei Andrusenko et.al.	2406.07096	translate	read	null
2024-06-11	Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech	Mateusz Czyżnikiewicz et.al.	2406.07090	translate	read	null
2024-06-11	Reading Miscue Detection in Primary School through Automatic Speech Recognition	Lingyun Gao et.al.	2406.07060	translate	read	null
2024-06-10	Synthetic Query Generation using Large Language Models for Virtual Assistants	Sonal Sannigrahi et.al.	2406.06729	translate	read	null
2024-06-10	Meta Learning Text-to-Speech Synthesis in over 7000 Languages	Florian Lux et.al.	2406.06403	translate	read	link
2024-06-10	A Parameter-efficient Language Extension Framework for Multilingual ASR	Wei Liu et.al.	2406.06329	translate	read	null
2024-06-10	Quantifying the effect of speech pathology on automatic and human speaker verification	Bence Mark Halpern et.al.	2406.06208	translate	read	null
2024-06-10	JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis	Hyunjae Cho et.al.	2406.06111	translate	read	null
2024-06-10	Prompting Large Language Models with Audio for General-Purpose Speech Summarization	Wonjune Kang et.al.	2406.05968	translate	read	link
2024-06-09	Conserving Human Creativity with Evolutionary Generative Algorithms: A Case Study in Music Generation	Justin Kilb et.al.	2406.05873	translate	read	null
2024-06-09	Source-Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels	Shlomo Salo Elia et.al.	2406.05863	translate	read	null
2024-06-09	Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper	Chih-Kai Yang et.al.	2406.05806	translate	read	null
2024-06-09	Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper’s Encoder for Efficient Parameter Reduction in Automated Assessment	Huma Ameer et.al.	2406.05784	translate	read	null
2024-06-09	SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion	Bingsong Bai et.al.	2406.05692	translate	read	null
2024-06-07	The Database and Benchmark for Source Speaker Verification Against Voice Conversion	Ze Li et.al.	2406.04951	translate	read	null
2024-06-07	LLM-based speaker diarization correction: A generalizable approach	Georgios Efstathiadis et.al.	2406.04927	translate	read	link
2024-06-07	Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR	Shaojun Li et.al.	2406.04791	translate	read	null
2024-06-07	Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis	Xintong Wang et.al.	2406.04595	translate	read	null
2024-06-07	Neural Codec-based Adversarial Sample Detection for Speaker Verification	Xuanjun Chen et.al.	2406.04582	translate	read	null
2024-06-06	Flexible Multichannel Speech Enhancement for Noise-Robust Frontend	Ante Jukić et.al.	2406.04552	translate	read	null
2024-06-06	Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation	Keqi Deng et.al.	2406.04541	translate	read	null
2024-06-06	To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation	Abdul Waheed et.al.	2406.04512	translate	read	link
2024-06-06	Towards Naturalistic Voice Conversion: NaturalVoices Dataset with an Automatic Processing Pipeline	Ali N. Salman et.al.	2406.04494	translate	read	null
2024-06-06	Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis	Théodor Lemerle et.al.	2406.04467	translate	read	link
2024-06-06	VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling	Zeyue Tian et.al.	2406.04321	translate	read	link
2024-06-06	Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement	Wangyou Zhang et.al.	2406.04269	translate	read	null
2024-06-06	Hypernetworks for Personalizing ASR to Atypical Speech	Max Mueller-Eberstein et.al.	2406.04240	translate	read	null
2024-06-06	Helsinki Speech Challenge 2024	Martin Ludvigsen et.al.	2406.04123	translate	read	null
2024-06-06	BLSP-Emo: Towards Empathetic Large Speech-Language Models	Chen Wang et.al.	2406.03872	translate	read	link
2024-06-06	Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores	Jiaming Zhou et.al.	2406.03814	translate	read	null
2024-06-06	Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU	Daniel Galvez et.al.	2406.03791	translate	read	null
2024-06-06	Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining	Jinlong Xue et.al.	2406.03714	translate	read	null
2024-06-06	Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model	Jinlong Xue et.al.	2406.03706	translate	read	null
2024-06-05	Style Mixture of Experts for Expressive Text-To-Speech Synthesis	Ahad Jawaid et.al.	2406.03637	translate	read	null
2024-06-05	Enhancing CTC-based speech recognition with diverse modeling units	Shiyi Han et.al.	2406.03274	translate	read	null
2024-06-05	Error-preserving Automatic Speech Recognition of Young English Learners’ Language	Janick Michot et.al.	2406.03235	translate	read	link
2024-06-05	StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning	Shaolei Zhang et.al.	2406.03049	translate	read	link
2024-06-05	4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders	Yui Sudo et.al.	2406.02950	translate	read	null
2024-06-05	SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation	Hsuan Su et.al.	2406.02925	translate	read	null
2024-06-05	Text Injection for Neural Contextual Biasing	Zhong Meng et.al.	2406.02921	translate	read	null
2024-06-04	Keyword-Guided Adaptation of Automatic Speech Recognition	Aviv Shamsian et.al.	2406.02649	translate	read	null
2024-06-04	Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion	Ruiqi Li et.al.	2406.02429	translate	read	null
2024-06-04	An Independence-promoting Loss for Music Generation with Language Models	Jean-Marie Lemercier et.al.	2406.02315	translate	read	null
2024-06-04	Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models	Victor Miara et.al.	2406.02285	translate	read	link
2024-06-04	ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency	Yafeng Chen et.al.	2406.02167	translate	read	null
2024-06-04	Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision	Saierdaer Yusuyin et.al.	2406.02166	translate	read	link
2024-06-04	Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis	Kun Zhou et.al.	2406.02009	translate	read	null
2024-06-04	Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping	Lun Wang et.al.	2406.02004	translate	read	null
2024-06-03	TinySV: Speaker Verification in TinyML with On-device Learning	Massimo Pavan et.al.	2406.01655	translate	read	null
2024-06-03	Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach	Ara Yeroyan et.al.	2406.01446	translate	read	null
2024-06-03	Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization	Firas Khader et.al.	2406.01314	translate	read	null
