Audio Processing - 2024-03

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-03-31 | Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation | Rohan Chaudhury et.al. | 2404.01339 | translate | read | link |
| 2024-03-31 | CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Xiang Li et.al. | 2404.00569 | translate | read | link |
| 2024-03-29 | ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models | Thibaut Thonet et.al. | 2403.20262 | translate | read | null |
| 2024-03-29 | 3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization | Yafeng Chen et.al. | 2403.19971 | translate | read | link |
| 2024-03-28 | Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition | Yash Jain et.al. | 2403.19822 | translate | read | null |
| 2024-03-28 | Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2 | Pierre-Michel Bousquet et.al. | 2403.19634 | translate | read | null |
| 2024-03-28 | Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition | Siyuan Shen et.al. | 2403.19224 | translate | read | link |
| 2024-03-28 | LV-CTC: Non-autoregressive ASR with CTC and latent variable models | Yuya Fujita et.al. | 2403.19207 | translate | read | null |
| 2024-03-27 | PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations | Ehsan Latif et.al. | 2403.18721 | translate | read | null |
| 2024-03-27 | ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus | Injy Hamed et.al. | 2403.18182 | translate | read | null |
| 2024-03-28 | DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition | Yi-Cheng Wang et.al. | 2403.17645 | translate | read | null |
| 2024-03-26 | Extracting Biomedical Entities from Noisy Audio Transcripts | Nima Ebadi et.al. | 2403.17363 | translate | read | null |
| 2024-03-25 | Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT | Rohit Raju et.al. | 2403.16655 | translate | read | null |
| 2024-03-25 | Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator | Takuhiro Kaneko et.al. | 2403.16464 | translate | read | null |
| 2024-03-22 | Privacy-Preserving End-to-End Spoken Language Understanding | Yinggui Wang et.al. | 2403.15510 | translate | read | null |
| 2024-03-26 | A Multimodal Approach to Device-Directed Speech Detection with Large Language Models | Dominik Wagner et.al. | 2403.14438 | translate | read | null |
| 2024-03-21 | XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception | HyoJung Han et.al. | 2403.14402 | translate | read | null |
| 2024-03-21 | M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset | Zhe Chen et.al. | 2403.14168 | translate | read | null |
| 2024-03-21 | The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data | Alice Baird et.al. | 2403.14048 | translate | read | null |
| 2024-03-20 | Open Access NAO (OAN): a ROS2-based software framework for HRI applications with the NAO robot | Antonio Bono et.al. | 2403.13960 | translate | read | null |
| 2024-03-20 | BanglaNum – A Public Dataset for Bengali Digit Recognition from Speech | Mir Sayeed Mohammad et.al. | 2403.13465 | translate | read | null |
| 2024-03-20 | Advanced Long-Content Speech Recognition With Factorized Neural Transducer | Xun Gong et.al. | 2403.13423 | translate | read | null |
| 2024-03-20 | KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario | Huali Zhou et.al. | 2403.13356 | translate | read | null |
| 2024-03-20 | Building speech corpus with diverse voice characteristics for its prompt-based representation | Aya Watanabe et.al. | 2403.13353 | translate | read | null |
| 2024-03-20 | Polaris: A Safety-focused LLM Constellation Architecture for Healthcare | Subhabrata Mukherjee et.al. | 2403.13313 | translate | read | null |
| 2024-03-19 | FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer | Dongyeong Hwang et.al. | 2403.12821 | translate | read | link |
| 2024-03-19 | Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation | Yuto Ishikawa et.al. | 2403.12477 | translate | read | null |
| 2024-03-19 | An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis | Yifan Peng et.al. | 2403.12402 | translate | read | null |
| 2024-03-18 | Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models | Linus Nwankwo et.al. | 2403.12273 | translate | read | null |
| 2024-03-18 | Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models | Emilian Postolache et.al. | 2403.11706 | translate | read | link |
| 2024-03-18 | QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation | Zhizhen Zhou et.al. | 2403.11626 | translate | read | null |
| 2024-03-18 | AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition | SooHwan Eom et.al. | 2403.11578 | translate | read | null |
| 2024-03-16 | Energy-Based Models with Applications to Speech and Language Processing | Zhijian Ou et.al. | 2403.10961 | translate | read | null |
| 2024-03-16 | Initial Decoding with Minimally Augmented Language Model for Improved Lattice Rescoring in Low Resource ASR | Savitha Murthy et.al. | 2403.10937 | translate | read | null |
| 2024-03-15 | MusicHiFi: Fast High-Fidelity Stereo Vocoding | Ge Zhu et.al. | 2403.10493 | translate | read | null |
| 2024-03-15 | Neural Networks Hear You Loud And Clear: Hearing Loss Compensation Using Deep Neural Networks | Peter Leer et.al. | 2403.10420 | translate | read | null |
| 2024-03-14 | SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages | René Groh et.al. | 2403.09753 | translate | read | link |
| 2024-03-14 | More than words: Advancements and challenges in speech recognition for singing | Anna Kruspe et.al. | 2403.09298 | translate | read | null |
| 2024-03-13 | Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition | Wenjing Zhu et.al. | 2403.08258 | translate | read | null |
| 2024-03-13 | SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation | Jiayu Du et.al. | 2403.08196 | translate | read | link |
| 2024-03-13 | Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children | Taekyung Ahn et.al. | 2403.08187 | translate | read | null |
| 2024-03-13 | EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech | Ziqi Liang et.al. | 2403.08164 | translate | read | null |
| 2024-03-12 | Gujarati-English Code-Switching Speech Recognition using ensemble prediction of spoken language | Yash Sharma et.al. | 2403.08011 | translate | read | null |
| 2024-03-12 | Motifs, Phrases, and Beyond: The Modelling of Structure in Symbolic Music Generation | Keshav Bhandari et.al. | 2403.07995 | translate | read | null |
| 2024-03-11 | The evaluation of a code-switched Sepedi-English automatic speech recognition system | Amanda Phaladi et.al. | 2403.07947 | translate | read | null |
| 2024-03-12 | Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets | Jan Pešán et.al. | 2403.07767 | translate | read | null |
| 2024-03-11 | Real-Time Multimodal Cognitive Assistant for Emergency Medical Services | Keshara Weerasinghe et.al. | 2403.06734 | translate | read | null |
| 2024-03-11 | Towards Decoupling Frontend Enhancement and Backend Recognition in Monaural Robust ASR | Yufeng Yang et.al. | 2403.06387 | translate | read | null |
| 2024-03-10 | SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations | Amit Meghanani et.al. | 2403.06260 | translate | read | null |
| 2024-03-09 | HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling | Chunhui Wang et.al. | 2403.05989 | translate | read | null |
| 2024-03-09 | Aligning Speech to Languages to Enhance Code-switching Speech Recognition | Hexin Liu et.al. | 2403.05887 | translate | read | null |
| 2024-03-07 | Classist Tools: Social Class Correlates with Performance in NLP | Amanda Cercas Curry et.al. | 2403.04445 | translate | read | null |
| 2024-03-07 | A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain | Qusai Abo Obaidah et.al. | 2403.04280 | translate | read | null |
| 2024-03-07 | A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Yusheng Dai et.al. | 2403.04245 | translate | read | link |
| 2024-03-06 | RADIA – Radio Advertisement Detection with Intelligent Analytics | Jorge Álvarez et.al. | 2403.03538 | translate | read | null |
| 2024-03-06 | Non-verbal information in spontaneous speech – towards a new framework of analysis | Tirza Biron et.al. | 2403.03522 | translate | read | null |
| 2024-03-05 | NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Zeqian Ju et.al. | 2403.03100 | translate | read | null |
| 2024-03-05 | AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models | Kazuki Kawamura et.al. | 2403.02938 | translate | read | null |
| 2024-03-05 | Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction | Yue Li et.al. | 2403.02918 | translate | read | null |
| 2024-03-04 | PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings | Joonas Kalda et.al. | 2403.02288 | translate | read | null |
| 2024-03-04 | What has LeBenchmark Learnt about French Syntax? | Zdravko Dugonjić et.al. | 2403.02173 | translate | read | null |
| 2024-03-04 | SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR | Zhiyun Fan et.al. | 2403.02010 | translate | read | null |
| 2024-03-04 | Language and Speech Technology for Central Kurdish Varieties | Sina Ahmadi et.al. | 2403.01983 | translate | read | link |
| 2024-03-03 | PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion | Tianhua Qi et.al. | 2403.01494 | translate | read | null |
| 2024-03-03 | A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement | Ravi Shankar et.al. | 2403.01369 | translate | read | null |
| 2024-03-03 | a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Hye-jin Shim et.al. | 2403.01355 | translate | read | link |
| 2024-03-02 | Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey | Hamza Kheddar et.al. | 2403.01255 | translate | read | null |
| 2024-03-02 | Towards Accurate Lip-to-Speech Synthesis in-the-Wild | Sindhu Hegde et.al. | 2403.01087 | translate | read | null |
| 2024-03-01 | VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis | Weiwei Lin et.al. | 2403.00529 | translate | read | null |
| 2024-03-01 | Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview | Heyang Liu et.al. | 2403.00370 | translate | read | null |
| 2024-03-01 | Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification | Mufan Sang et.al. | 2403.00293 | translate | read | null |
| 2024-03-01 | Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART | Aniket Tathe et.al. | 2403.00212 | translate | read | null |
