Audio Processing - 2024-11

| Publish Date | Title | Authors | PDF (arXiv) | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-11-30 | From Audio Deepfake Detection to AI-Generated Music Detection – A Pathway and Overview | Yupei Li et al. | 2412.00571 | translate | read | null |
| 2024-11-30 | Sample adaptive data augmentation with progressive scheduling | Hongxuan Lu et al. | 2412.00415 | translate | read | null |
| 2024-11-30 | Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models | Nadeen Fathallah et al. | 2412.00342 | translate | read | null |
| 2024-11-30 | MusicGen-Chord: Advancing Music Generation through Chord Progressions and Interactive Web-UI | Jongmin Jung et al. | 2412.00325 | translate | read | null |
| 2024-11-30 | Improving speaker verification robustness with synthetic emotional utterances | Nikhil Kumar Koditala et al. | 2412.00319 | translate | read | null |
| 2024-11-29 | Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities | Haorui He et al. | 2411.19770 | translate | read | null |
| 2024-11-29 | Memristive Nanowire Network for Energy Efficient Audio Classification: Pre-Processing-Free Reservoir Computing with Reduced Latency | Akshaya Rajesh et al. | 2411.19611 | translate | read | null |
| 2024-11-28 | ArEEG_Words: Dataset for Envisioned Speech Recognition using EEG for Arabic Words | Hazem Darwish et al. | 2411.18888 | translate | read | null |
| 2024-11-27 | EEG-Based Analysis of Brain Responses in Multi-Modal Human-Robot Interaction: Modulating Engagement | Suzanne Oliver et al. | 2411.18587 | translate | read | null |
| 2024-11-27 | AMPS: ASR with Multimodal Paraphrase Supervision | Amruta Parulekar et al. | 2411.18368 | translate | read | null |
| 2024-11-27 | Continual Learning in Machine Speech Chain Using Gradient Episodic Memory | Geoffrey Tyndall et al. | 2411.18320 | translate | read | null |
| 2024-11-27 | Aligning Pre-trained Models for Spoken Language Translation | Šimon Sedláček et al. | 2411.18294 | translate | read | null |
| 2024-11-27 | Efficient Nonlinear Function Approximation in Analog Resistive Crossbars for Recurrent Neural Networks | Junyi Yang et al. | 2411.18271 | translate | read | null |
| 2024-11-27 | How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario | Shih-Heng Wang et al. | 2411.18217 | translate | read | null |
| 2024-11-27 | Machine Unlearning reveals that the Gender-based Violence Victim Condition can be detected from Speech in a Speaker-Agnostic Setting | Emma Reyner-Fuentes et al. | 2411.18177 | translate | read | null |
| 2024-11-27 | MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models | Thai-Binh Nguyen et al. | 2411.18152 | translate | read | null |
| 2024-11-27 | SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation | Wenyi Yu et al. | 2411.18138 | translate | read | null |
| 2024-11-27 | Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition | Shih-heng Wang et al. | 2411.18107 | translate | read | null |
| 2024-11-26 | Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Akshita Gupta et al. | 2411.17690 | translate | read | null |
| 2024-11-26 | Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Aohan Zeng et al. | 2411.17607 | translate | read | null |
| 2024-11-26 | Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition | Hyeonseung Lee et al. | 2411.17537 | translate | read | null |
| 2024-11-26 | Comparative Analysis of ASR Methods for Speech Deepfake Detection | Davide Salvi et al. | 2411.17349 | translate | read | null |
| 2024-11-26 | k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning | Yifan Yang et al. | 2411.17100 | translate | read | null |
| 2024-11-25 | Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN | Elona Shatri et al. | 2411.16405 | translate | read | null |
| 2024-11-25 | The SVASR System for Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024 | Mohammadreza Molavi et al. | 2411.16276 | translate | read | null |
| 2024-11-25 | SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations | Youngjun Sim et al. | 2411.16147 | translate | read | null |
| 2024-11-24 | A Training-Free Approach for Music Style Transfer with Latent Diffusion Models | Sooyoung Kim et al. | 2411.15913 | translate | read | null |
| 2024-11-22 | Transforming NLU with Babylon: A Case Study in Development of Real-time, Edge-Efficient, Multi-Intent Translation System for Automated Drive-Thru Ordering | Mostafa Varzaneh et al. | 2411.15372 | translate | read | null |
| 2024-11-22 | Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network | Irfan Nafiz Shahan et al. | 2411.15082 | translate | read | link |
| 2024-11-22 | VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space | Armani Rodriguez et al. | 2411.14642 | translate | read | null |
| 2024-11-21 | Generative AI for Music and Audio | Hao-Wen Dong et al. | 2411.14627 | translate | read | null |
| 2024-11-20 | From Statistical Methods to Pre-Trained Models; A Survey on Automatic Speech Recognition for Resource Scarce Urdu Language | Muhammad Sharif et al. | 2411.14493 | translate | read | null |
| 2024-11-21 | Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge | Ruiyang Qin et al. | 2411.13766 | translate | read | null |
| 2024-11-18 | A Novel Speech Analysis and Correction Tool for Arabic-Speaking Children | Lamia Berriche et al. | 2411.13592 | translate | read | null |
| 2024-11-20 | CAFE A Novel Code switching Dataset for Algerian Dialect French and English | Houssam Eddine-Othman Lachemat et al. | 2411.13424 | translate | read | null |
| 2024-11-20 | I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception | Jiawei Zhang et al. | 2411.13314 | translate | read | null |
| 2024-11-20 | Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM | Jiawei Yu et al. | 2411.13159 | translate | read | null |
| 2024-11-21 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et al. | 2411.12641 | translate | read | null |
| 2024-11-19 | Whisper Finetuning on Nepali Language | Sanjay Rijal et al. | 2411.12587 | translate | read | null |
| 2024-11-18 | An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems | Jingyu Li et al. | 2411.11353 | translate | read | null |
| 2024-11-18 | Study of the Performance of CEEMDAN in Underdetermined Speech Separation | Rawad Melhem et al. | 2411.11312 | translate | read | null |
| 2024-11-18 | SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features | Yu-Fei Shi et al. | 2411.11232 | translate | read | null |
| 2024-11-17 | Inter-linguistic Phonetic Composition (IPC): A Theoretical and Computational Approach to Enhance Second Language Pronunciation | Jisang Park et al. | 2411.10927 | translate | read | null |
| 2024-11-16 | BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization | Md. Nazmus Sadat Samin et al. | 2411.10879 | translate | read | link |
| 2024-11-16 | Bilingual Text-dependent Speaker Verification with Pre-trained Models for TdSV Challenge 2024 | Seyed Ali Farokh et al. | 2411.10828 | translate | read | null |
| 2024-11-15 | SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers | Joseph Liu et al. | 2411.10510 | translate | read | link |
| 2024-11-15 | Interactive Cycle Model – The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses | Libo Wang et al. | 2411.10362 | translate | read | null |
| 2024-11-15 | Systolic Arrays and Structured Pruning Co-design for Efficient Transformers in Edge Systems | Pedro Palacios et al. | 2411.10285 | translate | read | null |
| 2024-11-15 | DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization | Christos Koutlis et al. | 2411.10193 | translate | read | link |
| 2024-11-15 | XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection | Yang Xiao et al. | 2411.10027 | translate | read | link |
| 2024-11-15 | Zero-shot Voice Conversion with Diffusion Transformers | Songting Liu et al. | 2411.09943 | translate | read | null |
| 2024-11-14 | Everyone deserves their voice to be heard: Analyzing Predictive Gender Bias in ASR Models Applied to Dutch Speech Data | Rik Raes et al. | 2411.09431 | translate | read | null |
| 2024-11-14 | Transferable Adversarial Attacks against ASR | Xiaoxue Gao et al. | 2411.09220 | translate | read | null |
| 2024-11-14 | Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation | Kuiyuan Zhang et al. | 2411.09167 | translate | read | null |
| 2024-11-13 | Language Models for Music Medicine Generation | Emmanouil Nikolakakis et al. | 2411.09080 | translate | read | null |
| 2024-11-14 | Evaluating Synthetic Command Attacks on Smart Voice Assistants | Zhengxian He et al. | 2411.08316 | translate | read | null |
| 2024-11-13 | PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation | Yungang Yi et al. | 2411.08307 | translate | read | null |
| 2024-11-11 | Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition | Yoshiki Masuyama et al. | 2411.06968 | translate | read | link |
| 2024-11-11 | DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions | Shu-Tong Niu et al. | 2411.06667 | translate | read | null |
| 2024-11-10 | Debatts: Zero-Shot Debating Text-to-Speech Synthesis | Yiqiao Huang et al. | 2411.06540 | translate | read | null |
| 2024-11-10 | CTC-Assisted LLM-Based Contextual ASR | Guanrou Yang et al. | 2411.06437 | translate | read | link |
| 2024-11-07 | Dialectal Coverage And Generalization in Arabic Speech Recognition | Amirbek Djanibekov et al. | 2411.05872 | translate | read | null |
| 2024-11-07 | Sentiment Analysis of Spanish Political Party Tweets Using Pre-trained Language Models | Chuqiao Song et al. | 2411.04862 | translate | read | null |
| 2024-11-07 | Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource Languages | Leena G Pillai et al. | 2411.04573 | translate | read | null |
| 2024-11-06 | Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks | Felipe Marra et al. | 2411.03948 | translate | read | null |
| 2024-11-04 | Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | Alexandros Haliassos et al. | 2411.02256 | translate | read | link |
| 2024-11-04 | Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data | Sofiane Azzouz et al. | 2411.02037 | translate | read | null |
| 2024-11-04 | CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching | Yu Pan et al. | 2411.02026 | translate | read | null |
| 2024-11-04 | MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence | Fuming You et al. | 2411.01805 | translate | read | null |
| 2024-11-03 | SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation | Dennis Fucci et al. | 2411.01710 | translate | read | null |
| 2024-11-02 | Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection | Han Yin et al. | 2411.01174 | translate | read | link |
| 2024-11-02 | Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis | Shijia Liao et al. | 2411.01156 | translate | read | link |
| 2024-11-01 | Enhancing AAC Software for Dysarthric Speakers in e-Health Settings: An Evaluation Using TORGO | Macarious Hui et al. | 2411.00980 | translate | read | null |
| 2024-11-04 | Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval | Nikolaos Flemotomos et al. | 2411.00664 | translate | read | null |
