Audio Processing - 2024-06

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-06-30 | An Attribute Interpolation Method in Speech Synthesis by Model Merging | Masato Murata et.al. | 2407.00766 | translate | read | null |
| 2024-06-30 | Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations | Salah Zaiem et.al. | 2407.00756 | translate | read | null |
| 2024-06-30 | FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis | Yinlin Guo et.al. | 2407.00753 | translate | read | null |
| 2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518 | translate | read | null |
| 2024-06-28 | SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR | Qiuming Zhao et.al. | 2406.19706 | translate | read | null |
| 2024-06-28 | Less is More: Accurate Speech Recognition & Translation without Web-Scale Data | Krishna C. Puvvada et.al. | 2406.19674 | translate | read | null |
| 2024-06-27 | Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Orevaoghene Ahia et.al. | 2406.19564 | translate | read | null |
| 2024-06-27 | Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment | Rotem Rousso et.al. | 2406.19363 | translate | read | null |
| 2024-06-27 | Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems | Zheng Fang et.al. | 2406.19311 | translate | read | null |
| 2024-06-27 | Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models | Borodin Kirill Nikolayevich et.al. | 2406.19243 | translate | read | null |
| 2024-06-27 | DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Hyun Joon Park et.al. | 2406.19135 | translate | read | link |
| 2024-06-27 | Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over | Atsunori Ogawa et.al. | 2406.18972 | translate | read | null |
| 2024-06-27 | Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network | Yehoshua Dissen et.al. | 2406.18928 | translate | read | null |
| 2024-06-27 | Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study | Peikun Chen et.al. | 2406.18862 | translate | read | null |
| 2024-06-26 | A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems | Karn N. Watcharasupat et.al. | 2406.18747 | translate | read | link |
| 2024-06-26 | Dynamic Data Pruning for Automatic Speech Recognition | Qiao Xiao et.al. | 2406.18373 | translate | read | null |
| 2024-06-26 | MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research | Song Li et.al. | 2406.18301 | translate | read | null |
| 2024-06-26 | Automatic Speech Recognition for Hindi | Anish Saha et.al. | 2406.18135 | translate | read | null |
| 2024-06-26 | ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Ahmed Heakl et.al. | 2406.18120 | translate | read | link |
| 2024-06-26 | SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | Shuaishuai Ye et.al. | 2406.18021 | translate | read | null |
| 2024-06-25 | Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment | Paarth Neekhara et.al. | 2406.17957 | translate | read | null |
| 2024-06-25 | Sequential Editing for Lifelong Training of Speech Recognition Models | Devang Kulshreshtha et.al. | 2406.17935 | translate | read | null |
| 2024-06-25 | FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data | Dancheng Liu et.al. | 2406.17926 | translate | read | link |
| 2024-06-25 | Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals | Kentaro Seki et.al. | 2406.17722 | translate | read | null |
| 2024-06-25 | Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model | Jiawen Huang et.al. | 2406.17618 | translate | read | link |
| 2024-06-25 | MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | Adriana Fernandez-Lopez et.al. | 2406.17614 | translate | read | null |
| 2024-06-25 | High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model | Joun Yeop Lee et.al. | 2406.17310 | translate | read | null |
| 2024-06-25 | A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR | Van Tung Pham et.al. | 2406.17272 | translate | read | null |
| 2024-06-25 | Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation | Yingting Li et.al. | 2406.17257 | translate | read | null |
| 2024-06-24 | Investigating Confidence Estimation Measures for Speaker Diarization | Anurag Chowdhury et.al. | 2406.17124 | translate | read | null |
| 2024-06-24 | Exploring the Capability of Mamba in Speech Applications | Koichi Miyazaki et.al. | 2406.16808 | translate | read | null |
| 2024-06-24 | Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024 | Sai Koneru et.al. | 2406.16777 | translate | read | null |
| 2024-06-25 | Towards Zero-Shot Text-To-Speech for Arabic Dialects | Khai Duy Doan et.al. | 2406.16751 | translate | read | null |
| 2024-06-24 | One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection | Hyun Myung Kim et.al. | 2406.16716 | translate | read | null |
| 2024-06-24 | RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging | Mingyang Zhang et.al. | 2406.16326 | translate | read | null |
| 2024-06-24 | DreamVoice: Text-Guided Voice Conversion | Jiarui Hai et.al. | 2406.16314 | translate | read | null |
| 2024-06-23 | Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss | Muhammad Shakeel et.al. | 2406.16120 | translate | read | null |
| 2024-06-23 | Decoder-only Architecture for Streaming End-to-end Speech Recognition | Emiru Tsunoo et.al. | 2406.16107 | translate | read | null |
| 2024-06-22 | Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment | Heejin Do et.al. | 2406.15723 | translate | read | null |
| 2024-06-21 | PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics | Amir Nassereldine et.al. | 2406.15668 | translate | read | null |
| 2024-06-21 | Perception of Phonological Assimilation by Neural Speech Recognition Models | Charlotte Pouw et.al. | 2406.15265 | translate | read | null |
| 2024-06-21 | InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions | Yu Nakagome et.al. | 2406.14890 | translate | read | null |
| 2024-06-20 | An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks | Varsha Suresh et.al. | 2406.14747 | translate | read | null |
| 2024-06-21 | DASB – Discrete Audio and Speech Benchmark | Pooneh Mousavi et.al. | 2406.14294 | translate | read | null |
| 2024-06-20 | Intelligent Interface: Enhancing Lecture Engagement with Didactic Activity Summaries | Anna Wróblewska et.al. | 2406.14266 | translate | read | null |
| 2024-06-19 | Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control | Alexander Blatt et.al. | 2406.13842 | translate | read | null |
| 2024-06-19 | ManWav: The First Manchu ASR Model | Jean Seo et.al. | 2406.13502 | translate | read | null |
| 2024-06-19 | Children’s Speech Recognition through Discrete Token Enhancement | Vrunda N. Sukhadia et.al. | 2406.13431 | translate | read | null |
| 2024-06-19 | CEC: A Noisy Label Detection Method for Speaker Recognition | Yao Shen et.al. | 2406.13268 | translate | read | null |
| 2024-06-18 | Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech | Cheol Jun Cho et.al. | 2406.12998 | translate | read | null |
| 2024-06-18 | Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition | Kuan-Chen Wang et.al. | 2406.12699 | translate | read | null |
| 2024-06-18 | Transcribe, Align and Segment: Creating speech datasets for low-resource languages | Taras Sereda et.al. | 2406.12674 | translate | read | null |
| 2024-06-18 | Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech | Adrien Pupier et.al. | 2406.12621 | translate | read | null |
| 2024-06-18 | Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting | Yosuke Kashiwagi et.al. | 2406.12611 | translate | read | null |
| 2024-06-18 | Unsupervised Online Continual Learning for Automatic Speech Recognition | Steven Vander Eeckt et.al. | 2406.12503 | translate | read | null |
| 2024-06-18 | Performant ASR Models for Medical Entities in Accented Speech | Tejumade Afonja et.al. | 2406.12387 | translate | read | null |
| 2024-06-18 | Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model | Hayato Futami et.al. | 2406.12317 | translate | read | null |
| 2024-06-18 | JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning | Boyu Chen et.al. | 2406.12292 | translate | read | null |
| 2024-06-18 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Young Jin Ahn et.al. | 2406.12233 | translate | read | null |
| 2024-06-18 | A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis | Guoqiang Hu et.al. | 2406.12164 | translate | read | null |
| 2024-06-17 | 1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis | Sewade Ogun et.al. | 2406.11727 | translate | read | null |
| 2024-06-17 | GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement | Yifan Yang et.al. | 2406.11546 | translate | read | link |
| 2024-06-17 | Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9 | Do Hyun Lee et.al. | 2406.11248 | translate | read | null |
| 2024-06-17 | Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision | Yafeng Chen et.al. | 2406.11169 | translate | read | null |
| 2024-06-16 | Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech | Guan-Ting Lin et.al. | 2406.11064 | translate | read | null |
| 2024-06-16 | NAST: Noise Aware Speech Tokenization for Speech Language Models | Shoval Messica et.al. | 2406.11037 | translate | read | link |
| 2024-06-16 | Large Language Models for Dysfluency Detection in Stuttered Speech | Dominik Wagner et.al. | 2406.11025 | translate | read | null |
| 2024-06-16 | Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models | Dominik Wagner et.al. | 2406.11022 | translate | read | null |
| 2024-06-16 | Optimized Speculative Sampling for GPU Hardware Accelerators | Dominik Wagner et.al. | 2406.11016 | translate | read | null |
| 2024-06-16 | CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving | Bhavani Shankar et.al. | 2406.10993 | translate | read | null |
| 2024-06-14 | Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation | Dena Mujtaba et.al. | 2406.10177 | translate | read | null |
| 2024-06-14 | On the Evaluation of Speech Foundation Models for Spoken Language Understanding | Siddhant Arora et.al. | 2406.10083 | translate | read | null |
| 2024-06-14 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Andrew Rouditchenko et.al. | 2406.10082 | translate | read | link |
| 2024-06-14 | Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection | Haoyu Wang et.al. | 2406.10052 | translate | read | link |
| 2024-06-14 | ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR | Vishwanath Pratap Singh et.al. | 2406.09999 | translate | read | null |
| 2024-06-14 | An efficient text augmentation approach for contextualized Mandarin speech recognition | Naijun Zheng et.al. | 2406.09950 | translate | read | null |
| 2024-06-14 | Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition | Yicong Jiang et.al. | 2406.09873 | translate | read | null |
| 2024-06-14 | MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model | Jiatong Shi et.al. | 2406.09869 | translate | read | null |
| 2024-06-14 | Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy | Linhan Ma et.al. | 2406.09844 | translate | read | null |
| 2024-06-14 | Low algorithmic delay implementation of convolutional beamformer for online joint source separation and dereverberation | Kaien Mo et.al. | 2406.09821 | translate | read | null |
| 2024-06-13 | Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech | Martina Valente et.al. | 2406.09290 | translate | read | null |
| 2024-06-13 | Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t | Chihiro Taguchi et.al. | 2406.09202 | translate | read | null |
| 2024-06-13 | LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks | Amit Meghanani et.al. | 2406.09153 | translate | read | null |
| 2024-06-13 | ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis | Dehua Tao et.al. | 2406.08989 | translate | read | null |
| 2024-06-13 | Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition | William Ravenscroft et.al. | 2406.08914 | translate | read | null |
| 2024-06-13 | AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers | Emil Biju et.al. | 2406.08904 | translate | read | null |
| 2024-06-13 | A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed | Ziyang Zhuang et.al. | 2406.08835 | translate | read | null |
| 2024-06-13 | Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems | Zhengyang Chen et.al. | 2406.08812 | translate | read | null |
| 2024-06-12 | ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | Jiatong Shi et.al. | 2406.08641 | translate | read | null |
| 2024-06-12 | Emotion Manipulation Through Music – A Deep Learning Interactive Visual Approach | Adel N. Abdalla et.al. | 2406.08623 | translate | read | null |
| 2024-06-12 | SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models | Chun Yin et.al. | 2406.08445 | translate | read | null |
| 2024-06-12 | TokSing: Singing Voice Synthesis based on Discrete Tokens | Yuning Wu et.al. | 2406.08416 | translate | read | null |
| 2024-06-12 | Neural Blind Source Separation and Diarization for Distant Speech Recognition | Yoshiaki Bando et.al. | 2406.08396 | translate | read | null |
| 2024-06-12 | Towards Unsupervised Speech Recognition Without Pronunciation Models | Junrui Ni et.al. | 2406.08380 | translate | read | null |
| 2024-06-12 | Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques | Yuanchao Li et.al. | 2406.08353 | translate | read | link |
| 2024-06-12 | Refining Self-Supervised Learnt Speech Representation using Brain Activations | Hengyu Li et.al. | 2406.08266 | translate | read | null |
| 2024-06-12 | Transformer-based Model for ASR N-Best Rescoring and Rewriting | Iwen E. Kang et.al. | 2406.08207 | translate | read | null |
| 2024-06-12 | FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter | Yuanjun Lv et.al. | 2406.08196 | translate | read | link |
| 2024-06-12 | Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data | Yuma Shirahata et.al. | 2406.08111 | translate | read | null |
| 2024-06-12 | Can Large Language Models Understand Spatial Audio? | Changli Tang et.al. | 2406.07914 | translate | read | null |
| 2024-06-11 | Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Qingkai Fang et.al. | 2406.07289 | translate | read | null |
| 2024-06-11 | Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment | Takuto Igarashi et.al. | 2406.07280 | translate | read | null |
| 2024-06-11 | AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection | Rong Gong et.al. | 2406.07256 | translate | read | null |
| 2024-06-11 | SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark | Yuki Saito et.al. | 2406.07254 | translate | read | null |
| 2024-06-11 | CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems | Haibin Wu et.al. | 2406.07237 | translate | read | null |
| 2024-06-11 | MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms | Seung-bin Kim et.al. | 2406.07103 | translate | read | link |
| 2024-06-11 | Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter | Andrei Andrusenko et.al. | 2406.07096 | translate | read | null |
| 2024-06-11 | Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech | Mateusz Czyżnikiewicz et.al. | 2406.07090 | translate | read | null |
| 2024-06-11 | Reading Miscue Detection in Primary School through Automatic Speech Recognition | Lingyun Gao et.al. | 2406.07060 | translate | read | null |
| 2024-06-10 | Synthetic Query Generation using Large Language Models for Virtual Assistants | Sonal Sannigrahi et.al. | 2406.06729 | translate | read | null |
| 2024-06-10 | Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Florian Lux et.al. | 2406.06403 | translate | read | link |
| 2024-06-10 | A Parameter-efficient Language Extension Framework for Multilingual ASR | Wei Liu et.al. | 2406.06329 | translate | read | null |
| 2024-06-10 | Quantifying the effect of speech pathology on automatic and human speaker verification | Bence Mark Halpern et.al. | 2406.06208 | translate | read | null |
| 2024-06-10 | JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis | Hyunjae Cho et.al. | 2406.06111 | translate | read | null |
| 2024-06-10 | Prompting Large Language Models with Audio for General-Purpose Speech Summarization | Wonjune Kang et.al. | 2406.05968 | translate | read | link |
| 2024-06-09 | Conserving Human Creativity with Evolutionary Generative Algorithms: A Case Study in Music Generation | Justin Kilb et.al. | 2406.05873 | translate | read | null |
| 2024-06-09 | Source-Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels | Shlomo Salo Elia et.al. | 2406.05863 | translate | read | null |
| 2024-06-09 | Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper | Chih-Kai Yang et.al. | 2406.05806 | translate | read | null |
| 2024-06-09 | Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper’s Encoder for Efficient Parameter Reduction in Automated Assessment | Huma Ameer et.al. | 2406.05784 | translate | read | null |
| 2024-06-09 | SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion | Bingsong Bai et.al. | 2406.05692 | translate | read | null |
| 2024-06-07 | The Database and Benchmark for Source Speaker Verification Against Voice Conversion | Ze Li et.al. | 2406.04951 | translate | read | null |
| 2024-06-07 | LLM-based speaker diarization correction: A generalizable approach | Georgios Efstathiadis et.al. | 2406.04927 | translate | read | link |
| 2024-06-07 | Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR | Shaojun Li et.al. | 2406.04791 | translate | read | null |
| 2024-06-07 | Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis | Xintong Wang et.al. | 2406.04595 | translate | read | null |
| 2024-06-07 | Neural Codec-based Adversarial Sample Detection for Speaker Verification | Xuanjun Chen et.al. | 2406.04582 | translate | read | null |
| 2024-06-06 | Flexible Multichannel Speech Enhancement for Noise-Robust Frontend | Ante Jukić et.al. | 2406.04552 | translate | read | null |
| 2024-06-06 | Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation | Keqi Deng et.al. | 2406.04541 | translate | read | null |
| 2024-06-06 | To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation | Abdul Waheed et.al. | 2406.04512 | translate | read | link |
| 2024-06-06 | Towards Naturalistic Voice Conversion: NaturalVoices Dataset with an Automatic Processing Pipeline | Ali N. Salman et.al. | 2406.04494 | translate | read | null |
| 2024-06-06 | Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Théodor Lemerle et.al. | 2406.04467 | translate | read | link |
| 2024-06-06 | VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling | Zeyue Tian et.al. | 2406.04321 | translate | read | link |
| 2024-06-06 | Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement | Wangyou Zhang et.al. | 2406.04269 | translate | read | null |
| 2024-06-06 | Hypernetworks for Personalizing ASR to Atypical Speech | Max Mueller-Eberstein et.al. | 2406.04240 | translate | read | null |
| 2024-06-06 | Helsinki Speech Challenge 2024 | Martin Ludvigsen et.al. | 2406.04123 | translate | read | null |
| 2024-06-06 | BLSP-Emo: Towards Empathetic Large Speech-Language Models | Chen Wang et.al. | 2406.03872 | translate | read | link |
| 2024-06-06 | Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores | Jiaming Zhou et.al. | 2406.03814 | translate | read | null |
| 2024-06-06 | Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU | Daniel Galvez et.al. | 2406.03791 | translate | read | null |
| 2024-06-06 | Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining | Jinlong Xue et.al. | 2406.03714 | translate | read | null |
| 2024-06-06 | Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model | Jinlong Xue et.al. | 2406.03706 | translate | read | null |
| 2024-06-05 | Style Mixture of Experts for Expressive Text-To-Speech Synthesis | Ahad Jawaid et.al. | 2406.03637 | translate | read | null |
| 2024-06-05 | Enhancing CTC-based speech recognition with diverse modeling units | Shiyi Han et.al. | 2406.03274 | translate | read | null |
| 2024-06-05 | Error-preserving Automatic Speech Recognition of Young English Learners’ Language | Janick Michot et.al. | 2406.03235 | translate | read | link |
| 2024-06-05 | StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Shaolei Zhang et.al. | 2406.03049 | translate | read | link |
| 2024-06-05 | 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders | Yui Sudo et.al. | 2406.02950 | translate | read | null |
| 2024-06-05 | SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation | Hsuan Su et.al. | 2406.02925 | translate | read | null |
| 2024-06-05 | Text Injection for Neural Contextual Biasing | Zhong Meng et.al. | 2406.02921 | translate | read | null |
| 2024-06-04 | Keyword-Guided Adaptation of Automatic Speech Recognition | Aviv Shamsian et.al. | 2406.02649 | translate | read | null |
| 2024-06-04 | Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion | Ruiqi Li et.al. | 2406.02429 | translate | read | null |
| 2024-06-04 | An Independence-promoting Loss for Music Generation with Language Models | Jean-Marie Lemercier et.al. | 2406.02315 | translate | read | null |
| 2024-06-04 | Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models | Victor Miara et.al. | 2406.02285 | translate | read | link |
| 2024-06-04 | ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency | Yafeng Chen et.al. | 2406.02167 | translate | read | null |
| 2024-06-04 | Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision | Saierdaer Yusuyin et.al. | 2406.02166 | translate | read | link |
| 2024-06-04 | Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | Kun Zhou et.al. | 2406.02009 | translate | read | null |
| 2024-06-04 | Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping | Lun Wang et.al. | 2406.02004 | translate | read | null |
| 2024-06-03 | TinySV: Speaker Verification in TinyML with On-device Learning | Massimo Pavan et.al. | 2406.01655 | translate | read | null |
| 2024-06-03 | Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach | Ara Yeroyan et.al. | 2406.01446 | translate | read | null |
| 2024-06-03 | Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization | Firas Khader et.al. | 2406.01314 | translate | read | null |

(<a href=../Audio_Processing.md>back to Audio Processing</a>)