Multimodal - 2025-09

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-09-30 | MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation | Md Zubair et.al. | 2510.07328 | translate | read | null |
| 2025-09-25 | Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data | Jiancheng Zhang et.al. | 2510.03247 | translate | read | null |
| 2025-09-30 | MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning | Seong-Hyeon Hwang et.al. | 2509.25831 | translate | read | null |
| 2025-09-30 | ProbMed: A Probabilistic Framework for Medical Multimodal Binding | Yuan Gao et.al. | 2509.25711 | translate | read | null |
| 2025-09-30 | Massively Multimodal Foundation Models: A Framework for Capturing Dependencies with Specialized Mixture-of-Experts | Xing Han et.al. | 2509.25678 | translate | read | null |
| 2025-09-30 | Generalized Contrastive Learning for Universal Multimodal Retrieval | Jungsoo Lee et.al. | 2509.25638 | translate | read | null |
| 2025-09-29 | FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology | Faizan Farooq Khan et.al. | 2509.25564 | translate | read | null |
| 2025-09-29 | MAESTRO : Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series | Payal Mohapatra et.al. | 2509.25278 | translate | read | null |
| 2025-09-29 | A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity | Giordano Cicchetti et.al. | 2509.24734 | translate | read | null |
| 2025-09-29 | Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey | Yuntao Shou et.al. | 2509.24322 | translate | read | null |
| 2025-09-28 | Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics | Luxuan Zhang et.al. | 2509.23543 | translate | read | null |
| 2025-09-26 | RefAM: Attention Magnets for Zero-Shot Referral Segmentation | Anna Kukleva et.al. | 2509.22650 | translate | read | null |
| 2025-09-26 | HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes | Katrina Ashton et.al. | 2509.22498 | translate | read | null |
| 2025-09-26 | From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment | Ke Ye et.al. | 2509.22205 | translate | read | null |
| 2025-09-26 | WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM | Changli Tang et.al. | 2509.21990 | translate | read | null |
| 2025-09-26 | VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation | Huayi Zhou et.al. | 2509.21723 | translate | read | null |
| 2025-09-25 | Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations | Zhijian Yang et.al. | 2509.21249 | translate | read | null |
| 2025-09-25 | SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization | Jiehui Luo et.al. | 2509.21033 | translate | read | null |
| 2025-09-14 | LibEMER: A novel benchmark and algorithms library for EEG-based Multimodal Emotion Recognition | Zejun Liu et.al. | 2509.19330 | translate | read | null |
| 2025-09-10 | Advancing Few-Shot Pediatric Arrhythmia Classification with a Novel Contrastive Loss and Multimodal Learning | Yiqiao Chen et.al. | 2509.19315 | translate | read | null |
| 2025-09-23 | Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation | Christian Ganhör et.al. | 2509.18807 | translate | read | null |
| 2025-09-23 | M4SER: Multimodal, Multirepresentation, Multitask, and Multistrategy Learning for Speech Emotion Recognition | Jiajun He et.al. | 2509.18706 | translate | read | null |
| 2025-09-22 | Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction | Yi Gu et.al. | 2509.18284 | translate | read | null |
| 2025-09-22 | ClassMind: Scaling Classroom Observation and Instructional Feedback with Multimodal AI | Ao Qu et.al. | 2509.18020 | translate | read | null |
| 2025-09-22 | M3ET: Efficient Vision-Language Learning for Robotics based on Multimodal Mamba-Enhanced Transformer | Yanxin Zhang et.al. | 2509.18005 | translate | read | null |
| 2025-09-22 | Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training | Divya Mereddy et.al. | 2509.17888 | translate | read | null |
| 2025-09-22 | MLLM-Driven Semantic Identifier Generation for Generative Cross-Modal Retrieval | Tianyuan Li et.al. | 2509.17359 | translate | read | null |
| 2025-09-20 | Self-organized epithelial reticulum inhibits cell proliferation | Liav Daraf et.al. | 2509.16661 | translate | read | null |
| 2025-09-19 | Vision-Language Models as Differentiable Semantic and Spatial Rewards for Text-to-3D Generation | Weimin Bai et.al. | 2509.15772 | translate | read | null |
| 2025-09-19 | Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion | Shanghong Li et.al. | 2509.15578 | translate | read | null |
| 2025-09-19 | Beyond Words: Enhancing Desire, Emotion, and Sentiment Recognition with Non-Verbal Cues | Wei Chen et.al. | 2509.15540 | translate | read | null |
| 2025-09-17 | Exploring the Capabilities of LLM Encoders for Image-Text Retrieval in Chest X-rays | Hanbin Ko et.al. | 2509.15234 | translate | read | null |
| 2025-09-17 | VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI | Daiqi Liu et.al. | 2509.13767 | translate | read | null |
| 2025-09-15 | Evaluating Robustness of Vision-Language Models Under Noisy Conditions | Purushoth et.al. | 2509.12492 | translate | read | null |
| 2025-09-15 | OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling | Yang Zhou et.al. | 2509.12201 | translate | read | link |
| 2025-09-15 | Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI | Bo Cao et.al. | 2509.11924 | translate | read | null |
| 2025-09-14 | GLaVE-Cap: Global-Local Aligned Video Captioning with Vision Expert Integration | Wan Xu et.al. | 2509.11360 | translate | read | null |
| 2025-09-14 | DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations | Doan Minh Trung et.al. | 2509.11187 | translate | read | null |
| 2025-09-14 | Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation | Nhi Kieu et.al. | 2509.11102 | translate | read | null |
| 2025-09-13 | Why Bonds Fail Differently? Explainable Multimodal Learning for Multi-Class Default Prediction | Yi Lu et.al. | 2509.10802 | translate | read | null |
| 2025-09-11 | Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training | Anthony P. Addison et.al. | 2509.09290 | translate | read | null |
| 2025-09-09 | Enhancing Online Learning by Integrating Biosensors and Multimodal Learning Analytics for Detecting and Predicting Student Behavior: A Review | Alvaro Becerra et.al. | 2509.07742 | translate | read | null |
| 2025-09-08 | Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding | Jiangnan Xie et.al. | 2509.06291 | translate | read | null |
| 2025-09-06 | GraMFedDHAR: Graph Based Multimodal Differentially Private Federated HAR | Labani Halder et.al. | 2509.05671 | translate | read | null |
| 2025-09-06 | Causal Debiasing Medical Multimodal Representation Learning with Missing Modalities | Xiaoguang Zhu et.al. | 2509.05615 | translate | read | null |
| 2025-09-04 | Vehicle-to-Infrastructure Collaborative Spatial Perception via Multimodal Large Language Models | Kimia Ehsani et.al. | 2509.03837 | translate | read | null |
| 2025-09-03 | Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support | Eduardo Davalos et.al. | 2509.03741 | translate | read | null |
| 2025-09-03 | Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning | Duy A. Nguyen et.al. | 2509.03477 | translate | read | null |
| 2025-09-03 | Multimodal learning of melt pool dynamics in laser powder bed fusion | Satyajit Mojumder et.al. | 2509.03029 | translate | read | null |
| 2025-09-03 | Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability | Shuai Jiang et.al. | 2509.02962 | translate | read | null |
| 2025-09-02 | Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception | Changshi Zhou et.al. | 2509.02324 | translate | read | null |
| 2025-09-02 | Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective | Shijie Wang et.al. | 2509.02281 | translate | read | null |
| 2025-09-02 | Content and Engagement Trends in COVID-19 YouTube Videos: Evidence from the Late Pandemic | Nirmalya Thakur et.al. | 2509.01954 | translate | read | null |
| 2025-09-01 | OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning | Yanqing Liu et.al. | 2509.01644 | translate | read | link |
| 2025-09-01 | Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement | Jiayi Gao et.al. | 2509.01362 | translate | read | null |
