Multimodal - 2025-09 | Paper Arxiv Daily

Multimodal - 2025-09

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-09-30	MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation	Md Zubair et.al.	2510.07328	translate	read	null
2025-09-25	Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data	Jiancheng Zhang et.al.	2510.03247	translate	read	null
2025-09-30	MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning	Seong-Hyeon Hwang et.al.	2509.25831	translate	read	null
2025-09-30	ProbMed: A Probabilistic Framework for Medical Multimodal Binding	Yuan Gao et.al.	2509.25711	translate	read	null
2025-09-30	Massively Multimodal Foundation Models: A Framework for Capturing Dependencies with Specialized Mixture-of-Experts	Xing Han et.al.	2509.25678	translate	read	null
2025-09-30	Generalized Contrastive Learning for Universal Multimodal Retrieval	Jungsoo Lee et.al.	2509.25638	translate	read	null
2025-09-29	FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology	Faizan Farooq Khan et.al.	2509.25564	translate	read	null
2025-09-29	MAESTRO : Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series	Payal Mohapatra et.al.	2509.25278	translate	read	null
2025-09-29	A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity	Giordano Cicchetti et.al.	2509.24734	translate	read	null
2025-09-29	Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey	Yuntao Shou et.al.	2509.24322	translate	read	null
2025-09-28	Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics	Luxuan Zhang et.al.	2509.23543	translate	read	null
2025-09-26	RefAM: Attention Magnets for Zero-Shot Referral Segmentation	Anna Kukleva et.al.	2509.22650	translate	read	null
2025-09-26	HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes	Katrina Ashton et.al.	2509.22498	translate	read	null
2025-09-26	From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment	Ke Ye et.al.	2509.22205	translate	read	null
2025-09-26	WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM	Changli Tang et.al.	2509.21990	translate	read	null
2025-09-26	VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation	Huayi Zhou et.al.	2509.21723	translate	read	null
2025-09-25	Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations	Zhijian Yang et.al.	2509.21249	translate	read	null
2025-09-25	SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization	Jiehui Luo et.al.	2509.21033	translate	read	null
2025-09-14	LibEMER: A novel benchmark and algorithms library for EEG-based Multimodal Emotion Recognition	Zejun Liu et.al.	2509.19330	translate	read	null
2025-09-10	Advancing Few-Shot Pediatric Arrhythmia Classification with a Novel Contrastive Loss and Multimodal Learning	Yiqiao Chen et.al.	2509.19315	translate	read	null
2025-09-23	Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation	Christian Ganhör et.al.	2509.18807	translate	read	null
2025-09-23	M4SER: Multimodal, Multirepresentation, Multitask, and Multistrategy Learning for Speech Emotion Recognition	Jiajun He et.al.	2509.18706	translate	read	null
2025-09-22	Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction	Yi Gu et.al.	2509.18284	translate	read	null
2025-09-22	ClassMind: Scaling Classroom Observation and Instructional Feedback with Multimodal AI	Ao Qu et.al.	2509.18020	translate	read	null
2025-09-22	M3ET: Efficient Vision-Language Learning for Robotics based on Multimodal Mamba-Enhanced Transformer	Yanxin Zhang et.al.	2509.18005	translate	read	null
2025-09-22	Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training	Divya Mereddy et.al.	2509.17888	translate	read	null
2025-09-22	MLLM-Driven Semantic Identifier Generation for Generative Cross-Modal Retrieval	Tianyuan Li et.al.	2509.17359	translate	read	null
2025-09-20	Self-organized epithelial reticulum inhibits cell proliferation	Liav Daraf et.al.	2509.16661	translate	read	null
2025-09-19	Vision-Language Models as Differentiable Semantic and Spatial Rewards for Text-to-3D Generation	Weimin Bai et.al.	2509.15772	translate	read	null
2025-09-19	Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion	Shanghong Li et.al.	2509.15578	translate	read	null
2025-09-19	Beyond Words: Enhancing Desire, Emotion, and Sentiment Recognition with Non-Verbal Cues	Wei Chen et.al.	2509.15540	translate	read	null
2025-09-17	Exploring the Capabilities of LLM Encoders for Image-Text Retrieval in Chest X-rays	Hanbin Ko et.al.	2509.15234	translate	read	null
2025-09-17	VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI	Daiqi Liu et.al.	2509.13767	translate	read	null
2025-09-15	Evaluating Robustness of Vision-Language Models Under Noisy Conditions	Purushoth et.al.	2509.12492	translate	read	null
2025-09-15	OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling	Yang Zhou et.al.	2509.12201	translate	read	link
2025-09-15	Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI	Bo Cao et.al.	2509.11924	translate	read	null
2025-09-14	GLaVE-Cap: Global-Local Aligned Video Captioning with Vision Expert Integration	Wan Xu et.al.	2509.11360	translate	read	null
2025-09-14	DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations	Doan Minh Trung et.al.	2509.11187	translate	read	null
2025-09-14	Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation	Nhi Kieu et.al.	2509.11102	translate	read	null
2025-09-13	Why Bonds Fail Differently? Explainable Multimodal Learning for Multi-Class Default Prediction	Yi Lu et.al.	2509.10802	translate	read	null
2025-09-11	Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training	Anthony P. Addison et.al.	2509.09290	translate	read	null
2025-09-09	Enhancing Online Learning by Integrating Biosensors and Multimodal Learning Analytics for Detecting and Predicting Student Behavior: A Review	Alvaro Becerra et.al.	2509.07742	translate	read	null
2025-09-08	Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding	Jiangnan Xie et.al.	2509.06291	translate	read	null
2025-09-06	GraMFedDHAR: Graph Based Multimodal Differentially Private Federated HAR	Labani Halder et.al.	2509.05671	translate	read	null
2025-09-06	Causal Debiasing Medical Multimodal Representation Learning with Missing Modalities	Xiaoguang Zhu et.al.	2509.05615	translate	read	null
2025-09-04	Vehicle-to-Infrastructure Collaborative Spatial Perception via Multimodal Large Language Models	Kimia Ehsani et.al.	2509.03837	translate	read	null
2025-09-03	Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support	Eduardo Davalos et.al.	2509.03741	translate	read	null
2025-09-03	Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning	Duy A. Nguyen et.al.	2509.03477	translate	read	null
2025-09-03	Multimodal learning of melt pool dynamics in laser powder bed fusion	Satyajit Mojumder et.al.	2509.03029	translate	read	null
2025-09-03	Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability	Shuai Jiang et.al.	2509.02962	translate	read	null
2025-09-02	Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception	Changshi Zhou et.al.	2509.02324	translate	read	null
2025-09-02	Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective	Shijie Wang et.al.	2509.02281	translate	read	null
2025-09-02	Content and Engagement Trends in COVID-19 YouTube Videos: Evidence from the Late Pandemic	Nirmalya Thakur et.al.	2509.01954	translate	read	null
2025-09-01	OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning	Yanqing Liu et.al.	2509.01644	translate	read	link
2025-09-01	Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement	Jiayi Gao et.al.	2509.01362	translate	read	null
