Multimodal - 2026-01 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Translate	Read	Code
2026-01-30	Uncertainty-Aware Multimodal Learning via Conformal Shapley Intervals	Mathew Chandy et al.	2602.00171	translate	read	null
2026-01-30	RASST: Fast Cross-modal Retrieval-Augmented Simultaneous Speech Translation	Jiaxuan Luo et al.	2601.22777	translate	read	null
2026-01-29	Neural Signals Generate Clinical Notes in the Wild	Jathurshan Pradeepkumar et al.	2601.22197	translate	read	null
2026-01-29	MEIDNet: Multimodal generative AI framework for inverse materials design	Anand Babu et al.	2601.22009	translate	read	null
2026-01-29	Embracing Aleatoric Uncertainty in Medical Multimodal Learning with Missing Modalities	Linxiao Gong et al.	2601.21950	translate	read	null
2026-01-29	Robust Multimodal Representation Learning in Healthcare	Xiaoguang Zhu et al.	2601.21941	translate	read	null
2026-01-29	When Gradient Optimization Is Not Enough: $\dagger$ Dispersive and Anchoring Geometric Regularizer for Multimodal Learning	Zixuan Xia et al.	2601.21670	translate	read	null
2026-01-29	MultiModal Fine-tuning with Synthetic Captions	Shohei Enomoto et al.	2601.21426	translate	read	null
2026-01-29	Missing-Data-Induced Phase Transitions in Spectral PLS for Multimodal Learning	Anders Gjølbye et al.	2601.21294	translate	read	null
2026-01-27	GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining	Shentong Mo et al.	2601.19606	translate	read	null
2026-01-27	TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment	Jiarun Liu et al.	2601.19247	translate	read	null
2026-01-26	AGSP-DSA: An Adaptive Graph Signal Processing Framework for Robust Multimodal Fusion with Dynamic Semantic Alignment	KV Karthikeya et al.	2601.18589	translate	read	null
2026-01-26	Closing the Modality Gap Aligns Group-Wise Semantics	Eleonora Grassucci et al.	2601.18525	translate	read	null
2026-01-23	Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding	Xiaojiang Peng et al.	2601.16449	translate	read	null
2026-01-21	LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding	Xiaodong Wang et al.	2601.15016	translate	read	null
2026-01-21	Citation of scientific evidence from video description and its association with attention and impact	Pablo Dorta-González et al.	2601.14916	translate	read	null
2026-01-20	DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning	Abdurrahim Yilmaz et al.	2601.14084	translate	read	null
2026-01-20	Face-Voice Association with Inductive Bias for Maximum Class Separation	Marta Moscati et al.	2601.13651	translate	read	null
2026-01-20	DIS2: Disentanglement Meets Distillation with Classwise Attention for Robust Remote Sensing Segmentation under Missing Modalities	Nhi Kieu et al.	2601.13502	translate	read	null
2026-01-16	Generative Scenario Rollouts for End-to-End Autonomous Driving	Rajeev Yasarla et al.	2601.11475	translate	read	null
2026-01-16	Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation	Ji Dai et al.	2601.11151	translate	read	null
2026-01-15	DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset	Hengyu Shen et al.	2601.10305	translate	read	link
2026-01-15	V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation	Han Wang et al.	2601.10094	translate	read	null
2026-01-14	Personalized Multimodal Feedback Using Multiple External Representations: Strategy Profiles and Learning in High School Physics	Natalia Revenga-Lozano et al.	2601.09470	translate	read	null
2026-01-13	Edge-Optimized Multimodal Learning for UAV Video Understanding via BLIP-2	Yizhan Feng et al.	2601.08408	translate	read	null
2026-01-09	Feature Entanglement-based Quantum Multimodal Fusion Neural Network	Yu Wu et al.	2601.07856	translate	read	null
2026-01-12	A Multimodal Dataset of Student Oral Presentations with Sensors and Evaluation Data	Alvaro Becerra et al.	2601.07576	translate	read	null
2026-01-12	Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance	Jongwon Ryu et al.	2601.07221	translate	read	null
2026-01-12	Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification	Shu Shen et al.	2601.07163	translate	read	null
2026-01-11	CLIMP: Contrastive Language-Image Mamba Pretraining	Nimrod Shabtay et al.	2601.06891	translate	read	null
2026-01-11	Cross-Modal Computational Model of Brain-Heart Interactions via HRV and EEG Feature	Malavika Pradeep et al.	2601.06792	translate	read	null
2026-01-05	Causal and Federated Multimodal Learning for Cardiovascular Risk Prediction under Heterogeneous Populations	Rohit Kaushik et al.	2601.06140	translate	read	null
2026-01-08	Multi-task Cross-modal Learning for Chest X-ray Image Retrieval	Zhaohui Liang et al.	2601.05399	translate	read	null
2026-01-08	Advanced Multimodal Learning for Seizure Detection and Prediction: Concept, Challenges, and Future Directions	Ijaz Ahmad et al.	2601.05095	translate	read	null
2026-01-08	The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms	Lingdong Kong et al.	2601.05014	translate	read	null
2026-01-08	MPM-LLM4DSE: Reaching the Pareto Frontier in HLS with Multimodal Learning and LLM-Driven Exploration	Lei Xu et al.	2601.04801	translate	read	null
2026-01-06	Attention mechanisms in neural networks	Hasi Hays et al.	2601.03329	translate	read	null
2026-01-04	Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning	Weihang You et al.	2601.01339	translate	read	null
2026-01-02	Wave2Word: A Multimodal Transformer Framework for Joint EEG-Text Alignment and Multi-Task Representation Learning in Neurocritical Care	Argha Kamal Samanta et al.	2601.00670	translate	read	null
2026-01-01	S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding	He Wang et al.	2601.00264	translate	read	null