Multimodal - 2026-01

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-01-30 | Uncertainty-Aware Multimodal Learning via Conformal Shapley Intervals | Mathew Chandy et.al. | 2602.00171 | translate | read | null |
| 2026-01-30 | RASST: Fast Cross-modal Retrieval-Augmented Simultaneous Speech Translation | Jiaxuan Luo et.al. | 2601.22777 | translate | read | null |
| 2026-01-29 | Neural Signals Generate Clinical Notes in the Wild | Jathurshan Pradeepkumar et.al. | 2601.22197 | translate | read | null |
| 2026-01-29 | MEIDNet: Multimodal generative AI framework for inverse materials design | Anand Babu et.al. | 2601.22009 | translate | read | null |
| 2026-01-29 | Embracing Aleatoric Uncertainty in Medical Multimodal Learning with Missing Modalities | Linxiao Gong et.al. | 2601.21950 | translate | read | null |
| 2026-01-29 | Robust Multimodal Representation Learning in Healthcare | Xiaoguang Zhu et.al. | 2601.21941 | translate | read | null |
| 2026-01-29 | When Gradient Optimization Is Not Enough: $\dagger$ Dispersive and Anchoring Geometric Regularizer for Multimodal Learning | Zixuan Xia et.al. | 2601.21670 | translate | read | null |
| 2026-01-29 | MultiModal Fine-tuning with Synthetic Captions | Shohei Enomoto et.al. | 2601.21426 | translate | read | null |
| 2026-01-29 | Missing-Data-Induced Phase Transitions in Spectral PLS for Multimodal Learning | Anders Gjølbye et.al. | 2601.21294 | translate | read | null |
| 2026-01-27 | GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining | Shentong Mo et.al. | 2601.19606 | translate | read | null |
| 2026-01-27 | TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment | Jiarun Liu et.al. | 2601.19247 | translate | read | null |
| 2026-01-26 | AGSP-DSA: An Adaptive Graph Signal Processing Framework for Robust Multimodal Fusion with Dynamic Semantic Alignment | KV Karthikeya et.al. | 2601.18589 | translate | read | null |
| 2026-01-26 | Closing the Modality Gap Aligns Group-Wise Semantics | Eleonora Grassucci et.al. | 2601.18525 | translate | read | null |
| 2026-01-23 | Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding | Xiaojiang Peng et.al. | 2601.16449 | translate | read | null |
| 2026-01-21 | LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding | Xiaodong Wang et.al. | 2601.15016 | translate | read | null |
| 2026-01-21 | Citation of scientific evidence from video description and its association with attention and impact | Pablo Dorta-González et.al. | 2601.14916 | translate | read | null |
| 2026-01-20 | DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning | Abdurrahim Yilmaz et.al. | 2601.14084 | translate | read | null |
| 2026-01-20 | Face-Voice Association with Inductive Bias for Maximum Class Separation | Marta Moscati et.al. | 2601.13651 | translate | read | null |
| 2026-01-20 | DIS2: Disentanglement Meets Distillation with Classwise Attention for Robust Remote Sensing Segmentation under Missing Modalities | Nhi Kieu et.al. | 2601.13502 | translate | read | null |
| 2026-01-16 | Generative Scenario Rollouts for End-to-End Autonomous Driving | Rajeev Yasarla et.al. | 2601.11475 | translate | read | null |
| 2026-01-16 | Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation | Ji Dai et.al. | 2601.11151 | translate | read | null |
| 2026-01-15 | DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset | Hengyu Shen et.al. | 2601.10305 | translate | read | link |
| 2026-01-15 | V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation | Han Wang et.al. | 2601.10094 | translate | read | null |
| 2026-01-14 | Personalized Multimodal Feedback Using Multiple External Representations: Strategy Profiles and Learning in High School Physics | Natalia Revenga-Lozano et.al. | 2601.09470 | translate | read | null |
| 2026-01-13 | Edge-Optimized Multimodal Learning for UAV Video Understanding via BLIP-2 | Yizhan Feng et.al. | 2601.08408 | translate | read | null |
| 2026-01-09 | Feature Entanglement-based Quantum Multimodal Fusion Neural Network | Yu Wu et.al. | 2601.07856 | translate | read | null |
| 2026-01-12 | A Multimodal Dataset of Student Oral Presentations with Sensors and Evaluation Data | Alvaro Becerra et.al. | 2601.07576 | translate | read | null |
| 2026-01-12 | Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance | Jongwon Ryu et.al. | 2601.07221 | translate | read | null |
| 2026-01-12 | Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification | Shu Shen et.al. | 2601.07163 | translate | read | null |
| 2026-01-11 | CLIMP: Contrastive Language-Image Mamba Pretraining | Nimrod Shabtay et.al. | 2601.06891 | translate | read | null |
| 2026-01-11 | Cross-Modal Computational Model of Brain-Heart Interactions via HRV and EEG Feature | Malavika Pradeep et.al. | 2601.06792 | translate | read | null |
| 2026-01-05 | Causal and Federated Multimodal Learning for Cardiovascular Risk Prediction under Heterogeneous Populations | Rohit Kaushik et.al. | 2601.06140 | translate | read | null |
| 2026-01-08 | Multi-task Cross-modal Learning for Chest X-ray Image Retrieval | Zhaohui Liang et.al. | 2601.05399 | translate | read | null |
| 2026-01-08 | Advanced Multimodal Learning for Seizure Detection and Prediction: Concept, Challenges, and Future Directions | Ijaz Ahmad et.al. | 2601.05095 | translate | read | null |
| 2026-01-08 | The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms | Lingdong Kong et.al. | 2601.05014 | translate | read | null |
| 2026-01-08 | MPM-LLM4DSE: Reaching the Pareto Frontier in HLS with Multimodal Learning and LLM-Driven Exploration | Lei Xu et.al. | 2601.04801 | translate | read | null |
| 2026-01-06 | Attention mechanisms in neural networks | Hasi Hays et.al. | 2601.03329 | translate | read | null |
| 2026-01-04 | Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning | Weihang You et.al. | 2601.01339 | translate | read | null |
| 2026-01-02 | Wave2Word: A Multimodal Transformer Framework for Joint EEG-Text Alignment and Multi-Task Representation Learning in Neurocritical Care | Argha Kamal Samanta et.al. | 2601.00670 | translate | read | null |
| 2026-01-01 | S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding | He Wang et.al. | 2601.00264 | translate | read | null |

(<a href=../Multimodal.md>back to Multimodal</a>)