Multimodal - 2026-01
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-01-30 | Uncertainty-Aware Multimodal Learning via Conformal Shapley Intervals | Mathew Chandy et.al. | 2602.00171 | translate | read | null |
| 2026-01-30 | RASST: Fast Cross-modal Retrieval-Augmented Simultaneous Speech Translation | Jiaxuan Luo et.al. | 2601.22777 | translate | read | null |
| 2026-01-29 | Neural Signals Generate Clinical Notes in the Wild | Jathurshan Pradeepkumar et.al. | 2601.22197 | translate | read | null |
| 2026-01-29 | MEIDNet: Multimodal generative AI framework for inverse materials design | Anand Babu et.al. | 2601.22009 | translate | read | null |
| 2026-01-29 | Embracing Aleatoric Uncertainty in Medical Multimodal Learning with Missing Modalities | Linxiao Gong et.al. | 2601.21950 | translate | read | null |
| 2026-01-29 | Robust Multimodal Representation Learning in Healthcare | Xiaoguang Zhu et.al. | 2601.21941 | translate | read | null |
| 2026-01-29 | When Gradient Optimization Is Not Enough: $\dagger$ Dispersive and Anchoring Geometric Regularizer for Multimodal Learning | Zixuan Xia et.al. | 2601.21670 | translate | read | null |
| 2026-01-29 | MultiModal Fine-tuning with Synthetic Captions | Shohei Enomoto et.al. | 2601.21426 | translate | read | null |
| 2026-01-29 | Missing-Data-Induced Phase Transitions in Spectral PLS for Multimodal Learning | Anders Gjølbye et.al. | 2601.21294 | translate | read | null |
| 2026-01-27 | GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining | Shentong Mo et.al. | 2601.19606 | translate | read | null |
| 2026-01-27 | TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment | Jiarun Liu et.al. | 2601.19247 | translate | read | null |
| 2026-01-26 | AGSP-DSA: An Adaptive Graph Signal Processing Framework for Robust Multimodal Fusion with Dynamic Semantic Alignment | KV Karthikeya et.al. | 2601.18589 | translate | read | null |
| 2026-01-26 | Closing the Modality Gap Aligns Group-Wise Semantics | Eleonora Grassucci et.al. | 2601.18525 | translate | read | null |
| 2026-01-23 | Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding | Xiaojiang Peng et.al. | 2601.16449 | translate | read | null |
| 2026-01-21 | LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding | Xiaodong Wang et.al. | 2601.15016 | translate | read | null |
| 2026-01-21 | Citation of scientific evidence from video description and its association with attention and impact | Pablo Dorta-González et.al. | 2601.14916 | translate | read | null |
| 2026-01-20 | DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning | Abdurrahim Yilmaz et.al. | 2601.14084 | translate | read | null |
| 2026-01-20 | Face-Voice Association with Inductive Bias for Maximum Class Separation | Marta Moscati et.al. | 2601.13651 | translate | read | null |
| 2026-01-20 | DIS2: Disentanglement Meets Distillation with Classwise Attention for Robust Remote Sensing Segmentation under Missing Modalities | Nhi Kieu et.al. | 2601.13502 | translate | read | null |
| 2026-01-16 | Generative Scenario Rollouts for End-to-End Autonomous Driving | Rajeev Yasarla et.al. | 2601.11475 | translate | read | null |
| 2026-01-16 | Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation | Ji Dai et.al. | 2601.11151 | translate | read | null |
| 2026-01-15 | DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset | Hengyu Shen et.al. | 2601.10305 | translate | read | link |
| 2026-01-15 | V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation | Han Wang et.al. | 2601.10094 | translate | read | null |
| 2026-01-14 | Personalized Multimodal Feedback Using Multiple External Representations: Strategy Profiles and Learning in High School Physics | Natalia Revenga-Lozano et.al. | 2601.09470 | translate | read | null |
| 2026-01-13 | Edge-Optimized Multimodal Learning for UAV Video Understanding via BLIP-2 | Yizhan Feng et.al. | 2601.08408 | translate | read | null |
| 2026-01-09 | Feature Entanglement-based Quantum Multimodal Fusion Neural Network | Yu Wu et.al. | 2601.07856 | translate | read | null |
| 2026-01-12 | A Multimodal Dataset of Student Oral Presentations with Sensors and Evaluation Data | Alvaro Becerra et.al. | 2601.07576 | translate | read | null |
| 2026-01-12 | Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance | Jongwon Ryu et.al. | 2601.07221 | translate | read | null |
| 2026-01-12 | Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification | Shu Shen et.al. | 2601.07163 | translate | read | null |
| 2026-01-11 | CLIMP: Contrastive Language-Image Mamba Pretraining | Nimrod Shabtay et.al. | 2601.06891 | translate | read | null |
| 2026-01-11 | Cross-Modal Computational Model of Brain-Heart Interactions via HRV and EEG Feature | Malavika Pradeep et.al. | 2601.06792 | translate | read | null |
| 2026-01-05 | Causal and Federated Multimodal Learning for Cardiovascular Risk Prediction under Heterogeneous Populations | Rohit Kaushik et.al. | 2601.06140 | translate | read | null |
| 2026-01-08 | Multi-task Cross-modal Learning for Chest X-ray Image Retrieval | Zhaohui Liang et.al. | 2601.05399 | translate | read | null |
| 2026-01-08 | Advanced Multimodal Learning for Seizure Detection and Prediction: Concept, Challenges, and Future Directions | Ijaz Ahmad et.al. | 2601.05095 | translate | read | null |
| 2026-01-08 | The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms | Lingdong Kong et.al. | 2601.05014 | translate | read | null |
| 2026-01-08 | MPM-LLM4DSE: Reaching the Pareto Frontier in HLS with Multimodal Learning and LLM-Driven Exploration | Lei Xu et.al. | 2601.04801 | translate | read | null |
| 2026-01-06 | Attention mechanisms in neural networks | Hasi Hays et.al. | 2601.03329 | translate | read | null |
| 2026-01-04 | Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning | Weihang You et.al. | 2601.01339 | translate | read | null |
| 2026-01-02 | Wave2Word: A Multimodal Transformer Framework for Joint EEG-Text Alignment and Multi-Task Representation Learning in Neurocritical Care | Argha Kamal Samanta et.al. | 2601.00670 | translate | read | null |
| 2026-01-01 | S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding | He Wang et.al. | 2601.00264 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)