Multimodal - 2025-05

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-05-30 | Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework | Can Polat et.al. | 2506.00302 | translate | read | null |
| 2025-05-30 | Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts | Xin He et.al. | 2505.24541 | translate | read | null |
| 2025-05-29 | Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition | Sean Foley et.al. | 2505.24059 | translate | read | null |
| 2025-05-29 | OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data | Fengxiang Wang et.al. | 2505.23522 | translate | read | null |
| 2025-05-29 | Bidirectional predictive coding | Gaspard Oliviers et.al. | 2505.23415 | translate | read | null |
| 2025-05-29 | Deep Modeling and Optimization of Medical Image Classification | Yihang Wu et.al. | 2505.23040 | translate | read | link |
| 2025-05-30 | EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations | Haoqin Sun et.al. | 2505.23018 | translate | read | link |
| 2025-05-27 | A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features | Ihab Bendidi et.al. | 2505.21317 | translate | read | null |
| 2025-05-26 | Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects | Chengyan Wu et.al. | 2505.20511 | translate | read | null |
| 2025-05-25 | PDFBench: A Benchmark for De novo Protein Design from Function | Jiahao Kuang et.al. | 2505.20346 | translate | read | null |
| 2025-05-26 | Learning Optimal Multimodal Information Bottleneck Representations | Qilong Wu et.al. | 2505.19996 | translate | read | null |
| 2025-05-26 | ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | Pooneh Mousavi et.al. | 2505.19937 | translate | read | null |
| 2025-05-26 | Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning | Sanghyuk Chun et.al. | 2505.19614 | translate | read | null |
| 2025-05-26 | Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate | Liangwei Nathan Zheng et.al. | 2505.19525 | translate | read | null |
| 2025-05-25 | Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | Shiyue Wang et.al. | 2505.19219 | translate | read | null |
| 2025-05-25 | I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Jiayi Xin et.al. | 2505.19190 | translate | read | link |
| 2025-05-23 | Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation | Zhihua Liu et.al. | 2505.17994 | translate | read | null |
| 2025-05-23 | HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | Chuhao Zhou et.al. | 2505.17645 | translate | read | null |
| 2025-05-23 | RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition | Yuehan Jin et.al. | 2505.17501 | translate | read | null |
| 2025-05-21 | NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation | Weiming Wu et.al. | 2505.17121 | translate | read | null |
| 2025-05-22 | ICYM2I: The illusion of multimodal informativeness under missingness | Young Sang Choi et.al. | 2505.16953 | translate | read | link |
| 2025-05-22 | Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | Francesco Dalla Serra et.al. | 2505.16624 | translate | read | null |
| 2025-05-22 | Multimodal Online Federated Learning with Modality Missing in Internet of Things | Heqiang Wang et.al. | 2505.16138 | translate | read | null |
| 2025-05-21 | Robust Multimodal Learning via Entropy-Gated Contrastive Fusion | Leon Chlon et.al. | 2505.15417 | translate | read | null |
| 2025-05-21 | EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy | Chi Kit Ng et.al. | 2505.15206 | translate | read | null |
| 2025-05-21 | Graph Foundation Models: A Comprehensive Survey | Zehong Wang et.al. | 2505.15116 | translate | read | link |
| 2025-05-19 | HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | Xuejun Sun et.al. | 2505.14725 | translate | read | link |
| 2025-05-20 | Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning | Jiangrong Shen et.al. | 2505.14535 | translate | read | null |
| 2025-05-20 | Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | Shuo Zhang et.al. | 2505.14143 | translate | read | null |
| 2025-05-20 | LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Qifeng Cai et.al. | 2505.13928 | translate | read | link |
| 2025-05-17 | Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | Hessa Alawwad et.al. | 2505.13520 | translate | read | null |
| 2025-05-19 | AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | Kai Zhang et.al. | 2505.12782 | translate | read | null |
| 2025-05-19 | PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI | Yingchen He et.al. | 2505.12707 | translate | read | null |
| 2025-05-17 | Understanding the Capabilities of Molecular Graph Neural Networks in Materials Science Through Multimodal Learning and Physical Context Encoding | Can Polat et.al. | 2505.12137 | translate | read | null |
| 2025-05-17 | SafeVid: Toward Safety Aligned Video Large Multimodal Models | Yixu Wang et.al. | 2505.11926 | translate | read | null |
| 2025-05-16 | GeoMM: On Geodesic Perspective for Multi-modal Learning | Shibin Mei et.al. | 2505.11216 | translate | read | null |
| 2025-05-15 | Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence | Xiang He et.al. | 2505.10176 | translate | read | link |
| 2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et.al. | 2505.09577 | translate | read | null |
| 2025-05-16 | Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Michael Majurski et.al. | 2505.08905 | translate | read | link |
| 2025-05-13 | Decoupled Multimodal Prototypes for Visual Recognition with Missing Modalities | Jueqing Lu et.al. | 2505.08283 | translate | read | null |
| 2025-05-11 | MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning | Lishan Yang et.al. | 2505.06911 | translate | read | null |
| 2025-05-10 | Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | H M Dipu Kabir et.al. | 2505.06592 | translate | read | link |
| 2025-05-10 | TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition | Feng Liu et.al. | 2505.06536 | translate | read | link |
| 2025-05-09 | NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines | Chathurangi Shyalika et.al. | 2505.06333 | translate | read | link |
| 2025-05-09 | Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models | Jugal Gajjar et.al. | 2505.06110 | translate | read | null |
| 2025-05-09 | Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects | Tobias Preintner et.al. | 2505.06030 | translate | read | link |
| 2025-05-08 | The Moon’s Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction | Tom Sander et.al. | 2505.05644 | translate | read | null |
| 2025-05-07 | OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning | Xianhang Li et.al. | 2505.04601 | translate | read | link |
| 2025-05-02 | Mapping the Climate Change Landscape on TikTok | Alessia Galdeman et.al. | 2505.03813 | translate | read | null |
| 2025-05-06 | Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant | Haonan Wang et.al. | 2505.03380 | translate | read | null |
| 2025-05-06 | A Vision-Language Model for Focal Liver Lesion Classification | Song Jian et.al. | 2505.03350 | translate | read | null |
| 2025-05-06 | SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation | Yu-Ren Guo et.al. | 2505.03244 | translate | read | null |
| 2025-05-05 | The Multimodal Paradox: How Added and Missing Modalities Shape Bias and Performance in Multimodal AI | Kishore Sampath et.al. | 2505.03020 | translate | read | null |
| 2025-05-02 | Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders | Rogelio A Mancisidor et.al. | 2505.01134 | translate | read | null |
