Multimodal - 2025-05
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-05-30 | Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework | Can Polat et.al. | 2506.00302 | translate | read | null |
| 2025-05-30 | Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts | Xin He et.al. | 2505.24541 | translate | read | null |
| 2025-05-29 | Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition | Sean Foley et.al. | 2505.24059 | translate | read | null |
| 2025-05-29 | OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data | Fengxiang Wang et.al. | 2505.23522 | translate | read | null |
| 2025-05-29 | Bidirectional predictive coding | Gaspard Oliviers et.al. | 2505.23415 | translate | read | null |
| 2025-05-29 | Deep Modeling and Optimization of Medical Image Classification | Yihang Wu et.al. | 2505.23040 | translate | read | link |
| 2025-05-30 | EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations | Haoqin Sun et.al. | 2505.23018 | translate | read | link |
| 2025-05-27 | A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features | Ihab Bendidi et.al. | 2505.21317 | translate | read | null |
| 2025-05-26 | Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects | Chengyan Wu et.al. | 2505.20511 | translate | read | null |
| 2025-05-25 | PDFBench: A Benchmark for De novo Protein Design from Function | Jiahao Kuang et.al. | 2505.20346 | translate | read | null |
| 2025-05-26 | Learning Optimal Multimodal Information Bottleneck Representations | Qilong Wu et.al. | 2505.19996 | translate | read | null |
| 2025-05-26 | ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | Pooneh Mousavi et.al. | 2505.19937 | translate | read | null |
| 2025-05-26 | Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning | Sanghyuk Chun et.al. | 2505.19614 | translate | read | null |
| 2025-05-26 | Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate | Liangwei Nathan Zheng et.al. | 2505.19525 | translate | read | null |
| 2025-05-25 | Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | Shiyue Wang et.al. | 2505.19219 | translate | read | null |
| 2025-05-25 | I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Jiayi Xin et.al. | 2505.19190 | translate | read | link |
| 2025-05-23 | Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation | Zhihua Liu et.al. | 2505.17994 | translate | read | null |
| 2025-05-23 | HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | Chuhao Zhou et.al. | 2505.17645 | translate | read | null |
| 2025-05-23 | RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition | Yuehan Jin et.al. | 2505.17501 | translate | read | null |
| 2025-05-21 | NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation | Weiming Wu et.al. | 2505.17121 | translate | read | null |
| 2025-05-22 | ICYM2I: The illusion of multimodal informativeness under missingness | Young Sang Choi et.al. | 2505.16953 | translate | read | link |
| 2025-05-22 | Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | Francesco Dalla Serra et.al. | 2505.16624 | translate | read | null |
| 2025-05-22 | Multimodal Online Federated Learning with Modality Missing in Internet of Things | Heqiang Wang et.al. | 2505.16138 | translate | read | null |
| 2025-05-21 | Robust Multimodal Learning via Entropy-Gated Contrastive Fusion | Leon Chlon et.al. | 2505.15417 | translate | read | null |
| 2025-05-21 | EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy | Chi Kit Ng et.al. | 2505.15206 | translate | read | null |
| 2025-05-21 | Graph Foundation Models: A Comprehensive Survey | Zehong Wang et.al. | 2505.15116 | translate | read | link |
| 2025-05-19 | HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | Xuejun Sun et.al. | 2505.14725 | translate | read | link |
| 2025-05-20 | Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning | Jiangrong Shen et.al. | 2505.14535 | translate | read | null |
| 2025-05-20 | Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | Shuo Zhang et.al. | 2505.14143 | translate | read | null |
| 2025-05-20 | LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Qifeng Cai et.al. | 2505.13928 | translate | read | link |
| 2025-05-17 | Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | Hessa Alawwad et.al. | 2505.13520 | translate | read | null |
| 2025-05-19 | AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | Kai Zhang et.al. | 2505.12782 | translate | read | null |
| 2025-05-19 | PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI | Yingchen He et.al. | 2505.12707 | translate | read | null |
| 2025-05-17 | Understanding the Capabilities of Molecular Graph Neural Networks in Materials Science Through Multimodal Learning and Physical Context Encoding | Can Polat et.al. | 2505.12137 | translate | read | null |
| 2025-05-17 | SafeVid: Toward Safety Aligned Video Large Multimodal Models | Yixu Wang et.al. | 2505.11926 | translate | read | null |
| 2025-05-16 | GeoMM: On Geodesic Perspective for Multi-modal Learning | Shibin Mei et.al. | 2505.11216 | translate | read | null |
| 2025-05-15 | Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence | Xiang He et.al. | 2505.10176 | translate | read | link |
| 2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et.al. | 2505.09577 | translate | read | null |
| 2025-05-16 | Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Michael Majurski et.al. | 2505.08905 | translate | read | link |
| 2025-05-13 | Decoupled Multimodal Prototypes for Visual Recognition with Missing Modalities | Jueqing Lu et.al. | 2505.08283 | translate | read | null |
| 2025-05-11 | MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning | Lishan Yang et.al. | 2505.06911 | translate | read | null |
| 2025-05-10 | Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | H M Dipu Kabir et.al. | 2505.06592 | translate | read | link |
| 2025-05-10 | TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition | Feng Liu et.al. | 2505.06536 | translate | read | link |
| 2025-05-09 | NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines | Chathurangi Shyalika et.al. | 2505.06333 | translate | read | link |
| 2025-05-09 | Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models | Jugal Gajjar et.al. | 2505.06110 | translate | read | null |
| 2025-05-09 | Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects | Tobias Preintner et.al. | 2505.06030 | translate | read | link |
| 2025-05-08 | The Moon’s Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction | Tom Sander et.al. | 2505.05644 | translate | read | null |
| 2025-05-07 | OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning | Xianhang Li et.al. | 2505.04601 | translate | read | link |
| 2025-05-02 | Mapping the Climate Change Landscape on TikTok | Alessia Galdeman et.al. | 2505.03813 | translate | read | null |
| 2025-05-06 | Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant | Haonan Wang et.al. | 2505.03380 | translate | read | null |
| 2025-05-06 | A Vision-Language Model for Focal Liver Lesion Classification | Song Jian et.al. | 2505.03350 | translate | read | null |
| 2025-05-06 | SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation | Yu-Ren Guo et.al. | 2505.03244 | translate | read | null |
| 2025-05-05 | The Multimodal Paradox: How Added and Missing Modalities Shape Bias and Performance in Multimodal AI | Kishore Sampath et.al. | 2505.03020 | translate | read | null |
| 2025-05-02 | Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders | Rogelio A Mancisidor et.al. | 2505.01134 | translate | read | null |
([back to Multimodal](../Multimodal.md))