Multimodal - 2025-05
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-05-30 | Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework | Can Polat et.al. | 2506.00302 | translate | read | null |
| 2025-05-30 | Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts | Xin He et.al. | 2505.24541 | translate | read | null |
| 2025-05-29 | Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition | Sean Foley et.al. | 2505.24059 | translate | read | null |
| 2025-05-29 | OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data | Fengxiang Wang et.al. | 2505.23522 | translate | read | null |
| 2025-05-29 | Bidirectional predictive coding | Gaspard Oliviers et.al. | 2505.23415 | translate | read | null |
| 2025-05-29 | Deep Modeling and Optimization of Medical Image Classification | Yihang Wu et.al. | 2505.23040 | translate | read | link |
| 2025-05-30 | EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations | Haoqin Sun et.al. | 2505.23018 | translate | read | link |
| 2025-05-27 | A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features | Ihab Bendidi et.al. | 2505.21317 | translate | read | null |
| 2025-05-26 | Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects | Chengyan Wu et.al. | 2505.20511 | translate | read | null |
| 2025-05-25 | PDFBench: A Benchmark for De novo Protein Design from Function | Jiahao Kuang et.al. | 2505.20346 | translate | read | null |
| 2025-05-26 | Learning Optimal Multimodal Information Bottleneck Representations | Qilong Wu et.al. | 2505.19996 | translate | read | null |
| 2025-05-26 | ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | Pooneh Mousavi et.al. | 2505.19937 | translate | read | null |
| 2025-05-26 | Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning | Sanghyuk Chun et.al. | 2505.19614 | translate | read | null |
| 2025-05-26 | Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate | Liangwei Nathan Zheng et.al. | 2505.19525 | translate | read | null |
| 2025-05-25 | Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | Shiyue Wang et.al. | 2505.19219 | translate | read | null |
| 2025-05-25 | I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Jiayi Xin et.al. | 2505.19190 | translate | read | link |
| 2025-05-23 | Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation | Zhihua Liu et.al. | 2505.17994 | translate | read | null |
| 2025-05-23 | HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | Chuhao Zhou et.al. | 2505.17645 | translate | read | null |
| 2025-05-23 | RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition | Yuehan Jin et.al. | 2505.17501 | translate | read | null |
| 2025-05-21 | NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation | Weiming Wu et.al. | 2505.17121 | translate | read | null |
| 2025-05-22 | ICYM2I: The illusion of multimodal informativeness under missingness | Young Sang Choi et.al. | 2505.16953 | translate | read | link |
| 2025-05-22 | Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | Francesco Dalla Serra et.al. | 2505.16624 | translate | read | null |
| 2025-05-22 | Multimodal Online Federated Learning with Modality Missing in Internet of Things | Heqiang Wang et.al. | 2505.16138 | translate | read | null |
| 2025-05-21 | Robust Multimodal Learning via Entropy-Gated Contrastive Fusion | Leon Chlon et.al. | 2505.15417 | translate | read | null |
| 2025-05-21 | EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy | Chi Kit Ng et.al. | 2505.15206 | translate | read | null |
| 2025-05-21 | Graph Foundation Models: A Comprehensive Survey | Zehong Wang et.al. | 2505.15116 | translate | read | link |
| 2025-05-19 | HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | Xuejun Sun et.al. | 2505.14725 | translate | read | link |
| 2025-05-20 | Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning | Jiangrong Shen et.al. | 2505.14535 | translate | read | null |
| 2025-05-20 | Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | Shuo Zhang et.al. | 2505.14143 | translate | read | null |
| 2025-05-20 | LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Qifeng Cai et.al. | 2505.13928 | translate | read | link |
| 2025-05-17 | Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | Hessa Alawwad et.al. | 2505.13520 | translate | read | null |
| 2025-05-19 | AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | Kai Zhang et.al. | 2505.12782 | translate | read | null |
| 2025-05-19 | PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI | Yingchen He et.al. | 2505.12707 | translate | read | null |
| 2025-05-17 | Understanding the Capabilities of Molecular Graph Neural Networks in Materials Science Through Multimodal Learning and Physical Context Encoding | Can Polat et.al. | 2505.12137 | translate | read | null |
| 2025-05-17 | SafeVid: Toward Safety Aligned Video Large Multimodal Models | Yixu Wang et.al. | 2505.11926 | translate | read | null |
| 2025-05-16 | GeoMM: On Geodesic Perspective for Multi-modal Learning | Shibin Mei et.al. | 2505.11216 | translate | read | null |
| 2025-05-15 | Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence | Xiang He et.al. | 2505.10176 | translate | read | link |
| 2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et.al. | 2505.09577 | translate | read | null |
| 2025-05-16 | Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Michael Majurski et.al. | 2505.08905 | translate | read | link |
| 2025-05-13 | Decoupled Multimodal Prototypes for Visual Recognition with Missing Modalities | Jueqing Lu et.al. | 2505.08283 | translate | read | null |
| 2025-05-11 | MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning | Lishan Yang et.al. | 2505.06911 | translate | read | null |
| 2025-05-10 | Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | H M Dipu Kabir et.al. | 2505.06592 | translate | read | link |
| 2025-05-10 | TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition | Feng Liu et.al. | 2505.06536 | translate | read | link |
| 2025-05-09 | NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines | Chathurangi Shyalika et.al. | 2505.06333 | translate | read | link |
| 2025-05-09 | Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models | Jugal Gajjar et.al. | 2505.06110 | translate | read | null |
| 2025-05-09 | Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects | Tobias Preintner et.al. | 2505.06030 | translate | read | link |
| 2025-05-08 | The Moon’s Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction | Tom Sander et.al. | 2505.05644 | translate | read | null |
| 2025-05-07 | OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning | Xianhang Li et.al. | 2505.04601 | translate | read | link |
| 2025-05-02 | Mapping the Climate Change Landscape on TikTok | Alessia Galdeman et.al. | 2505.03813 | translate | read | null |
| 2025-05-06 | Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant | Haonan Wang et.al. | 2505.03380 | translate | read | null |
| 2025-05-06 | A Vision-Language Model for Focal Liver Lesion Classification | Song Jian et.al. | 2505.03350 | translate | read | null |
| 2025-05-06 | SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation | Yu-Ren Guo et.al. | 2505.03244 | translate | read | null |
| 2025-05-05 | The Multimodal Paradox: How Added and Missing Modalities Shape Bias and Performance in Multimodal AI | Kishore Sampath et.al. | 2505.03020 | translate | read | null |
| 2025-05-02 | Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders | Rogelio A Mancisidor et.al. | 2505.01134 | translate | read | null |
([back to Multimodal](../Multimodal.md))