Multimodal - 2025-03 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2025-03-31	Grounding Agent Reasoning in Image Schemas: A Neurosymbolic Approach to Embodied Cognition	François Olivier et al.	2503.24110	null
2025-03-31	DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description	Adrienne Deganutti et al.	2503.24096	null
2025-03-31	BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation	Yumeng Fu et al.	2503.23990	null
2025-03-31	Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion	Jiagen Li et al.	2503.23721	null
2025-03-31	HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation	Kun Liu et al.	2503.23715	null
2025-03-27	Graph-to-Vision: Multi-graph Understanding and Reasoning using Vision-Language Models	Ruizhou Li et al.	2503.21435	null
2025-03-27	UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning	Hongxuan Tang et al.	2503.21193	null
2025-03-27	AdaMHF: Adaptive Multimodal Hierarchical Fusion for Survival Prediction	Shuaiyu Zhang et al.	2503.21124	link
2025-03-26	GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations	Yupei Li et al.	2503.20919	null
2025-03-26	An Encoding of Interaction Nets in OCaml	Nikolaus Huber et al.	2503.20463	null
2025-03-27	RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models	Mehdi Moshtaghi et al.	2503.19654	null
2025-03-25	VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction	Zizhi Chen et al.	2503.19367	link
2025-03-25	LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text	Weizhi Chen et al.	2503.19311	link
2025-03-24	Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition	Chengxiang Huang et al.	2503.18595	link
2025-03-21	Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition	Ran Liu et al.	2503.17453	link
2025-03-21	MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering	Jialin Chen et al.	2503.16858	null
2025-03-20	EVA-MED: An Enhanced Valence-Arousal Multimodal Emotion Dataset for Emotion Recognition	Xin Huang et al.	2503.16584	null
2025-03-18	Do Multimodal Large Language Models Understand Welding?	Grigorii Khvatskii et al.	2503.16537	null
2025-03-19	EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis	Matthew Massey et al.	2503.15625	link
2025-03-19	Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification	Zhong Ji et al.	2503.14938	null
2025-03-18	HySurvPred: Multimodal Hyperbolic Embedding with Angle-Aware Hierarchical Contrastive Learning and Uncertainty Constraints for Survival Prediction	Jiaqi Yang et al.	2503.13862	null
2025-03-17	Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning	Xueying Jiang et al.	2503.12974	null
2025-03-16	BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries	Tianle Li et al.	2503.12446	null
2025-03-15	Handling Weak Complementary Relationships for Audio-Visual Emotion Recognition	R. Gnana Praveen et al.	2503.12261	null
2025-03-14	Cross-Modal Learning for Music-to-Music-Video Description Generation	Zhuoyuan Mao et al.	2503.11190	null
2025-03-20	Unifying 2D and 3D Vision-Language Understanding	Ayush Jain et al.	2503.10745	null
2025-03-11	TLA: Tactile-Language-Action Model for Contact-Rich Manipulation	Peng Hao et al.	2503.08548	null
2025-03-10	Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency	Duy Phuong Nguyen et al.	2503.07552	link
2025-03-10	A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis	Xiang Liu et al.	2503.06973	link
2025-03-10	HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation	Xingzu Zhan et al.	2503.06897	null
2025-03-10	Towards Generalization of Tactile Image Generation: Reference-Free Evaluation in a Leakage-Free Setting	Cagri Gungor et al.	2503.06860	null
2025-03-09	Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts	Aref Farhadipour et al.	2503.06805	null
2025-03-13	DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning	Chengxuan Qian et al.	2503.06456	link
2025-03-05	Beyond H&E: Unlocking Pathological Insights with Polarization via Self-supervised Learning	Yao Du et al.	2503.05933	null
2025-03-10	R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning	Jiaxing Zhao et al.	2503.05379	null
2025-03-07	Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation	Xinkun Wang et al.	2503.05319	null
2025-03-06	Large Language Models in Bioinformatics: A Survey	Zhenyu Wang et al.	2503.04490	null
2025-03-05	Rebalanced Multimodal Learning with Data-aware Unimodal Sampling	Qingyuan Jiang et al.	2503.03792	null
2025-03-04	Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data	Amin Honarmandi Shandiz et al.	2503.02849	null
2025-03-04	Multimodal AI predicts clinical outcomes of drug combinations from preclinical data	Yepeng Huang et al.	2503.02781	null
2025-03-03	Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA	Zhusi Zhong et al.	2503.02034	null
2025-03-03	DeepSuM: Deep Sufficient Modality Learning Framework	Zhe Gao et al.	2503.01728	null
2025-03-03	Dementia Insights: A Context-Based MultiModal Approach	Sahar Sinene Mehdoui et al.	2503.01226	null
2025-03-03	HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation	Hongye Cheng et al.	2503.01175	null
