Multimodal - 2025-03

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-03-31 | Grounding Agent Reasoning in Image Schemas: A Neurosymbolic Approach to Embodied Cognition | François Olivier et.al. | 2503.24110 | translate | read | null |
| 2025-03-31 | DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | Adrienne Deganutti et.al. | 2503.24096 | translate | read | null |
| 2025-03-31 | BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation | Yumeng Fu et.al. | 2503.23990 | translate | read | null |
| 2025-03-31 | Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion | Jiagen Li et.al. | 2503.23721 | translate | read | null |
| 2025-03-31 | HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Kun Liu et.al. | 2503.23715 | translate | read | null |
| 2025-03-27 | Graph-to-Vision: Multi-graph Understanding and Reasoning using Vision-Language Models | Ruizhou Li et.al. | 2503.21435 | translate | read | null |
| 2025-03-27 | UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning | Hongxuan Tang et.al. | 2503.21193 | translate | read | null |
| 2025-03-27 | AdaMHF: Adaptive Multimodal Hierarchical Fusion for Survival Prediction | Shuaiyu Zhang et.al. | 2503.21124 | translate | read | link |
| 2025-03-26 | GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | Yupei Li et.al. | 2503.20919 | translate | read | null |
| 2025-03-26 | An Encoding of Interaction Nets in OCaml | Nikolaus Huber et.al. | 2503.20463 | translate | read | null |
| 2025-03-27 | RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models | Mehdi Moshtaghi et.al. | 2503.19654 | translate | read | null |
| 2025-03-25 | VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction | Zizhi Chen et.al. | 2503.19367 | translate | read | link |
| 2025-03-25 | LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Weizhi Chen et.al. | 2503.19311 | translate | read | link |
| 2025-03-24 | Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition | Chengxiang Huang et.al. | 2503.18595 | translate | read | link |
| 2025-03-21 | Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition | Ran Liu et.al. | 2503.17453 | translate | read | link |
| 2025-03-21 | MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering | Jialin Chen et.al. | 2503.16858 | translate | read | null |
| 2025-03-20 | EVA-MED: An Enhanced Valence-Arousal Multimodal Emotion Dataset for Emotion Recognition | Xin Huang et.al. | 2503.16584 | translate | read | null |
| 2025-03-18 | Do Multimodal Large Language Models Understand Welding? | Grigorii Khvatskii et.al. | 2503.16537 | translate | read | null |
| 2025-03-19 | EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis | Matthew Massey et.al. | 2503.15625 | translate | read | link |
| 2025-03-19 | Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification | Zhong Ji et.al. | 2503.14938 | translate | read | null |
| 2025-03-18 | HySurvPred: Multimodal Hyperbolic Embedding with Angle-Aware Hierarchical Contrastive Learning and Uncertainty Constraints for Survival Prediction | Jiaqi Yang et.al. | 2503.13862 | translate | read | null |
| 2025-03-17 | Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning | Xueying Jiang et.al. | 2503.12974 | translate | read | null |
| 2025-03-16 | BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries | Tianle Li et.al. | 2503.12446 | translate | read | null |
| 2025-03-15 | Handling Weak Complementary Relationships for Audio-Visual Emotion Recognition | R. Gnana Praveen et.al. | 2503.12261 | translate | read | null |
| 2025-03-14 | Cross-Modal Learning for Music-to-Music-Video Description Generation | Zhuoyuan Mao et.al. | 2503.11190 | translate | read | null |
| 2025-03-20 | Unifying 2D and 3D Vision-Language Understanding | Ayush Jain et.al. | 2503.10745 | translate | read | null |
| 2025-03-11 | TLA: Tactile-Language-Action Model for Contact-Rich Manipulation | Peng Hao et.al. | 2503.08548 | translate | read | null |
| 2025-03-10 | Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency | Duy Phuong Nguyen et.al. | 2503.07552 | translate | read | link |
| 2025-03-10 | A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis | Xiang Liu et.al. | 2503.06973 | translate | read | link |
| 2025-03-10 | HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation | Xingzu Zhan et.al. | 2503.06897 | translate | read | null |
| 2025-03-10 | Towards Generalization of Tactile Image Generation: Reference-Free Evaluation in a Leakage-Free Setting | Cagri Gungor et.al. | 2503.06860 | translate | read | null |
| 2025-03-09 | Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts | Aref Farhadipour et.al. | 2503.06805 | translate | read | null |
| 2025-03-13 | DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning | Chengxuan Qian et.al. | 2503.06456 | translate | read | link |
| 2025-03-05 | Beyond H&E: Unlocking Pathological Insights with Polarization via Self-supervised Learning | Yao Du et.al. | 2503.05933 | translate | read | null |
| 2025-03-10 | R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Jiaxing Zhao et.al. | 2503.05379 | translate | read | null |
| 2025-03-07 | Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation | Xinkun Wang et.al. | 2503.05319 | translate | read | null |
| 2025-03-06 | Large Language Models in Bioinformatics: A Survey | Zhenyu Wang et.al. | 2503.04490 | translate | read | null |
| 2025-03-05 | Rebalanced Multimodal Learning with Data-aware Unimodal Sampling | Qingyuan Jiang et.al. | 2503.03792 | translate | read | null |
| 2025-03-04 | Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data | Amin Honarmandi Shandiz et.al. | 2503.02849 | translate | read | null |
| 2025-03-04 | Multimodal AI predicts clinical outcomes of drug combinations from preclinical data | Yepeng Huang et.al. | 2503.02781 | translate | read | null |
| 2025-03-03 | Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA | Zhusi Zhong et.al. | 2503.02034 | translate | read | null |
| 2025-03-03 | DeepSuM: Deep Sufficient Modality Learning Framework | Zhe Gao et.al. | 2503.01728 | translate | read | null |
| 2025-03-03 | Dementia Insights: A Context-Based MultiModal Approach | Sahar Sinene Mehdoui et.al. | 2503.01226 | translate | read | null |
| 2025-03-03 | HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation | Hongye Cheng et.al. | 2503.01175 | translate | read | null |

(<a href=../Multimodal.md>back to Multimodal</a>)