Multimodal - 2025-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-03-31 | Grounding Agent Reasoning in Image Schemas: A Neurosymbolic Approach to Embodied Cognition | François Olivier et al. | 2503.24110 | translate | read | null |
| 2025-03-31 | DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | Adrienne Deganutti et al. | 2503.24096 | translate | read | null |
| 2025-03-31 | BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation | Yumeng Fu et al. | 2503.23990 | translate | read | null |
| 2025-03-31 | Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion | Jiagen Li et al. | 2503.23721 | translate | read | null |
| 2025-03-31 | HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Kun Liu et al. | 2503.23715 | translate | read | null |
| 2025-03-27 | Graph-to-Vision: Multi-graph Understanding and Reasoning using Vision-Language Models | Ruizhou Li et al. | 2503.21435 | translate | read | null |
| 2025-03-27 | UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning | Hongxuan Tang et al. | 2503.21193 | translate | read | null |
| 2025-03-27 | AdaMHF: Adaptive Multimodal Hierarchical Fusion for Survival Prediction | Shuaiyu Zhang et al. | 2503.21124 | translate | read | link |
| 2025-03-26 | GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | Yupei Li et al. | 2503.20919 | translate | read | null |
| 2025-03-26 | An Encoding of Interaction Nets in OCaml | Nikolaus Huber et al. | 2503.20463 | translate | read | null |
| 2025-03-27 | RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models | Mehdi Moshtaghi et al. | 2503.19654 | translate | read | null |
| 2025-03-25 | VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction | Zizhi Chen et al. | 2503.19367 | translate | read | link |
| 2025-03-25 | LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Weizhi Chen et al. | 2503.19311 | translate | read | link |
| 2025-03-24 | Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition | Chengxiang Huang et al. | 2503.18595 | translate | read | link |
| 2025-03-21 | Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition | Ran Liu et al. | 2503.17453 | translate | read | link |
| 2025-03-21 | MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering | Jialin Chen et al. | 2503.16858 | translate | read | null |
| 2025-03-20 | EVA-MED: An Enhanced Valence-Arousal Multimodal Emotion Dataset for Emotion Recognition | Xin Huang et al. | 2503.16584 | translate | read | null |
| 2025-03-18 | Do Multimodal Large Language Models Understand Welding? | Grigorii Khvatskii et al. | 2503.16537 | translate | read | null |
| 2025-03-19 | EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis | Matthew Massey et al. | 2503.15625 | translate | read | link |
| 2025-03-19 | Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification | Zhong Ji et al. | 2503.14938 | translate | read | null |
| 2025-03-18 | HySurvPred: Multimodal Hyperbolic Embedding with Angle-Aware Hierarchical Contrastive Learning and Uncertainty Constraints for Survival Prediction | Jiaqi Yang et al. | 2503.13862 | translate | read | null |
| 2025-03-17 | Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning | Xueying Jiang et al. | 2503.12974 | translate | read | null |
| 2025-03-16 | BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries | Tianle Li et al. | 2503.12446 | translate | read | null |
| 2025-03-15 | Handling Weak Complementary Relationships for Audio-Visual Emotion Recognition | R. Gnana Praveen et al. | 2503.12261 | translate | read | null |
| 2025-03-14 | Cross-Modal Learning for Music-to-Music-Video Description Generation | Zhuoyuan Mao et al. | 2503.11190 | translate | read | null |
| 2025-03-20 | Unifying 2D and 3D Vision-Language Understanding | Ayush Jain et al. | 2503.10745 | translate | read | null |
| 2025-03-11 | TLA: Tactile-Language-Action Model for Contact-Rich Manipulation | Peng Hao et al. | 2503.08548 | translate | read | null |
| 2025-03-10 | Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency | Duy Phuong Nguyen et al. | 2503.07552 | translate | read | link |
| 2025-03-10 | A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis | Xiang Liu et al. | 2503.06973 | translate | read | link |
| 2025-03-10 | HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation | Xingzu Zhan et al. | 2503.06897 | translate | read | null |
| 2025-03-10 | Towards Generalization of Tactile Image Generation: Reference-Free Evaluation in a Leakage-Free Setting | Cagri Gungor et al. | 2503.06860 | translate | read | null |
| 2025-03-09 | Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts | Aref Farhadipour et al. | 2503.06805 | translate | read | null |
| 2025-03-13 | DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning | Chengxuan Qian et al. | 2503.06456 | translate | read | link |
| 2025-03-05 | Beyond H&E: Unlocking Pathological Insights with Polarization via Self-supervised Learning | Yao Du et al. | 2503.05933 | translate | read | null |
| 2025-03-10 | R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Jiaxing Zhao et al. | 2503.05379 | translate | read | null |
| 2025-03-07 | Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation | Xinkun Wang et al. | 2503.05319 | translate | read | null |
| 2025-03-06 | Large Language Models in Bioinformatics: A Survey | Zhenyu Wang et al. | 2503.04490 | translate | read | null |
| 2025-03-05 | Rebalanced Multimodal Learning with Data-aware Unimodal Sampling | Qingyuan Jiang et al. | 2503.03792 | translate | read | null |
| 2025-03-04 | Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data | Amin Honarmandi Shandiz et al. | 2503.02849 | translate | read | null |
| 2025-03-04 | Multimodal AI predicts clinical outcomes of drug combinations from preclinical data | Yepeng Huang et al. | 2503.02781 | translate | read | null |
| 2025-03-03 | Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA | Zhusi Zhong et al. | 2503.02034 | translate | read | null |
| 2025-03-03 | DeepSuM: Deep Sufficient Modality Learning Framework | Zhe Gao et al. | 2503.01728 | translate | read | null |
| 2025-03-03 | Dementia Insights: A Context-Based MultiModal Approach | Sahar Sinene Mehdoui et al. | 2503.01226 | translate | read | null |
| 2025-03-03 | HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation | Hongye Cheng et al. | 2503.01175 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)