Multimodal - 2026-03

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-03-31 | Mind the Gap: A Framework for Assessing Pitfalls in Multimodal Active Learning | Dustin Eisenhardt et al. | 2603.29677 | translate | read | null |
| 2026-03-31 | Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras | Sherif Abdelwahab et al. | 2603.29631 | translate | read | null |
| 2026-03-31 | EarthEmbeddingExplorer: A Web Application for Cross-Modal Retrieval of Global Satellite Images | Yijie Zheng et al. | 2603.29441 | translate | read | null |
| 2026-03-30 | RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time | Anurag Ghosh et al. | 2603.28522 | translate | read | null |
| 2026-03-25 | QuadFM: Foundational Text-Driven Quadruped Motion Dataset for Generation and Control | Li Gao et al. | 2603.24021 | translate | read | null |
| 2026-03-25 | Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing | Xusen Guo et al. | 2603.24014 | translate | read | null |
| 2026-03-25 | Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning | Kun-Yang Yu et al. | 2603.24004 | translate | read | null |
| 2026-03-25 | DecepGPT: Schema-Driven Deception Detection with Multicultural Datasets and Robust Multimodal Learning | Jiajian Huang et al. | 2603.23916 | translate | read | null |
| 2026-03-25 | BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment | Risa Shinoda et al. | 2603.23883 | translate | read | null |
| 2026-03-23 | Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment | Zigui Wang et al. | 2603.22530 | translate | read | null |
| 2026-03-23 | Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion | Seunghan Lee et al. | 2603.22372 | translate | read | null |
| 2026-03-22 | Dynamic Fusion-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition in Conversations | Tao Meng et al. | 2603.22345 | translate | read | null |
| 2026-03-18 | Memory Bear AI Memory Science Engine for Multimodal Affective Intelligence: A Technical Report | Deliang Wen et al. | 2603.22306 | translate | read | null |
| 2026-03-23 | Revisiting Weakly-Supervised Video Scene Graph Generation via Pair Affinity Learning | Minseok Kang et al. | 2603.21559 | translate | read | null |
| 2026-03-21 | AcoustEmo: Open-Vocabulary Emotion Reasoning via Utterance-Aware Acoustic Q-Former | Liyun Zhang et al. | 2603.20894 | translate | read | null |
| 2026-03-20 | BALM: A Model-Agnostic Framework for Balanced Multimodal Learning under Imbalanced Missing Rates | Phuong-Anh Nguyen et al. | 2603.19718 | translate | read | null |
| 2026-03-20 | Unbiased Dynamic Multimodal Fusion | Shicai Wei et al. | 2603.19681 | translate | read | null |
| 2026-03-19 | Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis | Sheng Lu et al. | 2603.19516 | translate | read | null |
| 2026-03-19 | Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation | Swagat Padhan et al. | 2603.19166 | translate | read | null |
| 2026-03-19 | NymeriaPlus: Enriching Nymeria Dataset with Additional Annotations and Data | Daniel DeTone et al. | 2603.18496 | translate | read | null |
| 2026-03-18 | Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning | Jingchun Yang et al. | 2603.17930 | translate | read | null |
| 2026-03-18 | Beyond Forced Modality Balance: Intrinsic Information Budgets for Multimodal Learning | Zechang Xiong et al. | 2603.17347 | translate | read | null |
| 2026-03-18 | On the Cone Effect and Modality Gap in Medical Vision-Language Embeddings | David Restrepo et al. | 2603.17246 | translate | read | null |
| 2026-03-17 | Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition | Yu Liu et al. | 2603.16463 | translate | read | null |
| 2026-03-17 | Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval | Weiqing Li et al. | 2603.16455 | translate | read | null |
| 2026-03-17 | HGP-Mamba: Integrating Histology and Generated Protein Features for Mamba-based Multimodal Survival Risk Prediction | Jing Dai et al. | 2603.16421 | translate | read | null |
| 2026-03-16 | Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series Classification | Emil Hardarson et al. | 2603.15939 | translate | read | null |
| 2026-03-16 | RealVLG-R1: A Large-Scale Real-World Visual-Language Grounding Benchmark for Robotic Perception and Manipulation | Linfei Li et al. | 2603.14880 | translate | read | null |
| 2026-03-15 | 4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding | Mohamed Rayan Barhdadi et al. | 2603.14301 | translate | read | null |
| 2026-03-15 | A Real-Time Neuro-Symbolic Ethical Governor for Safe Decision Control in Autonomous Robotic Manipulation | Aueaphum Aueawatthanaphisut et al. | 2603.14221 | translate | read | null |
| 2026-03-15 | Balancing Multimodal Domain Generalization via Gradient Modulation and Projection | Hongzhao Li et al. | 2603.14175 | translate | read | null |
| 2026-03-12 | Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling | Junhyeong Byeon et al. | 2603.11971 | translate | read | null |
| 2026-03-12 | Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework | Chingkwun Lam et al. | 2603.11768 | translate | read | null |
| 2026-03-12 | IDRL: An Individual-Aware Multimodal Depression-Related Representation Learning Framework for Depression Diagnosis | Chongxiao Wang et al. | 2603.11644 | translate | read | null |
| 2026-03-11 | Learning Tree-Based Models with Gradient Descent | Sascha Marton et al. | 2603.11117 | translate | read | null |
| 2026-03-07 | AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition | Yunsheng Wang et al. | 2603.10043 | translate | read | null |
| 2026-03-10 | Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT | Peng Sun et al. | 2603.09715 | translate | read | null |
| 2026-03-10 | AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering | Nguyen Anh Tuong et al. | 2603.09689 | translate | read | null |
| 2026-03-10 | Grounding Synthetic Data Generation With Vision and Language Models | Ümit Mert Çağlar et al. | 2603.09625 | translate | read | null |
| 2026-03-10 | OmniEdit: A Training-free framework for Lip Synchronization and Audio-Visual Editing | Lixiang Lin et al. | 2603.09084 | translate | read | null |
| 2026-03-09 | AI Agents, Language, Deep Learning and the Next Revolution in Science | Ke Li et al. | 2603.07940 | translate | read | null |
| 2026-03-08 | AeroPlace-Flow: Language-Grounded Object Placement for Aerial Manipulators via Visual Foresight and Object Flow | Sarthak Mishra et al. | 2603.07744 | translate | read | null |
| 2026-03-07 | Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards | Xin Zhang et al. | 2603.06958 | translate | read | null |
| 2026-03-06 | ProtAlign: Contrastive learning paradigm for Sequence and structure alignment | Aditya Ranganath et al. | 2603.06722 | translate | read | null |
| 2026-03-05 | Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment | Xiang Ma et al. | 2603.05566 | translate | read | null |
| 2026-03-05 | OpenFrontier: General Navigation with Visual-Language Grounded Frontiers | Esteban Padilla et al. | 2603.05377 | translate | read | null |
| 2026-03-05 | UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark | Yanlin Li et al. | 2603.05075 | translate | read | null |
| 2026-03-05 | Haptics in Cognition: Disruptor or Enabler of Memory? | Bibeg Limbu et al. | 2603.05019 | translate | read | null |
| 2026-03-05 | Beyond Text: Aligning Vision and Language for Multimodal E-Commerce Retrieval | Qujiaheng Zhang et al. | 2603.04836 | translate | read | null |
| 2026-03-04 | CoRe-BT: A Multimodal Radiology-Pathology-Text Benchmark for Robust Brain Tumor Typing | Juampablo E. Heras Rivera et al. | 2603.03618 | translate | read | null |
| 2026-03-03 | DREAM: Where Visual Understanding Meets Text-to-Image Generation | Chao Li et al. | 2603.02667 | translate | read | null |
| 2026-03-03 | An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation | Zheyuan Kuang et al. | 2603.02569 | translate | read | null |
| 2026-03-03 | SGMA: Semantic-Guided Modality-Aware Segmentation for Remote Sensing with Incomplete Multimodal Data | Lekang Wen et al. | 2603.02505 | translate | read | null |
| 2026-03-01 | Beyond Global Similarity: Towards Fine-Grained, Multi-Condition Multimodal Retrieval | Xuan Lu et al. | 2603.01082 | translate | read | null |
