Multimodal - 2026-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-03-31 | Mind the Gap: A Framework for Assessing Pitfalls in Multimodal Active Learning | Dustin Eisenhardt et al. | 2603.29677 | translate | read | null |
| 2026-03-31 | Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras | Sherif Abdelwahab et al. | 2603.29631 | translate | read | null |
| 2026-03-31 | EarthEmbeddingExplorer: A Web Application for Cross-Modal Retrieval of Global Satellite Images | Yijie Zheng et al. | 2603.29441 | translate | read | null |
| 2026-03-30 | RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time | Anurag Ghosh et al. | 2603.28522 | translate | read | null |
| 2026-03-25 | QuadFM: Foundational Text-Driven Quadruped Motion Dataset for Generation and Control | Li Gao et al. | 2603.24021 | translate | read | null |
| 2026-03-25 | Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing | Xusen Guo et al. | 2603.24014 | translate | read | null |
| 2026-03-25 | Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning | Kun-Yang Yu et al. | 2603.24004 | translate | read | null |
| 2026-03-25 | DecepGPT: Schema-Driven Deception Detection with Multicultural Datasets and Robust Multimodal Learning | Jiajian Huang et al. | 2603.23916 | translate | read | null |
| 2026-03-25 | BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment | Risa Shinoda et al. | 2603.23883 | translate | read | null |
| 2026-03-23 | Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment | Zigui Wang et al. | 2603.22530 | translate | read | null |
| 2026-03-23 | Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion | Seunghan Lee et al. | 2603.22372 | translate | read | null |
| 2026-03-22 | Dynamic Fusion-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition in Conversations | Tao Meng et al. | 2603.22345 | translate | read | null |
| 2026-03-18 | Memory Bear AI Memory Science Engine for Multimodal Affective Intelligence: A Technical Report | Deliang Wen et al. | 2603.22306 | translate | read | null |
| 2026-03-23 | Revisiting Weakly-Supervised Video Scene Graph Generation via Pair Affinity Learning | Minseok Kang et al. | 2603.21559 | translate | read | null |
| 2026-03-21 | AcoustEmo: Open-Vocabulary Emotion Reasoning via Utterance-Aware Acoustic Q-Former | Liyun Zhang et al. | 2603.20894 | translate | read | null |
| 2026-03-20 | BALM: A Model-Agnostic Framework for Balanced Multimodal Learning under Imbalanced Missing Rates | Phuong-Anh Nguyen et al. | 2603.19718 | translate | read | null |
| 2026-03-20 | Unbiased Dynamic Multimodal Fusion | Shicai Wei et al. | 2603.19681 | translate | read | null |
| 2026-03-19 | Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis | Sheng Lu et al. | 2603.19516 | translate | read | null |
| 2026-03-19 | Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation | Swagat Padhan et al. | 2603.19166 | translate | read | null |
| 2026-03-19 | NymeriaPlus: Enriching Nymeria Dataset with Additional Annotations and Data | Daniel DeTone et al. | 2603.18496 | translate | read | null |
| 2026-03-18 | Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning | Jingchun Yang et al. | 2603.17930 | translate | read | null |
| 2026-03-18 | Beyond Forced Modality Balance: Intrinsic Information Budgets for Multimodal Learning | Zechang Xiong et al. | 2603.17347 | translate | read | null |
| 2026-03-18 | On the Cone Effect and Modality Gap in Medical Vision-Language Embeddings | David Restrepo et al. | 2603.17246 | translate | read | null |
| 2026-03-17 | Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition | Yu Liu et al. | 2603.16463 | translate | read | null |
| 2026-03-17 | Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval | Weiqing Li et al. | 2603.16455 | translate | read | null |
| 2026-03-17 | HGP-Mamba: Integrating Histology and Generated Protein Features for Mamba-based Multimodal Survival Risk Prediction | Jing Dai et al. | 2603.16421 | translate | read | null |
| 2026-03-16 | Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series Classification | Emil Hardarson et al. | 2603.15939 | translate | read | null |
| 2026-03-16 | RealVLG-R1: A Large-Scale Real-World Visual-Language Grounding Benchmark for Robotic Perception and Manipulation | Linfei Li et al. | 2603.14880 | translate | read | null |
| 2026-03-15 | 4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding | Mohamed Rayan Barhdadi et al. | 2603.14301 | translate | read | null |
| 2026-03-15 | A Real-Time Neuro-Symbolic Ethical Governor for Safe Decision Control in Autonomous Robotic Manipulation | Aueaphum Aueawatthanaphisut et al. | 2603.14221 | translate | read | null |
| 2026-03-15 | Balancing Multimodal Domain Generalization via Gradient Modulation and Projection | Hongzhao Li et al. | 2603.14175 | translate | read | null |
| 2026-03-12 | Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling | Junhyeong Byeon et al. | 2603.11971 | translate | read | null |
| 2026-03-12 | Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework | Chingkwun Lam et al. | 2603.11768 | translate | read | null |
| 2026-03-12 | IDRL: An Individual-Aware Multimodal Depression-Related Representation Learning Framework for Depression Diagnosis | Chongxiao Wang et al. | 2603.11644 | translate | read | null |
| 2026-03-11 | Learning Tree-Based Models with Gradient Descent | Sascha Marton et al. | 2603.11117 | translate | read | null |
| 2026-03-07 | AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition | Yunsheng Wang et al. | 2603.10043 | translate | read | null |
| 2026-03-10 | Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT | Peng Sun et al. | 2603.09715 | translate | read | null |
| 2026-03-10 | AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering | Nguyen Anh Tuong et al. | 2603.09689 | translate | read | null |
| 2026-03-10 | Grounding Synthetic Data Generation With Vision and Language Models | Ümit Mert Çağlar et al. | 2603.09625 | translate | read | null |
| 2026-03-10 | OmniEdit: A Training-free framework for Lip Synchronization and Audio-Visual Editing | Lixiang Lin et al. | 2603.09084 | translate | read | null |
| 2026-03-09 | AI Agents, Language, Deep Learning and the Next Revolution in Science | Ke Li et al. | 2603.07940 | translate | read | null |
| 2026-03-08 | AeroPlace-Flow: Language-Grounded Object Placement for Aerial Manipulators via Visual Foresight and Object Flow | Sarthak Mishra et al. | 2603.07744 | translate | read | null |
| 2026-03-07 | Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards | Xin Zhang et al. | 2603.06958 | translate | read | null |
| 2026-03-06 | ProtAlign: Contrastive learning paradigm for Sequence and structure alignment | Aditya Ranganath et al. | 2603.06722 | translate | read | null |
| 2026-03-05 | Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment | Xiang Ma et al. | 2603.05566 | translate | read | null |
| 2026-03-05 | OpenFrontier: General Navigation with Visual-Language Grounded Frontiers | Esteban Padilla et al. | 2603.05377 | translate | read | null |
| 2026-03-05 | UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark | Yanlin Li et al. | 2603.05075 | translate | read | null |
| 2026-03-05 | Haptics in Cognition: Disruptor or Enabler of Memory? | Bibeg Limbu et al. | 2603.05019 | translate | read | null |
| 2026-03-05 | Beyond Text: Aligning Vision and Language for Multimodal E-Commerce Retrieval | Qujiaheng Zhang et al. | 2603.04836 | translate | read | null |
| 2026-03-04 | CoRe-BT: A Multimodal Radiology-Pathology-Text Benchmark for Robust Brain Tumor Typing | Juampablo E. Heras Rivera et al. | 2603.03618 | translate | read | null |
| 2026-03-03 | DREAM: Where Visual Understanding Meets Text-to-Image Generation | Chao Li et al. | 2603.02667 | translate | read | null |
| 2026-03-03 | An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation | Zheyuan Kuang et al. | 2603.02569 | translate | read | null |
| 2026-03-03 | SGMA: Semantic-Guided Modality-Aware Segmentation for Remote Sensing with Incomplete Multimodal Data | Lekang Wen et al. | 2603.02505 | translate | read | null |
| 2026-03-01 | Beyond Global Similarity: Towards Fine-Grained, Multi-Condition Multimodal Retrieval | Xuan Lu et al. | 2603.01082 | translate | read | null |
(<a href="../Multimodal.md">back to Multimodal</a>)