Multimodal - 2026-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-03-31 | Mind the Gap: A Framework for Assessing Pitfalls in Multimodal Active Learning | Dustin Eisenhardt et al. | 2603.29677 | translate | read | null |
| 2026-03-31 | Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras | Sherif Abdelwahab et al. | 2603.29631 | translate | read | null |
| 2026-03-31 | EarthEmbeddingExplorer: A Web Application for Cross-Modal Retrieval of Global Satellite Images | Yijie Zheng et al. | 2603.29441 | translate | read | null |
| 2026-03-30 | RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time | Anurag Ghosh et al. | 2603.28522 | translate | read | null |
| 2026-03-25 | QuadFM: Foundational Text-Driven Quadruped Motion Dataset for Generation and Control | Li Gao et al. | 2603.24021 | translate | read | null |
| 2026-03-25 | Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing | Xusen Guo et al. | 2603.24014 | translate | read | null |
| 2026-03-25 | Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning | Kun-Yang Yu et al. | 2603.24004 | translate | read | null |
| 2026-03-25 | DecepGPT: Schema-Driven Deception Detection with Multicultural Datasets and Robust Multimodal Learning | Jiajian Huang et al. | 2603.23916 | translate | read | null |
| 2026-03-25 | BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment | Risa Shinoda et al. | 2603.23883 | translate | read | null |
| 2026-03-23 | Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment | Zigui Wang et al. | 2603.22530 | translate | read | null |
| 2026-03-23 | Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion | Seunghan Lee et al. | 2603.22372 | translate | read | null |
| 2026-03-22 | Dynamic Fusion-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition in Conversations | Tao Meng et al. | 2603.22345 | translate | read | null |
| 2026-03-18 | Memory Bear AI Memory Science Engine for Multimodal Affective Intelligence: A Technical Report | Deliang Wen et al. | 2603.22306 | translate | read | null |
| 2026-03-23 | Revisiting Weakly-Supervised Video Scene Graph Generation via Pair Affinity Learning | Minseok Kang et al. | 2603.21559 | translate | read | null |
| 2026-03-21 | AcoustEmo: Open-Vocabulary Emotion Reasoning via Utterance-Aware Acoustic Q-Former | Liyun Zhang et al. | 2603.20894 | translate | read | null |
| 2026-03-20 | BALM: A Model-Agnostic Framework for Balanced Multimodal Learning under Imbalanced Missing Rates | Phuong-Anh Nguyen et al. | 2603.19718 | translate | read | null |
| 2026-03-20 | Unbiased Dynamic Multimodal Fusion | Shicai Wei et al. | 2603.19681 | translate | read | null |
| 2026-03-19 | Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis | Sheng Lu et al. | 2603.19516 | translate | read | null |
| 2026-03-19 | Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation | Swagat Padhan et al. | 2603.19166 | translate | read | null |
| 2026-03-19 | NymeriaPlus: Enriching Nymeria Dataset with Additional Annotations and Data | Daniel DeTone et al. | 2603.18496 | translate | read | null |
| 2026-03-18 | Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning | Jingchun Yang et al. | 2603.17930 | translate | read | null |
| 2026-03-18 | Beyond Forced Modality Balance: Intrinsic Information Budgets for Multimodal Learning | Zechang Xiong et al. | 2603.17347 | translate | read | null |
| 2026-03-18 | On the Cone Effect and Modality Gap in Medical Vision-Language Embeddings | David Restrepo et al. | 2603.17246 | translate | read | null |
| 2026-03-17 | Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition | Yu Liu et al. | 2603.16463 | translate | read | null |
| 2026-03-17 | Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval | Weiqing Li et al. | 2603.16455 | translate | read | null |
| 2026-03-17 | HGP-Mamba: Integrating Histology and Generated Protein Features for Mamba-based Multimodal Survival Risk Prediction | Jing Dai et al. | 2603.16421 | translate | read | null |
| 2026-03-16 | Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series Classification | Emil Hardarson et al. | 2603.15939 | translate | read | null |
| 2026-03-16 | RealVLG-R1: A Large-Scale Real-World Visual-Language Grounding Benchmark for Robotic Perception and Manipulation | Linfei Li et al. | 2603.14880 | translate | read | null |
| 2026-03-15 | 4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding | Mohamed Rayan Barhdadi et al. | 2603.14301 | translate | read | null |
| 2026-03-15 | A Real-Time Neuro-Symbolic Ethical Governor for Safe Decision Control in Autonomous Robotic Manipulation | Aueaphum Aueawatthanaphisut et al. | 2603.14221 | translate | read | null |
| 2026-03-15 | Balancing Multimodal Domain Generalization via Gradient Modulation and Projection | Hongzhao Li et al. | 2603.14175 | translate | read | null |
| 2026-03-12 | Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling | Junhyeong Byeon et al. | 2603.11971 | translate | read | null |
| 2026-03-12 | Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework | Chingkwun Lam et al. | 2603.11768 | translate | read | null |
| 2026-03-12 | IDRL: An Individual-Aware Multimodal Depression-Related Representation Learning Framework for Depression Diagnosis | Chongxiao Wang et al. | 2603.11644 | translate | read | null |
| 2026-03-11 | Learning Tree-Based Models with Gradient Descent | Sascha Marton et al. | 2603.11117 | translate | read | null |
| 2026-03-07 | AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition | Yunsheng Wang et al. | 2603.10043 | translate | read | null |
| 2026-03-10 | Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT | Peng Sun et al. | 2603.09715 | translate | read | null |
| 2026-03-10 | AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering | Nguyen Anh Tuong et al. | 2603.09689 | translate | read | null |
| 2026-03-10 | Grounding Synthetic Data Generation With Vision and Language Models | Ümit Mert Çağlar et al. | 2603.09625 | translate | read | null |
| 2026-03-10 | OmniEdit: A Training-free framework for Lip Synchronization and Audio-Visual Editing | Lixiang Lin et al. | 2603.09084 | translate | read | null |
| 2026-03-09 | AI Agents, Language, Deep Learning and the Next Revolution in Science | Ke Li et al. | 2603.07940 | translate | read | null |
| 2026-03-08 | AeroPlace-Flow: Language-Grounded Object Placement for Aerial Manipulators via Visual Foresight and Object Flow | Sarthak Mishra et al. | 2603.07744 | translate | read | null |
| 2026-03-07 | Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards | Xin Zhang et al. | 2603.06958 | translate | read | null |
| 2026-03-06 | ProtAlign: Contrastive learning paradigm for Sequence and structure alignment | Aditya Ranganath et al. | 2603.06722 | translate | read | null |
| 2026-03-05 | Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment | Xiang Ma et al. | 2603.05566 | translate | read | null |
| 2026-03-05 | OpenFrontier: General Navigation with Visual-Language Grounded Frontiers | Esteban Padilla et al. | 2603.05377 | translate | read | null |
| 2026-03-05 | UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark | Yanlin Li et al. | 2603.05075 | translate | read | null |
| 2026-03-05 | Haptics in Cognition: Disruptor or Enabler of Memory? | Bibeg Limbu et al. | 2603.05019 | translate | read | null |
| 2026-03-05 | Beyond Text: Aligning Vision and Language for Multimodal E-Commerce Retrieval | Qujiaheng Zhang et al. | 2603.04836 | translate | read | null |
| 2026-03-04 | CoRe-BT: A Multimodal Radiology-Pathology-Text Benchmark for Robust Brain Tumor Typing | Juampablo E. Heras Rivera et al. | 2603.03618 | translate | read | null |
| 2026-03-03 | DREAM: Where Visual Understanding Meets Text-to-Image Generation | Chao Li et al. | 2603.02667 | translate | read | null |
| 2026-03-03 | An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation | Zheyuan Kuang et al. | 2603.02569 | translate | read | null |
| 2026-03-03 | SGMA: Semantic-Guided Modality-Aware Segmentation for Remote Sensing with Incomplete Multimodal Data | Lekang Wen et al. | 2603.02505 | translate | read | null |
| 2026-03-01 | Beyond Global Similarity: Towards Fine-Grained, Multi-Condition Multimodal Retrieval | Xuan Lu et al. | 2603.01082 | translate | read | null |
(<a href="../Multimodal.md">back to Multimodal</a>)