Multimodal - 2025-11

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2025-11-30 | MM-ACT: Learn from Multimodal Parallel Generation to Act | Haotian Liang et al. | 2512.00975 | translate | read | null |
| 2025-11-29 | Describe Anything Anywhere At Any Moment | Nicolas Gorlo et al. | 2512.00565 | translate | read | null |
| 2025-11-29 | CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA | Vsevolod Kovalev et al. | 2512.00360 | translate | read | null |
| 2025-11-28 | Buffer replay enhances the robustness of multimodal learning under missing-modality | Hongye Zhu et al. | 2511.23070 | translate | read | null |
| 2025-11-27 | Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation | Xinyi Che et al. | 2511.22463 | translate | read | null |
| 2025-11-27 | Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation | Xinyi Che et al. | 2511.22447 | translate | read | null |
| 2025-11-27 | Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples | Shuhei Yamashita et al. | 2511.22141 | translate | read | null |
| 2025-11-26 | WalkCLIP: Multimodal Learning for Urban Walkability Prediction | Shilong Xiang et al. | 2511.21947 | translate | read | null |
| 2025-11-26 | Evaluating Strategies for Synthesizing Clinical Notes for Medical Multimodal AI | Niccolo Marini et al. | 2511.21827 | translate | read | null |
| 2025-11-26 | Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling | Mengran Li et al. | 2511.21120 | translate | read | null |
| 2025-11-25 | A review on data fusion in multimodal learning analytics and educational data mining | Wilson Chango et al. | 2511.20871 | translate | read | null |
| 2025-11-25 | VibraVerse: A Large-Scale Geometry-Acoustics Alignment Dataset for Physically-Consistent Multimodal Learning | Bo Pang et al. | 2511.20422 | translate | read | null |
| 2025-11-25 | MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts | Zilong Huang et al. | 2511.20415 | translate | read | null |
| 2025-11-25 | ScenarioCLIP: Pretrained Transferable Visual Language Models and Action-Genome Dataset for Natural Scene Analysis | Advik Sinha et al. | 2511.20274 | translate | read | null |
| 2025-11-24 | Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation | Yingjia Shang et al. | 2511.19257 | translate | read | null |
| 2025-11-24 | IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes | Carl Lindström et al. | 2511.19235 | translate | read | null |
| 2025-11-24 | Can Modern Vision Models Understand the Difference Between an Object and a Look-alike? | Itay Cohen et al. | 2511.19200 | translate | read | null |
| 2025-11-23 | Breaking Forgetting: Training-Free Few-Shot Class-Incremental Learning via Conditional Diffusion | Haidong Kang et al. | 2511.18516 | translate | read | null |
| 2025-11-22 | Vulnerability-Aware Robust Multimodal Adversarial Training | Junrui Zhang et al. | 2511.18138 | translate | read | null |
| 2025-11-22 | Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning | Xiaohong Liu et al. | 2511.18104 | translate | read | null |
| 2025-11-17 | Reconstruction-Driven Multimodal Representation Learning for Automated Media Understanding | Yassir Benhammou et al. | 2511.17596 | translate | read | null |
| 2025-11-21 | MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment | Huangbiao Xu et al. | 2511.17397 | translate | read | null |
| 2025-11-21 | UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation | Chi Zhang et al. | 2511.16917 | translate | read | null |
| 2025-11-20 | LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs | Doriand Petit et al. | 2511.16454 | translate | read | null |
| 2025-11-20 | Boosting Medical Visual Understanding From Multi-Granular Language Learning | Zihan Li et al. | 2511.15943 | translate | read | null |
| 2025-11-18 | Uncertainty-Resilient Multimodal Learning via Consistency-Guided Cross-Modal Transfer | Hyo-Jeong Jang et al. | 2511.15741 | translate | read | null |
| 2025-11-19 | SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome | Dabin Jeong et al. | 2511.15464 | translate | read | null |
| 2025-11-19 | Reflexive Evidence-Based Multimodal Learning for Clean Energy Transitions: Causal Insights on Cooking Fuel Access, Urbanization, and Carbon Emissions | Shan Shan et al. | 2511.15342 | translate | read | null |
| 2025-11-19 | Towards Unbiased Cross-Modal Representation Learning for Food Image-to-Recipe Retrieval | Qing Wang et al. | 2511.15201 | translate | read | null |
| 2025-11-19 | TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition | Wen Yin et al. | 2511.15085 | translate | read | null |
| 2025-11-18 | Quality-Controlled Multimodal Emotion Recognition in Conversations with Identity-Based Transfer Learning and MAMBA Fusion | Zanxu Wang et al. | 2511.14969 | translate | read | null |
| 2025-11-18 | Toward Robust and Harmonious Adaptation for Cross-modal Retrieval | Haobin Li et al. | 2511.14416 | translate | read | null |
| 2025-11-18 | Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation | Weimin Bai et al. | 2511.14271 | translate | read | null |
| 2025-11-18 | Online Data Curation for Object Detection via Marginal Contributions to Dataset-level Average Precision | Zitang Sun et al. | 2511.14197 | translate | read | null |
| 2025-11-14 | Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement | Zhe Yang et al. | 2511.13755 | translate | read | null |
| 2025-11-17 | 3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale | Yijia Fan et al. | 2511.13211 | translate | read | null |
| 2025-11-17 | uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data | Dahyun Chung et al. | 2511.13036 | translate | read | null |
| 2025-11-17 | Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks | Minsoo Jo et al. | 2511.12985 | translate | read | null |
| 2025-11-15 | To Align or Not to Align: Strategic Multimodal Representation Alignment for Optimal Performance | Wanlong Fang et al. | 2511.12121 | translate | read | null |
| 2025-11-14 | Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification | Qinghao Gao et al. | 2511.11460 | translate | read | null |
| 2025-11-14 | AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery | Yuqi Yin et al. | 2511.11257 | translate | read | null |
| 2025-11-14 | LEMUR: Large scale End-to-end MUltimodal Recommendation | Xintian Han et al. | 2511.10962 | translate | read | null |
| 2025-11-14 | MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition | Feng Li et al. | 2511.10892 | translate | read | null |
| 2025-11-13 | Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals | Shruti Singh Baghel et al. | 2511.10615 | translate | read | null |
| 2025-11-13 | URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding | Yongxin Shi et al. | 2511.10552 | translate | read | null |
| 2025-11-13 | GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval | Hao Zou et al. | 2511.10154 | translate | read | null |
| 2025-11-13 | Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction | Mingda Jia et al. | 2511.10134 | translate | read | null |
| 2025-11-13 | Towards Robust Multimodal Learning in the Open World | Fushuo Huo et al. | 2511.09989 | translate | read | null |
| 2025-11-12 | Baby Sophia: A Developmental Approach to Self-Exploration through Self-Touch and Hand Regard | Stelios Zarifis et al. | 2511.09727 | translate | read | null |
| 2025-11-12 | End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering | Jiliang Hu et al. | 2511.09282 | translate | read | null |
| 2025-11-11 | Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding | Da Li et al. | 2511.08480 | translate | read | null |
| 2025-11-11 | Boomda: Balanced Multi-objective Optimization for Multimodal Domain Adaptation | Jun Sun et al. | 2511.08152 | translate | read | null |
| 2025-11-11 | Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval | Likang Peng et al. | 2511.07780 | translate | read | null |
| 2025-11-11 | Cross Modal Fine-Grained Alignment via Granularity-Aware and Region-Uncertain Modeling | Jiale Liu et al. | 2511.07710 | translate | read | null |
| 2025-11-10 | A Hybrid Multimodal Deep Learning Framework for Intelligent Fashion Recommendation | Kamand Kalashi et al. | 2511.07573 | translate | read | null |
| 2025-11-10 | Integrating Epigenetic and Phenotypic Features for Biological Age Estimation in Cancer Patients via Multimodal Learning | Shuyue Jiang et al. | 2511.07219 | translate | read | null |
| 2025-11-10 | Med-SORA: Symptom to Organ Reasoning in Abdomen CT Images | You-Kyoung Na et al. | 2511.06752 | translate | read | null |
| 2025-11-09 | LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval | Jian Zhang et al. | 2511.06268 | translate | read | null |
| 2025-11-09 | VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving | Ruifei Zhang et al. | 2511.06256 | translate | read | null |
| 2025-11-09 | AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving | Ruifei Zhang et al. | 2511.06253 | translate | read | null |
| 2025-11-08 | Referring Expressions as a Lens into Spatial Language Grounding in Vision-Language Models | Akshar Tumu et al. | 2511.06146 | translate | read | null |
| 2025-11-04 | Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction | An Vuong et al. | 2511.05577 | translate | read | null |
| 2025-11-06 | DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification | Yujie Yang et al. | 2511.04281 | translate | read | null |
| 2025-11-05 | Cross-Modal Alignment via Variational Copula Modelling | Feng Wu et al. | 2511.03196 | translate | read | null |
| 2025-11-04 | SLIP: Structural-aware Language-Image Pretraining for Vision-Language Alignment | Wenbo Lu et al. | 2511.03019 | translate | read | null |
| 2025-11-04 | ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology | Srikumar Sastry et al. | 2511.02946 | translate | read | null |
| 2025-11-04 | When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning | Chenyu Zhang et al. | 2511.02794 | translate | read | null |
| 2025-11-03 | OmniFuser: Adaptive Multimodal Fusion for Service-Oriented Predictive Maintenance | Ziqi Wang et al. | 2511.01320 | translate | read | null |
| 2025-11-02 | Balanced Multimodal Learning via Mutual Information | Rongrong Xie et al. | 2511.00987 | translate | read | null |
| 2025-11-01 | LIR: The First Workshop on Late Interaction and Multi Vector Retrieval @ ECIR 2026 | Benjamin Clavié et al. | 2511.00444 | translate | read | null |
| 2025-11-01 | Federated Dialogue-Semantic Diffusion for Emotion Recognition under Incomplete Modalities | Xihang Qiu et al. | 2511.00344 | translate | read | null |

(<a href=../Multimodal.md>back to Multimodal</a>)