Multimodal - 2025-08

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-08-29 | Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer | Daniël Boeke et.al. | 2508.21581 | translate | read | null |
| 2025-08-27 | Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement | Mohammed Rakibul Hasan et.al. | 2508.19887 | translate | read | null |
| 2025-08-27 | AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning | Shu Shen et.al. | 2508.19769 | translate | read | null |
| 2025-08-25 | BTW: A Non-Parametric Variance Stabilization Framework for Multimodal Model Integration | Jun Hou et.al. | 2508.18551 | translate | read | null |
| 2025-08-22 | Can VLMs Recall Factual Associations From Visual References? | Dhananjay Ashok et.al. | 2508.18297 | translate | read | null |
| 2025-08-20 | Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders | Yiming Tang et.al. | 2508.18236 | translate | read | null |
| 2025-08-24 | Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice | Hugo Bohy et.al. | 2508.17502 | translate | read | link |
| 2025-08-24 | Multimodal Representation Learning Conditioned on Semantic Relations | Yang Qiao et.al. | 2508.17497 | translate | read | null |
| 2025-08-24 | SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality | Yuzhi Lai et.al. | 2508.17255 | translate | read | null |
| 2025-08-10 | An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance | Hsuan-Kung Yang et.al. | 2508.16602 | translate | read | null |
| 2025-08-22 | Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization | Yupei Zhang et.al. | 2508.16479 | translate | read | null |
| 2025-08-22 | A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension | Mohammad Zia Ur Rehman et.al. | 2508.16300 | translate | read | null |
| 2025-08-21 | Lang2Lift: A Framework for Language-Guided Pallet Detection and Pose Estimation Integrated in Autonomous Outdoor Forklift Operation | Huy Hoang Nguyen et.al. | 2508.15427 | translate | read | null |
| 2025-08-21 | DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding | Zhu Wang et.al. | 2508.15297 | translate | read | null |
| 2025-08-20 | MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs | Ruyi Ding et.al. | 2508.15036 | translate | read | null |
| 2025-08-19 | Beyond Simple Edits: Composed Video Retrieval with Dense Modifications | Omkar Thawakar et.al. | 2508.14039 | translate | read | link |
| 2025-08-19 | CrafterDojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter | Junyeong Park et.al. | 2508.13530 | translate | read | null |
| 2025-08-19 | CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models | Catherine Glossop et.al. | 2508.13446 | translate | read | null |
| 2025-08-18 | SPANER: Shared Prompt Aligner for Multimodal Semantic Representation | Thye Shan Ng et.al. | 2508.13387 | translate | read | null |
| 2025-08-18 | Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation | Tanjim Islam Riju et.al. | 2508.13068 | translate | read | null |
| 2025-08-17 | Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping | Xuhui Zhan et.al. | 2508.12466 | translate | read | link |
| 2025-08-16 | MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization | Haochen You et.al. | 2508.12149 | translate | read | null |
| 2025-08-16 | ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models | Zhichen Lou et.al. | 2508.11918 | translate | read | null |
| 2025-08-13 | MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning | Thanh-Dat Truong et.al. | 2508.10133 | translate | read | null |
| 2025-08-13 | Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model | Sushrut Patwardhan et.al. | 2508.10110 | translate | read | null |
| 2025-08-12 | LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition | Zhining He et.al. | 2508.08925 | translate | read | null |
| 2025-08-12 | Multimodal learning enables instant ionizing radiation alerts on unmodified mobile phones for real-world emergency response | Yanfeng Xie et.al. | 2508.08541 | translate | read | null |
| 2025-08-11 | BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models | Maozhen Zhang et.al. | 2508.08040 | translate | read | null |
| 2025-08-11 | A Trustworthy Method for Multimodal Emotion Recognition | Junxiao Xue et.al. | 2508.07625 | translate | read | null |
| 2025-08-10 | Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks for Enhanced Action Understanding | Zhaoyu Chen et.al. | 2508.07388 | translate | read | null |
| 2025-08-10 | FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning | Van Duc Cuong et.al. | 2508.07264 | translate | read | null |
| 2025-08-09 | Can Multitask Learning Enhance Model Explainability? | Hiba Najjar et.al. | 2508.06966 | translate | read | null |
| 2025-08-09 | Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction | Hiba Najjar et.al. | 2508.06939 | translate | read | null |
| 2025-08-09 | Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities | Rui Liu et.al. | 2508.06800 | translate | read | null |
| 2025-08-08 | Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records | Mosbah Aouad et.al. | 2508.06627 | translate | read | null |
| 2025-08-07 | Surformer v1: Transformer-Based Surface Classification Using Tactile and Vision Features | Manish Kansana et.al. | 2508.06566 | translate | read | null |
| 2025-08-06 | Grounding Emotion Recognition with Visual Prototypes: VEGA – Revisiting CLIP in MERC | Guanyu Hu et.al. | 2508.06564 | translate | read | null |
| 2025-08-08 | Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning | Xiangyu Wu et.al. | 2508.06382 | translate | read | null |
| 2025-08-08 | ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge | Juewen Hu et.al. | 2508.05991 | translate | read | null |
| 2025-08-07 | Analyzing the Impact of Multimodal Perception on Sample Complexity and Optimization Landscapes in Imitation Learning | Luai Abuelsamen et.al. | 2508.05077 | translate | read | null |
| 2025-08-07 | MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding | Weifan Zhang et.al. | 2508.05021 | translate | read | null |
| 2025-08-06 | Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models | Md Raisul Kibria et.al. | 2508.04427 | translate | read | null |
| 2025-08-06 | Length Matters: Length-Aware Transformer for Temporal Sentence Grounding | Yifan Wang et.al. | 2508.04299 | translate | read | null |
| 2025-08-06 | SVC 2025: the First Multimodal Deception Detection Challenge | Xun Lin et.al. | 2508.04129 | translate | read | null |
| 2025-08-05 | T2UE: Generating Unlearnable Examples from Text Descriptions | Xingjun Ma et.al. | 2508.03091 | translate | read | null |
| 2025-08-04 | MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming | Shuo Wang et.al. | 2508.02549 | translate | read | null |
| 2025-08-04 | Hierarchical MoE: Continuous Multimodal Emotion Recognition with Incomplete and Asynchronous Inputs | Yitong Zhu et.al. | 2508.02133 | translate | read | null |
| 2025-08-04 | “Harmless to You, Hurtful to Me!”: Investigating the Detection of Toxic Languages Grounded in the Perspective of Youth | Yaqiong Li et.al. | 2508.02094 | translate | read | null |
| 2025-08-03 | DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition | Peiyuan Jiang et.al. | 2508.01644 | translate | read | null |
| 2025-08-02 | A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics | Rushin H. Gindra et.al. | 2508.01490 | translate | read | null |
| 2025-08-02 | AffectGPT-R1: Leveraging Reinforcement Learning for Open-Vocabulary Emotion Recognition | Zheng Lian et.al. | 2508.01318 | translate | read | null |
