Multimodal - 2025-10

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2025-10-24 | Multimodal Detection of Fake Reviews using BERT and ResNet-50 | Suhasnadh Reddy Veluru et al. | 2511.00020 | translate | read | null |
| 2025-10-04 | Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment | Adrian-Dinu Urse et al. | 2511.00004 | translate | read | null |
| 2025-10-31 | MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data | Yu-Chen Kuo et al. | 2510.27321 | translate | read | null |
| 2025-10-30 | Evaluating Perspectival Biases in Cross-Modal Retrieval | Teerapol Saengsukhiran et al. | 2510.26861 | translate | read | null |
| 2025-10-30 | Contribution-Guided Asymmetric Learning for Robust Multimodal Fusion under Imbalance and Noise | Zijing Xu et al. | 2510.26289 | translate | read | null |
| 2025-10-29 | Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start | Kun Chen et al. | 2510.25801 | translate | read | null |
| 2025-10-29 | LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation | Yang Miao et al. | 2510.25263 | translate | read | null |
| 2025-10-29 | H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts | Peilin Tan et al. | 2510.25091 | translate | read | null |
| 2025-10-29 | Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments | Manjunath Prasad Holenarasipura Rajiv et al. | 2510.25070 | translate | read | null |
| 2025-10-28 | Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning | Hossein R. Nowdeh et al. | 2510.24919 | translate | read | null |
| 2025-10-28 | MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition | Haoyang Zhang et al. | 2510.24827 | translate | read | null |
| 2025-10-24 | Towards Fine-Grained Human Motion Video Captioning | Guorui Song et al. | 2510.24767 | translate | read | null |
| 2025-10-27 | Toward Clinically Grounded Foundation Models in Pathology | Hamid R. Tizhoosh et al. | 2510.23807 | translate | read | null |
| 2025-10-27 | Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier | Hyeongseop Rha et al. | 2510.23506 | translate | read | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et al. | 2510.23190 | translate | read | null |
| 2025-10-21 | Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation | Chanyoung Chung et al. | 2510.21812 | translate | read | null |
| 2025-10-07 | Avi: Action from Volumetric Inference | Harris Song et al. | 2510.21746 | translate | read | null |
| 2025-10-24 | CXR-LanIC: Language-Grounded Interpretable Classifier for Chest X-Ray Diagnosis | Yiming Tang et al. | 2510.21464 | translate | read | null |
| 2025-10-24 | Bridging the gap to real-world language-grounded visual concept learning | Whie Jung et al. | 2510.21412 | translate | read | null |
| 2025-10-23 | Multimodal Negative Learning | Baoquan Gong et al. | 2510.20877 | translate | read | null |
| 2025-10-23 | Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process | Tsai Hor Chan et al. | 2510.20736 | translate | read | null |
| 2025-10-23 | Calibrating Multimodal Consensus for Emotion Recognition | Guowei Zhong et al. | 2510.20256 | translate | read | null |
| 2025-10-22 | Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment | Yuhang Liu et al. | 2510.19384 | translate | read | null |
| 2025-10-22 | FrogDeepSDM: Improving Frog Counting and Occurrence Prediction Using Multimodal Data and Pseudo-Absence Imputation | Chirag Padubidri et al. | 2510.19305 | translate | read | null |
| 2025-10-21 | Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents | Yiqi Lin et al. | 2510.18703 | translate | read | null |
| 2025-10-21 | Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation | Yasser Hamidullah et al. | 2510.18439 | translate | read | null |
| 2025-10-20 | Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware | Stavros Mitsis et al. | 2510.18036 | translate | read | null |
| 2025-10-20 | MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning | Alejandro Guerra-Manzanares et al. | 2510.17394 | translate | read | null |
| 2025-10-19 | Graph4MM: Weaving Multimodal Learning with Structural Information | Xuying Ning et al. | 2510.16990 | translate | read | null |
| 2025-10-19 | ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning | Yingxu Wang et al. | 2510.16824 | translate | read | null |
| 2025-10-19 | Pursuing Minimal Sufficiency in Spatial Reasoning | Yejie Guo et al. | 2510.16688 | translate | read | null |
| 2025-10-18 | Safire: Similarity Framework for Visualization Retrieval | Huyen N. Nguyen et al. | 2510.16662 | translate | read | null |
| 2025-10-18 | Structured Interfaces for Automated Reasoning with 3D Scene Graphs | Aaron Ray et al. | 2510.16643 | translate | read | null |
| 2025-10-09 | Lyapunov-Stable Adaptive Control for Multimodal Concept Drift | Tianyu Bell Pan et al. | 2510.15944 | translate | read | null |
| 2025-10-17 | Towards Relaxed Multimodal Inputs for Gait-based Parkinson’s Disease Assessment | Minlin Zeng et al. | 2510.15748 | translate | read | null |
| 2025-10-17 | ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings | Prithwish Jana et al. | 2510.15681 | translate | read | null |
| 2025-10-16 | ChangingGrounding: 3D Visual Grounding in Changing Scenes | Miao Hu et al. | 2510.14965 | translate | read | null |
| 2025-10-16 | From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance | Zhe Li et al. | 2510.14952 | translate | read | null |
| 2025-10-16 | Revisit Modality Imbalance at the Decision Layer | Xiaoyu Ma et al. | 2510.14411 | translate | read | null |
| 2025-10-15 | A Multimodal Approach to Heritage Preservation in the Context of Climate Change | David Roqui et al. | 2510.14136 | translate | read | null |
| 2025-10-15 | NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching | Run Luo et al. | 2510.13721 | translate | read | null |
| 2025-10-15 | Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation | Jiamin Chen et al. | 2510.13191 | translate | read | null |
| 2025-10-15 | Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning | Rongrong Xie et al. | 2510.13182 | translate | read | null |
| 2025-10-15 | OS-HGAdapter: Open Semantic Hypergraph Adapter for Large Language Models Assisted Entropy-Enhanced Image-Text Alignment | Rongjun Chen et al. | 2510.13131 | translate | read | null |
| 2025-10-14 | A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation | Shurong Chai et al. | 2510.12482 | translate | read | null |
| 2025-10-14 | Ground Stratification for a Logic of Definitions with Induction | Nathan Guermond et al. | 2510.12297 | translate | read | null |
| 2025-10-14 | IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation | Wenxu Zhou et al. | 2510.12095 | translate | read | null |
| 2025-10-13 | Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and Analysis | Blessing Agyei Kyem et al. | 2510.11907 | translate | read | null |
| 2025-10-11 | Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model | Chung-Soo Ahn et al. | 2510.10078 | translate | read | null |
| 2025-10-10 | Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition | Huimin Liu et al. | 2510.09203 | translate | read | null |
| 2025-10-09 | Provably Robust Adaptation for Language-Empowered Foundation Models | Yuni Lai et al. | 2510.08659 | translate | read | null |
| 2025-10-07 | Centering Emotion Hotspots: Multimodal Local-Global Fusion and Cross-Modal Alignment for Emotion Recognition in Conversations | Yu Liu et al. | 2510.08606 | translate | read | null |
| 2025-10-09 | Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling | Bianca-Mihaela Ganescu et al. | 2510.08470 | translate | read | null |
| 2025-10-08 | FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams | Corban Rivera et al. | 2510.07417 | translate | read | null |
| 2025-10-08 | TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation | Jiaben Chen et al. | 2510.07249 | translate | read | null |
| 2025-10-08 | Expressive and Scalable Quantum Fusion for Multimodal Learning | Tuyen Nguyen et al. | 2510.06938 | translate | read | null |
| 2025-10-08 | M3Retrieve: Benchmarking Multimodal Retrieval for Medicine | Arkadeep Acharya et al. | 2510.06888 | translate | read | null |
| 2025-10-07 | Deforming Videos to Masks: Flow Matching for Referring Video Segmentation | Zanyi Wang et al. | 2510.06139 | translate | read | null |
| 2025-10-05 | Learning-Based Hashing for ANN Search: Foundations and Early Advances | Sean Moran et al. | 2510.04127 | translate | read | null |
| 2025-10-04 | UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG | Xiangyu Peng et al. | 2510.03663 | translate | read | null |
| 2025-10-04 | Towards Unsupervised Speech Recognition at the Syllable-Level | Liming Wang et al. | 2510.03639 | translate | read | null |
| 2025-10-02 | Latency-aware Multimodal Federated Learning over UAV Networks | Shaba Shaon et al. | 2510.01717 | translate | read | null |
| 2025-10-01 | PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset | Thomas Campagnolo et al. | 2510.00818 | translate | read | null |

(<a href=../Multimodal.md>back to Multimodal</a>)