Multimodal - 2025-10 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2025-10-24	Multimodal Detection of Fake Reviews using BERT and ResNet-50	Suhasnadh Reddy Veluru et al.	2511.00020	null
2025-10-04	Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment	Adrian-Dinu Urse et al.	2511.00004	null
2025-10-31	MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data	Yu-Chen Kuo et al.	2510.27321	null
2025-10-30	Evaluating Perspectival Biases in Cross-Modal Retrieval	Teerapol Saengsukhiran et al.	2510.26861	null
2025-10-30	Contribution-Guided Asymmetric Learning for Robust Multimodal Fusion under Imbalance and Noise	Zijing Xu et al.	2510.26289	null
2025-10-29	Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start	Kun Chen et al.	2510.25801	null
2025-10-29	LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation	Yang Miao et al.	2510.25263	null
2025-10-29	H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts	Peilin Tan et al.	2510.25091	null
2025-10-29	Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments	Manjunath Prasad Holenarasipura Rajiv et al.	2510.25070	null
2025-10-28	Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning	Hossein R. Nowdeh et al.	2510.24919	null
2025-10-28	MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition	Haoyang Zhang et al.	2510.24827	null
2025-10-24	Towards Fine-Grained Human Motion Video Captioning	Guorui Song et al.	2510.24767	null
2025-10-27	Toward Clinically Grounded Foundation Models in Pathology	Hamid R. Tizhoosh et al.	2510.23807	null
2025-10-27	Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier	Hyeongseop Rha et al.	2510.23506	null
2025-10-27	Evaluation of Vision-LLMs in Surveillance Video	Pascal Benschop et al.	2510.23190	null
2025-10-21	Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation	Chanyoung Chung et al.	2510.21812	null
2025-10-07	Avi: Action from Volumetric Inference	Harris Song et al.	2510.21746	null
2025-10-24	CXR-LanIC: Language-Grounded Interpretable Classifier for Chest X-Ray Diagnosis	Yiming Tang et al.	2510.21464	null
2025-10-24	Bridging the gap to real-world language-grounded visual concept learning	Whie Jung et al.	2510.21412	null
2025-10-23	Multimodal Negative Learning	Baoquan Gong et al.	2510.20877	null
2025-10-23	Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process	Tsai Hor Chan et al.	2510.20736	null
2025-10-23	Calibrating Multimodal Consensus for Emotion Recognition	Guowei Zhong et al.	2510.20256	null
2025-10-22	Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment	Yuhang Liu et al.	2510.19384	null
2025-10-22	FrogDeepSDM: Improving Frog Counting and Occurrence Prediction Using Multimodal Data and Pseudo-Absence Imputation	Chirag Padubidri et al.	2510.19305	null
2025-10-21	Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents	Yiqi Lin et al.	2510.18703	null
2025-10-21	Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation	Yasser Hamidullah et al.	2510.18439	null
2025-10-20	Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware	Stavros Mitsis et al.	2510.18036	null
2025-10-20	MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning	Alejandro Guerra-Manzanares et al.	2510.17394	null
2025-10-19	Graph4MM: Weaving Multimodal Learning with Structural Information	Xuying Ning et al.	2510.16990	null
2025-10-19	ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning	Yingxu Wang et al.	2510.16824	null
2025-10-19	Pursuing Minimal Sufficiency in Spatial Reasoning	Yejie Guo et al.	2510.16688	null
2025-10-18	Safire: Similarity Framework for Visualization Retrieval	Huyen N. Nguyen et al.	2510.16662	null
2025-10-18	Structured Interfaces for Automated Reasoning with 3D Scene Graphs	Aaron Ray et al.	2510.16643	null
2025-10-09	Lyapunov-Stable Adaptive Control for Multimodal Concept Drift	Tianyu Bell Pan et al.	2510.15944	null
2025-10-17	Towards Relaxed Multimodal Inputs for Gait-based Parkinson’s Disease Assessment	Minlin Zeng et al.	2510.15748	null
2025-10-17	ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings	Prithwish Jana et al.	2510.15681	null
2025-10-16	ChangingGrounding: 3D Visual Grounding in Changing Scenes	Miao Hu et al.	2510.14965	null
2025-10-16	From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance	Zhe Li et al.	2510.14952	null
2025-10-16	Revisit Modality Imbalance at the Decision Layer	Xiaoyu Ma et al.	2510.14411	null
2025-10-15	A Multimodal Approach to Heritage Preservation in the Context of Climate Change	David Roqui et al.	2510.14136	null
2025-10-15	NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching	Run Luo et al.	2510.13721	null
2025-10-15	Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation	Jiamin Chen et al.	2510.13191	null
2025-10-15	Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning	Rongrong Xie et al.	2510.13182	null
2025-10-15	OS-HGAdapter: Open Semantic Hypergraph Adapter for Large Language Models Assisted Entropy-Enhanced Image-Text Alignment	Rongjun Chen et al.	2510.13131	null
2025-10-14	A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation	Shurong Chai et al.	2510.12482	null
2025-10-14	Ground Stratification for a Logic of Definitions with Induction	Nathan Guermond et al.	2510.12297	null
2025-10-14	IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation	Wenxu Zhou et al.	2510.12095	null
2025-10-13	Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and Analysis	Blessing Agyei Kyem et al.	2510.11907	null
2025-10-11	Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model	Chung-Soo Ahn et al.	2510.10078	null
2025-10-10	Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition	Huimin Liu et al.	2510.09203	null
2025-10-09	Provably Robust Adaptation for Language-Empowered Foundation Models	Yuni Lai et al.	2510.08659	null
2025-10-07	Centering Emotion Hotspots: Multimodal Local-Global Fusion and Cross-Modal Alignment for Emotion Recognition in Conversations	Yu Liu et al.	2510.08606	null
2025-10-09	Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling	Bianca-Mihaela Ganescu et al.	2510.08470	null
2025-10-08	FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams	Corban Rivera et al.	2510.07417	null
2025-10-08	TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation	Jiaben Chen et al.	2510.07249	null
2025-10-08	Expressive and Scalable Quantum Fusion for Multimodal Learning	Tuyen Nguyen et al.	2510.06938	null
2025-10-08	M3Retrieve: Benchmarking Multimodal Retrieval for Medicine	Arkadeep Acharya et al.	2510.06888	null
2025-10-07	Deforming Videos to Masks: Flow Matching for Referring Video Segmentation	Zanyi Wang et al.	2510.06139	null
2025-10-05	Learning-Based Hashing for ANN Search: Foundations and Early Advances	Sean Moran et al.	2510.04127	null
2025-10-04	UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG	Xiangyu Peng et al.	2510.03663	null
2025-10-04	Towards Unsupervised Speech Recognition at the Syllable-Level	Liming Wang et al.	2510.03639	null
2025-10-02	Latency-aware Multimodal Federated Learning over UAV Networks	Shaba Shaon et al.	2510.01717	null
2025-10-01	PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset	Thomas Campagnolo et al.	2510.00818	null
