Multimodal - 2025-02 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Code
2025-02-28	Foundation-Model-Boosted Multimodal Learning for fMRI-based Neuropathic Pain Drug Response Prediction	Wenrui Fan et al.	2503.00210	null
2025-02-28	PathVG: A New Benchmark and Dataset for Pathology Visual Grounding	Chunlin Zhong et al.	2502.20869	null
2025-02-28	Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems	Faisal Mohammad et al.	2502.20806	null
2025-02-27	VideoA11y: Method and Dataset for Accessible Video Description	Chaoyu Li et al.	2502.20480	null
2025-02-27	LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding	Ang Cao et al.	2502.20389	null
2025-02-27	Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion	QingYuan Jiang et al.	2502.20120	null
2025-02-27	MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification	Tong Zhang et al.	2502.19674	null
2025-02-25	CPVis: Evidence-based Multimodal Learning Analytics for Evaluation in Collaborative Programming	Gefei Zhang et al.	2502.17835	null
2025-02-24	Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI	Syed Abdul Gaffar Shakhadri et al.	2502.17092	null
2025-02-24	DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications	Ibrahim Fayad et al.	2502.17066	null
2025-02-23	Category-Selective Neurons in Deep Networks: Comparing Purely Visual and Visual-Language Models	Zitong Lu et al.	2502.16456	null
2025-02-23	A Survey on Industrial Anomalies Synthesis	Xichen Xu et al.	2502.16412	link
2025-02-22	Understanding the Emergence of Multimodal Representation Alignment	Megan Tjandrasuwita et al.	2502.16282	link
2025-02-21	M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards	Alvaro Becerra et al.	2502.15363	null
2025-02-20	FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis	Fadillah Maani et al.	2502.14807	link
2025-02-21	AVD2: Accident Video Diffusion for Accident Video Description	Cheng Li et al.	2502.14801	null
2025-02-19	Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition	Jingwang Huang et al.	2502.13954	link
2025-02-22	Grounding LLM Reasoning with Knowledge Graphs	Alfonso Amayuelas et al.	2502.13247	null
2025-02-18	SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation	Zekun Qi et al.	2502.13143	null
2025-02-18	Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning	Mengshi Qi et al.	2502.12425	link
2025-02-16	AudioSpa: Spatializing Sound Events with Text	Linfeng Feng et al.	2502.11219	null
2025-02-18	BalanceBenchmark: A Survey for Imbalanced Learning	Shaoxuan Xu et al.	2502.10816	link
2025-02-17	Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation	Mohammad Mahdi Abootorabi et al.	2502.08826	link
2025-02-12	A Novel Approach to for Multimodal Emotion Recognition : Multimodal semantic information fusion	Wei Dai et al.	2502.08573	null
2025-02-17	What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations	Dongqi Liu et al.	2502.08279	null
2025-02-11	Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis	Amir Hosein Fadaei et al.	2502.07277	null
2025-02-10	Generative Distribution Prediction: A Unified Approach to Multimodal Learning	Xinyu Tian et al.	2502.07090	null
2025-02-06	CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction	Jaewan Lee et al.	2502.06836	null
2025-02-10	Learning Musical Representations for Music Performance Question Answering	Xingjian Diao et al.	2502.06710	null
2025-02-04	Exploring Spatial Language Grounding Through Referring Expressions	Akshar Tumu et al.	2502.04359	null
2025-02-03	Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective	Xiaorui Ma et al.	2502.01524	null
2025-02-03	MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks	Alejandro Guerra-Manzanares et al.	2502.01158	null
2025-02-01	Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition	Zaitian Wang et al.	2502.00547	link
