Multimodal - 2025-02

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-02-28 | Foundation-Model-Boosted Multimodal Learning for fMRI-based Neuropathic Pain Drug Response Prediction | Wenrui Fan et.al. | 2503.00210 | translate | read | null |
| 2025-02-28 | PathVG: A New Benchmark and Dataset for Pathology Visual Grounding | Chunlin Zhong et.al. | 2502.20869 | translate | read | null |
| 2025-02-28 | Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems | Faisal Mohammad et.al. | 2502.20806 | translate | read | null |
| 2025-02-27 | VideoA11y: Method and Dataset for Accessible Video Description | Chaoyu Li et.al. | 2502.20480 | translate | read | null |
| 2025-02-27 | LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding | Ang Cao et.al. | 2502.20389 | translate | read | null |
| 2025-02-27 | Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion | QingYuan Jiang et.al. | 2502.20120 | translate | read | null |
| 2025-02-27 | MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification | Tong Zhang et.al. | 2502.19674 | translate | read | null |
| 2025-02-25 | CPVis: Evidence-based Multimodal Learning Analytics for Evaluation in Collaborative Programming | Gefei Zhang et.al. | 2502.17835 | translate | read | null |
| 2025-02-24 | Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Syed Abdul Gaffar Shakhadri et.al. | 2502.17092 | translate | read | null |
| 2025-02-24 | DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Ibrahim Fayad et.al. | 2502.17066 | translate | read | null |
| 2025-02-23 | Category-Selective Neurons in Deep Networks: Comparing Purely Visual and Visual-Language Models | Zitong Lu et.al. | 2502.16456 | translate | read | null |
| 2025-02-23 | A Survey on Industrial Anomalies Synthesis | Xichen Xu et.al. | 2502.16412 | translate | read | link |
| 2025-02-22 | Understanding the Emergence of Multimodal Representation Alignment | Megan Tjandrasuwita et.al. | 2502.16282 | translate | read | link |
| 2025-02-21 | M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards | Alvaro Becerra et.al. | 2502.15363 | translate | read | null |
| 2025-02-20 | FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Fadillah Maani et.al. | 2502.14807 | translate | read | link |
| 2025-02-21 | AVD2: Accident Video Diffusion for Accident Video Description | Cheng Li et.al. | 2502.14801 | translate | read | null |
| 2025-02-19 | Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition | Jingwang Huang et.al. | 2502.13954 | translate | read | link |
| 2025-02-22 | Grounding LLM Reasoning with Knowledge Graphs | Alfonso Amayuelas et.al. | 2502.13247 | translate | read | null |
| 2025-02-18 | SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Zekun Qi et.al. | 2502.13143 | translate | read | null |
| 2025-02-18 | Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning | Mengshi Qi et.al. | 2502.12425 | translate | read | link |
| 2025-02-16 | AudioSpa: Spatializing Sound Events with Text | Linfeng Feng et.al. | 2502.11219 | translate | read | null |
| 2025-02-18 | BalanceBenchmark: A Survey for Imbalanced Learning | Shaoxuan Xu et.al. | 2502.10816 | translate | read | link |
| 2025-02-17 | Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Mohammad Mahdi Abootorabi et.al. | 2502.08826 | translate | read | link |
| 2025-02-12 | A Novel Approach to for Multimodal Emotion Recognition : Multimodal semantic information fusion | Wei Dai et.al. | 2502.08573 | translate | read | null |
| 2025-02-17 | What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations | Dongqi Liu et.al. | 2502.08279 | translate | read | null |
| 2025-02-11 | Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Amir Hosein Fadaei et.al. | 2502.07277 | translate | read | null |
| 2025-02-10 | Generative Distribution Prediction: A Unified Approach to Multimodal Learning | Xinyu Tian et.al. | 2502.07090 | translate | read | null |
| 2025-02-06 | CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction | Jaewan Lee et.al. | 2502.06836 | translate | read | null |
| 2025-02-10 | Learning Musical Representations for Music Performance Question Answering | Xingjian Diao et.al. | 2502.06710 | translate | read | null |
| 2025-02-04 | Exploring Spatial Language Grounding Through Referring Expressions | Akshar Tumu et.al. | 2502.04359 | translate | read | null |
| 2025-02-03 | Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective | Xiaorui Ma et.al. | 2502.01524 | translate | read | null |
| 2025-02-03 | MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks | Alejandro Guerra-Manzanares et.al. | 2502.01158 | translate | read | null |
| 2025-02-01 | Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition | Zaitian Wang et.al. | 2502.00547 | translate | read | link |
