Multimodal - 2025-02
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-02-28 | Foundation-Model-Boosted Multimodal Learning for fMRI-based Neuropathic Pain Drug Response Prediction | Wenrui Fan et.al. | 2503.00210 | translate | read | null |
| 2025-02-28 | PathVG: A New Benchmark and Dataset for Pathology Visual Grounding | Chunlin Zhong et.al. | 2502.20869 | translate | read | null |
| 2025-02-28 | Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems | Faisal Mohammad et.al. | 2502.20806 | translate | read | null |
| 2025-02-27 | VideoA11y: Method and Dataset for Accessible Video Description | Chaoyu Li et.al. | 2502.20480 | translate | read | null |
| 2025-02-27 | LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding | Ang Cao et.al. | 2502.20389 | translate | read | null |
| 2025-02-27 | Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion | QingYuan Jiang et.al. | 2502.20120 | translate | read | null |
| 2025-02-27 | MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification | Tong Zhang et.al. | 2502.19674 | translate | read | null |
| 2025-02-25 | CPVis: Evidence-based Multimodal Learning Analytics for Evaluation in Collaborative Programming | Gefei Zhang et.al. | 2502.17835 | translate | read | null |
| 2025-02-24 | Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Syed Abdul Gaffar Shakhadri et.al. | 2502.17092 | translate | read | null |
| 2025-02-24 | DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Ibrahim Fayad et.al. | 2502.17066 | translate | read | null |
| 2025-02-23 | Category-Selective Neurons in Deep Networks: Comparing Purely Visual and Visual-Language Models | Zitong Lu et.al. | 2502.16456 | translate | read | null |
| 2025-02-23 | A Survey on Industrial Anomalies Synthesis | Xichen Xu et.al. | 2502.16412 | translate | read | link |
| 2025-02-22 | Understanding the Emergence of Multimodal Representation Alignment | Megan Tjandrasuwita et.al. | 2502.16282 | translate | read | link |
| 2025-02-21 | M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards | Alvaro Becerra et.al. | 2502.15363 | translate | read | null |
| 2025-02-20 | FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Fadillah Maani et.al. | 2502.14807 | translate | read | link |
| 2025-02-21 | AVD2: Accident Video Diffusion for Accident Video Description | Cheng Li et.al. | 2502.14801 | translate | read | null |
| 2025-02-19 | Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition | Jingwang Huang et.al. | 2502.13954 | translate | read | link |
| 2025-02-22 | Grounding LLM Reasoning with Knowledge Graphs | Alfonso Amayuelas et.al. | 2502.13247 | translate | read | null |
| 2025-02-18 | SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Zekun Qi et.al. | 2502.13143 | translate | read | null |
| 2025-02-18 | Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning | Mengshi Qi et.al. | 2502.12425 | translate | read | link |
| 2025-02-16 | AudioSpa: Spatializing Sound Events with Text | Linfeng Feng et.al. | 2502.11219 | translate | read | null |
| 2025-02-18 | BalanceBenchmark: A Survey for Imbalanced Learning | Shaoxuan Xu et.al. | 2502.10816 | translate | read | link |
| 2025-02-17 | Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Mohammad Mahdi Abootorabi et.al. | 2502.08826 | translate | read | link |
| 2025-02-12 | A Novel Approach to for Multimodal Emotion Recognition : Multimodal semantic information fusion | Wei Dai et.al. | 2502.08573 | translate | read | null |
| 2025-02-17 | What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations | Dongqi Liu et.al. | 2502.08279 | translate | read | null |
| 2025-02-11 | Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Amir Hosein Fadaei et.al. | 2502.07277 | translate | read | null |
| 2025-02-10 | Generative Distribution Prediction: A Unified Approach to Multimodal Learning | Xinyu Tian et.al. | 2502.07090 | translate | read | null |
| 2025-02-06 | CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction | Jaewan Lee et.al. | 2502.06836 | translate | read | null |
| 2025-02-10 | Learning Musical Representations for Music Performance Question Answering | Xingjian Diao et.al. | 2502.06710 | translate | read | null |
| 2025-02-04 | Exploring Spatial Language Grounding Through Referring Expressions | Akshar Tumu et.al. | 2502.04359 | translate | read | null |
| 2025-02-03 | Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective | Xiaorui Ma et.al. | 2502.01524 | translate | read | null |
| 2025-02-03 | MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks | Alejandro Guerra-Manzanares et.al. | 2502.01158 | translate | read | null |
| 2025-02-01 | Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition | Zaitian Wang et.al. | 2502.00547 | translate | read | link |