Multimodal - 2025-01

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2025-01-29 | U2A: Unified Unimodal Adaptation for Robust and Efficient Multimodal Learning | Md Kaykobad Reza et.al. | 2501.17823 | translate | read | null |
| 2025-01-28 | Molecular-driven Foundation Model for Oncologic Pathology | Anurag Vaidya et.al. | 2501.16652 | translate | read | null |
| 2025-01-27 | AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models | Zheng Lian et.al. | 2501.16566 | translate | read | null |
| 2025-01-25 | Inductive Biases for Zero-shot Systematic Generalization in Language-informed Reinforcement Learning | Negin Hashemi Dijujin et.al. | 2501.15270 | translate | read | null |
| 2025-01-25 | Deep Multimodal Learning for Real-Time DDoS Attacks Detection in Internet of Vehicles | Mohamed Ababsa et.al. | 2501.15252 | translate | read | link |
| 2025-01-25 | Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition | Junwei Feng et.al. | 2501.15063 | translate | read | null |
| 2025-01-23 | Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge | Haomiao Xiong et.al. | 2501.13468 | translate | read | link |
| 2025-01-22 | EmoTech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information with Hybrid Recurrent Network | Shamin Bin Habib Avro et.al. | 2501.12674 | translate | read | null |
| 2025-01-21 | Compositional Instruction Following with Language Models and Reinforcement Learning | Vanya Cohen et.al. | 2501.12539 | translate | read | null |
| 2025-01-21 | Multi-stage intermediate fusion for multimodal learning to classify non-small cell lung cancer subtypes from CT and PET | Fatih Aksu et.al. | 2501.12425 | translate | read | null |
| 2025-01-20 | LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations | Soumya Dutta et.al. | 2501.11468 | translate | read | null |
| 2025-01-20 | ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction | Xiangyang Hu et.al. | 2501.11276 | translate | read | link |
| 2025-01-18 | Fake Advertisements Detection Using Automated Multimodal Learning: A Case Study for Vietnamese Real Estate Data | Duy Nguyen et.al. | 2501.10848 | translate | read | null |
| 2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et.al. | 2501.10144 | translate | read | null |
| 2025-01-17 | TeamVision: An AI-powered Learning Analytics System for Supporting Reflection in Team-based Healthcare Simulation | Vanessa Echeverria et.al. | 2501.09930 | translate | read | null |
| 2025-01-19 | IDEA: Image Description Enhanced CLIP-Adapter | Zhipeng Ye et.al. | 2501.08816 | translate | read | link |
| 2025-01-14 | Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time | Mihai Masala et.al. | 2501.08460 | translate | read | null |
| 2025-01-12 | SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | Bhavin Jawade et.al. | 2501.08347 | translate | read | null |
| 2025-01-17 | Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Liping Yuan et.al. | 2501.07888 | translate | read | null |
| 2025-01-13 | Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis | Andrzej D. Dobrzycki et.al. | 2501.07221 | translate | read | null |
| 2025-01-12 | 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes | Mahmoud Ahmed et.al. | 2501.06785 | translate | read | link |
| 2025-01-14 | Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | Joshua Jones et.al. | 2501.04693 | translate | read | null |
| 2025-01-06 | CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets | Tanay Agrawal et.al. | 2501.03332 | translate | read | null |
| 2025-01-06 | MVP: Multimodal Emotion Recognition based on Video and Physiological Signals | Valeriya Strizhkova et.al. | 2501.03103 | translate | read | null |
| 2025-01-02 | Asymmetric Reinforcing against Multi-modal Representation Bias | Xiyuan Gao et.al. | 2501.01240 | translate | read | link |
| 2025-01-02 | Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning | Jian Lang et.al. | 2501.01120 | translate | read | link |
