Multimodal - 2025-01
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-01-29 | U2A: Unified Unimodal Adaptation for Robust and Efficient Multimodal Learning | Md Kaykobad Reza et al. | 2501.17823 | translate | read | null |
| 2025-01-28 | Molecular-driven Foundation Model for Oncologic Pathology | Anurag Vaidya et al. | 2501.16652 | translate | read | null |
| 2025-01-27 | AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models | Zheng Lian et al. | 2501.16566 | translate | read | null |
| 2025-01-25 | Inductive Biases for Zero-shot Systematic Generalization in Language-informed Reinforcement Learning | Negin Hashemi Dijujin et al. | 2501.15270 | translate | read | null |
| 2025-01-25 | Deep Multimodal Learning for Real-Time DDoS Attacks Detection in Internet of Vehicles | Mohamed Ababsa et al. | 2501.15252 | translate | read | link |
| 2025-01-25 | Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition | Junwei Feng et al. | 2501.15063 | translate | read | null |
| 2025-01-23 | Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge | Haomiao Xiong et al. | 2501.13468 | translate | read | link |
| 2025-01-22 | EmoTech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information with Hybrid Recurrent Network | Shamin Bin Habib Avro et al. | 2501.12674 | translate | read | null |
| 2025-01-21 | Compositional Instruction Following with Language Models and Reinforcement Learning | Vanya Cohen et al. | 2501.12539 | translate | read | null |
| 2025-01-21 | Multi-stage intermediate fusion for multimodal learning to classify non-small cell lung cancer subtypes from CT and PET | Fatih Aksu et al. | 2501.12425 | translate | read | null |
| 2025-01-20 | LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations | Soumya Dutta et al. | 2501.11468 | translate | read | null |
| 2025-01-20 | ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction | Xiangyang Hu et al. | 2501.11276 | translate | read | link |
| 2025-01-18 | Fake Advertisements Detection Using Automated Multimodal Learning: A Case Study for Vietnamese Real Estate Data | Duy Nguyen et al. | 2501.10848 | translate | read | null |
| 2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et al. | 2501.10144 | translate | read | null |
| 2025-01-17 | TeamVision: An AI-powered Learning Analytics System for Supporting Reflection in Team-based Healthcare Simulation | Vanessa Echeverria et al. | 2501.09930 | translate | read | null |
| 2025-01-19 | IDEA: Image Description Enhanced CLIP-Adapter | Zhipeng Ye et al. | 2501.08816 | translate | read | link |
| 2025-01-14 | Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time | Mihai Masala et al. | 2501.08460 | translate | read | null |
| 2025-01-12 | SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | Bhavin Jawade et al. | 2501.08347 | translate | read | null |
| 2025-01-17 | Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Liping Yuan et al. | 2501.07888 | translate | read | null |
| 2025-01-13 | Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis | Andrzej D. Dobrzycki et al. | 2501.07221 | translate | read | null |
| 2025-01-12 | 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes | Mahmoud Ahmed et al. | 2501.06785 | translate | read | link |
| 2025-01-14 | Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | Joshua Jones et al. | 2501.04693 | translate | read | null |
| 2025-01-06 | CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets | Tanay Agrawal et al. | 2501.03332 | translate | read | null |
| 2025-01-06 | MVP: Multimodal Emotion Recognition based on Video and Physiological Signals | Valeriya Strizhkova et al. | 2501.03103 | translate | read | null |
| 2025-01-02 | Asymmetric Reinforcing against Multi-modal Representation Bias | Xiyuan Gao et al. | 2501.01240 | translate | read | link |
| 2025-01-02 | Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning | Jian Lang et al. | 2501.01120 | translate | read | link |
(<a href=../Multimodal.md>back to Multimodal</a>)