Multimodal - 2025-01 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF (arXiv ID)	Code
2025-01-29	U2A: Unified Unimodal Adaptation for Robust and Efficient Multimodal Learning	Md Kaykobad Reza et al.	2501.17823	null
2025-01-28	Molecular-driven Foundation Model for Oncologic Pathology	Anurag Vaidya et al.	2501.16652	null
2025-01-27	AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models	Zheng Lian et al.	2501.16566	null
2025-01-25	Inductive Biases for Zero-shot Systematic Generalization in Language-informed Reinforcement Learning	Negin Hashemi Dijujin et al.	2501.15270	null
2025-01-25	Deep Multimodal Learning for Real-Time DDoS Attacks Detection in Internet of Vehicles	Mohamed Ababsa et al.	2501.15252	link
2025-01-25	Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition	Junwei Feng et al.	2501.15063	null
2025-01-23	Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge	Haomiao Xiong et al.	2501.13468	link
2025-01-22	EmoTech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information with Hybrid Recurrent Network	Shamin Bin Habib Avro et al.	2501.12674	null
2025-01-21	Compositional Instruction Following with Language Models and Reinforcement Learning	Vanya Cohen et al.	2501.12539	null
2025-01-21	Multi-stage intermediate fusion for multimodal learning to classify non-small cell lung cancer subtypes from CT and PET	Fatih Aksu et al.	2501.12425	null
2025-01-20	LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations	Soumya Dutta et al.	2501.11468	null
2025-01-20	ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction	Xiangyang Hu et al.	2501.11276	link
2025-01-18	Fake Advertisements Detection Using Automated Multimodal Learning: A Case Study for Vietnamese Real Estate Data	Duy Nguyen et al.	2501.10848	null
2025-01-17	A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features	Enes Karanfil et al.	2501.10144	null
2025-01-17	TeamVision: An AI-powered Learning Analytics System for Supporting Reflection in Team-based Healthcare Simulation	Vanessa Echeverria et al.	2501.09930	null
2025-01-19	IDEA: Image Description Enhanced CLIP-Adapter	Zhipeng Ye et al.	2501.08816	link
2025-01-14	Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time	Mihai Masala et al.	2501.08460	null
2025-01-12	SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval	Bhavin Jawade et al.	2501.08347	null
2025-01-17	Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding	Liping Yuan et al.	2501.07888	null
2025-01-13	Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis	Andrzej D. Dobrzycki et al.	2501.07221	null
2025-01-12	3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes	Mahmoud Ahmed et al.	2501.06785	link
2025-01-14	Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding	Joshua Jones et al.	2501.04693	null
2025-01-06	CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets	Tanay Agrawal et al.	2501.03332	null
2025-01-06	MVP: Multimodal Emotion Recognition based on Video and Physiological Signals	Valeriya Strizhkova et al.	2501.03103	null
2025-01-02	Asymmetric Reinforcing against Multi-modal Representation Bias	Xiyuan Gao et al.	2501.01240	link
2025-01-02	Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning	Jian Lang et al.	2501.01120	link
