Multimodal - 2025-04
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-04-30 | Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design | Vasudev Sharma et.al. | 2505.00134 | translate | read | null |
| 2025-04-28 | DEEMO: De-identity Multimodal Emotion Recognition and Reasoning | Deng Li et.al. | 2504.19549 | translate | read | null |
| 2025-04-27 | Platonic Grounding for Efficient Multimodal Language Models | Moulik Choraria et.al. | 2504.19327 | translate | read | null |
| 2025-04-27 | DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning | Jialang Lu et.al. | 2504.19127 | translate | read | null |
| 2025-04-23 | A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw | Wenwen Li et.al. | 2504.17822 | translate | read | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | translate | read | null |
| 2025-04-23 | Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation | Lakshita Agarwal et.al. | 2504.16788 | translate | read | null |
| 2025-04-23 | PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System | Xianghe Liu et.al. | 2504.16573 | translate | read | null |
| 2025-04-22 | CLIP-IT: CLIP-based Pairing for Histology Images Classification | Banafsheh Karimian et.al. | 2504.16181 | translate | read | null |
| 2025-04-22 | SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems | Manjunath D et.al. | 2504.15728 | translate | read | link |
| 2025-04-21 | Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Guo Chen et.al. | 2504.15271 | translate | read | link |
| 2025-04-21 | IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification | Fengyuan Nie et.al. | 2504.14833 | translate | read | null |
| 2025-04-19 | Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction | Li Yu et.al. | 2504.14267 | translate | read | null |
| 2025-04-19 | PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models | Nusrat Jahan Prottasha et.al. | 2504.14117 | translate | read | null |
| 2025-04-18 | Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation | Duy A. Nguyen et.al. | 2504.13465 | translate | read | null |
| 2025-04-17 | A Survey on Cross-Modal Interaction Between Music and Multimodal Data | Sifei Li et.al. | 2504.12796 | translate | read | null |
| 2025-04-16 | An Algebraic Extension of Intuitionistic Linear Logic: The $L_!^S$ -Calculus and Its Categorical Model | Alejandro Díaz-Caro et.al. | 2504.12128 | translate | read | null |
| 2025-04-16 | FedEPA: Enhancing Personalization and Modality Alignment in Multimodal Federated Learning | Yu Zhang et.al. | 2504.12025 | translate | read | null |
| 2025-04-15 | Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset | Elisa Ancarani et.al. | 2504.11232 | translate | read | null |
| 2025-04-14 | Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge | Maria Tzelepi et.al. | 2504.09914 | translate | read | null |
| 2025-04-13 | Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention | Vasilii Korolkov et.al. | 2504.09738 | translate | read | null |
| 2025-04-13 | Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Yongchao Feng et.al. | 2504.09480 | translate | read | link |
| 2025-04-09 | Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging | Siyuan Dai et.al. | 2504.07336 | translate | read | null |
| 2025-04-07 | Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework | Yu Min Park et.al. | 2504.05187 | translate | read | null |
| 2025-04-07 | Leveraging Label Potential for Enhanced Multimodal Emotion Recognition | Xuechun Shao et.al. | 2504.05158 | translate | read | null |
| 2025-04-06 | FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency | Shiyan Liu et.al. | 2504.04427 | translate | read | null |
| 2025-04-04 | Interpretable Multimodal Learning for Tumor Protein-Metal Binding: Progress, Challenges, and Perspectives | Xiaokun Liu et.al. | 2504.03847 | translate | read | null |
| 2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | translate | read | null |
| 2025-04-02 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Jing Liu et.al. | 2504.01954 | translate | read | null |
| 2025-04-02 | Deep Learning-Driven Protein Structure Prediction and Design: Key Model Developments by Nobel Laureates and Multi-Domain Applications | Wanqing Yang et.al. | 2504.01490 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)