Multimodal - 2025-04

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-04-30 | Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design | Vasudev Sharma et.al. | 2505.00134 | translate | read | null |
| 2025-04-28 | DEEMO: De-identity Multimodal Emotion Recognition and Reasoning | Deng Li et.al. | 2504.19549 | translate | read | null |
| 2025-04-27 | Platonic Grounding for Efficient Multimodal Language Models | Moulik Choraria et.al. | 2504.19327 | translate | read | null |
| 2025-04-27 | DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning | Jialang Lu et.al. | 2504.19127 | translate | read | null |
| 2025-04-23 | A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw | Wenwen Li et.al. | 2504.17822 | translate | read | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | translate | read | null |
| 2025-04-23 | Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation | Lakshita Agarwal et.al. | 2504.16788 | translate | read | null |
| 2025-04-23 | PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System | Xianghe Liu et.al. | 2504.16573 | translate | read | null |
| 2025-04-22 | CLIP-IT: CLIP-based Pairing for Histology Images Classification | Banafsheh Karimian et.al. | 2504.16181 | translate | read | null |
| 2025-04-22 | SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems | Manjunath D et.al. | 2504.15728 | translate | read | link |
| 2025-04-21 | Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Guo Chen et.al. | 2504.15271 | translate | read | link |
| 2025-04-21 | IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification | Fengyuan Nie et.al. | 2504.14833 | translate | read | null |
| 2025-04-19 | Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction | Li Yu et.al. | 2504.14267 | translate | read | null |
| 2025-04-19 | PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models | Nusrat Jahan Prottasha et.al. | 2504.14117 | translate | read | null |
| 2025-04-18 | Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation | Duy A. Nguyen et.al. | 2504.13465 | translate | read | null |
| 2025-04-17 | A Survey on Cross-Modal Interaction Between Music and Multimodal Data | Sifei Li et.al. | 2504.12796 | translate | read | null |
| 2025-04-16 | An Algebraic Extension of Intuitionistic Linear Logic: The $L_!^S$-Calculus and Its Categorical Model | Alejandro Díaz-Caro et.al. | 2504.12128 | translate | read | null |
| 2025-04-16 | FedEPA: Enhancing Personalization and Modality Alignment in Multimodal Federated Learning | Yu Zhang et.al. | 2504.12025 | translate | read | null |
| 2025-04-15 | Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset | Elisa Ancarani et.al. | 2504.11232 | translate | read | null |
| 2025-04-14 | Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge | Maria Tzelepi et.al. | 2504.09914 | translate | read | null |
| 2025-04-13 | Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention | Vasilii Korolkov et.al. | 2504.09738 | translate | read | null |
| 2025-04-13 | Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Yongchao Feng et.al. | 2504.09480 | translate | read | link |
| 2025-04-09 | Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging | Siyuan Dai et.al. | 2504.07336 | translate | read | null |
| 2025-04-07 | Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework | Yu Min Park et.al. | 2504.05187 | translate | read | null |
| 2025-04-07 | Leveraging Label Potential for Enhanced Multimodal Emotion Recognition | Xuechun Shao et.al. | 2504.05158 | translate | read | null |
| 2025-04-06 | FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency | Shiyan Liu et.al. | 2504.04427 | translate | read | null |
| 2025-04-04 | Interpretable Multimodal Learning for Tumor Protein-Metal Binding: Progress, Challenges, and Perspectives | Xiaokun Liu et.al. | 2504.03847 | translate | read | null |
| 2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | translate | read | null |
| 2025-04-02 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Jing Liu et.al. | 2504.01954 | translate | read | null |
| 2025-04-02 | Deep Learning-Driven Protein Structure Prediction and Design: Key Model Developments by Nobel Laureates and Multi-Domain Applications | Wanqing Yang et.al. | 2504.01490 | translate | read | null |

(<a href=../Multimodal.md>back to Multimodal</a>)