Multimodal - 2025-04
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-04-30 | Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design | Vasudev Sharma et.al. | 2505.00134 | translate | read | null |
| 2025-04-28 | DEEMO: De-identity Multimodal Emotion Recognition and Reasoning | Deng Li et.al. | 2504.19549 | translate | read | null |
| 2025-04-27 | Platonic Grounding for Efficient Multimodal Language Models | Moulik Choraria et.al. | 2504.19327 | translate | read | null |
| 2025-04-27 | DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning | Jialang Lu et.al. | 2504.19127 | translate | read | null |
| 2025-04-23 | A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw | Wenwen Li et.al. | 2504.17822 | translate | read | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | translate | read | null |
| 2025-04-23 | Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation | Lakshita Agarwal et.al. | 2504.16788 | translate | read | null |
| 2025-04-23 | PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System | Xianghe Liu et.al. | 2504.16573 | translate | read | null |
| 2025-04-22 | CLIP-IT: CLIP-based Pairing for Histology Images Classification | Banafsheh Karimian et.al. | 2504.16181 | translate | read | null |
| 2025-04-22 | SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems | Manjunath D et.al. | 2504.15728 | translate | read | link |
| 2025-04-21 | Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Guo Chen et.al. | 2504.15271 | translate | read | link |
| 2025-04-21 | IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification | Fengyuan Nie et.al. | 2504.14833 | translate | read | null |
| 2025-04-19 | Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction | Li Yu et.al. | 2504.14267 | translate | read | null |
| 2025-04-19 | PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models | Nusrat Jahan Prottasha et.al. | 2504.14117 | translate | read | null |
| 2025-04-18 | Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation | Duy A. Nguyen et.al. | 2504.13465 | translate | read | null |
| 2025-04-17 | A Survey on Cross-Modal Interaction Between Music and Multimodal Data | Sifei Li et.al. | 2504.12796 | translate | read | null |
| 2025-04-16 | An Algebraic Extension of Intuitionistic Linear Logic: The $L_!^S$ -Calculus and Its Categorical Model | Alejandro Díaz-Caro et.al. | 2504.12128 | translate | read | null |
| 2025-04-16 | FedEPA: Enhancing Personalization and Modality Alignment in Multimodal Federated Learning | Yu Zhang et.al. | 2504.12025 | translate | read | null |
| 2025-04-15 | Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset | Elisa Ancarani et.al. | 2504.11232 | translate | read | null |
| 2025-04-14 | Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge | Maria Tzelepi et.al. | 2504.09914 | translate | read | null |
| 2025-04-13 | Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention | Vasilii Korolkov et.al. | 2504.09738 | translate | read | null |
| 2025-04-13 | Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Yongchao Feng et.al. | 2504.09480 | translate | read | link |
| 2025-04-09 | Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging | Siyuan Dai et.al. | 2504.07336 | translate | read | null |
| 2025-04-07 | Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework | Yu Min Park et.al. | 2504.05187 | translate | read | null |
| 2025-04-07 | Leveraging Label Potential for Enhanced Multimodal Emotion Recognition | Xuechun Shao et.al. | 2504.05158 | translate | read | null |
| 2025-04-06 | FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency | Shiyan Liu et.al. | 2504.04427 | translate | read | null |
| 2025-04-04 | Interpretable Multimodal Learning for Tumor Protein-Metal Binding: Progress, Challenges, and Perspectives | Xiaokun Liu et.al. | 2504.03847 | translate | read | null |
| 2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | translate | read | null |
| 2025-04-02 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Jing Liu et.al. | 2504.01954 | translate | read | null |
| 2025-04-02 | Deep Learning-Driven Protein Structure Prediction and Design: Key Model Developments by Nobel Laureates and Multi-Domain Applications | Wanqing Yang et.al. | 2504.01490 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)