Multimodal - 2025-07

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-07-29 | Multimodal Video Emotion Recognition with Reliable Reasoning Priors | Zhepeng Wang et.al. | 2508.03722 | translate | read | null |
| 2025-07-29 | SmartCLIP: Modular Vision-language Alignment with Identification Guarantees | Shaoan Xie et.al. | 2507.22264 | translate | read | null |
| 2025-07-29 | MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces | Shaojun E et.al. | 2507.21741 | translate | read | link |
| 2025-07-29 | Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion | Zeyu Deng et.al. | 2507.21395 | translate | read | null |
| 2025-07-28 | On the Limits of Hierarchically Embedded Logic in Classical Neural Networks | Bill Cochran et.al. | 2507.20960 | translate | read | null |
| 2025-07-28 | TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model | Ao Li et.al. | 2507.20630 | translate | read | null |
| 2025-07-25 | Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization | Hsuan-Yu Wang et.al. | 2507.19356 | translate | read | null |
| 2025-07-25 | SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality | Sijie Li et.al. | 2507.19264 | translate | read | null |
| 2025-07-24 | Deep Learning for Blood-Brain Barrier Permeability Prediction | Zihan Yang et.al. | 2507.18557 | translate | read | null |
| 2025-07-23 | RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding | Xi Xiao et.al. | 2507.17353 | translate | read | null |
| 2025-07-22 | VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings | Ramin Giahi et.al. | 2507.17080 | translate | read | null |
| 2025-07-20 | TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning | Jie He et.al. | 2507.16844 | translate | read | null |
| 2025-07-21 | Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure | Alexandra Junell et.al. | 2507.16088 | translate | read | null |
| 2025-07-21 | MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations | Deyun Zhang et.al. | 2507.15255 | translate | read | null |
| 2025-07-20 | LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering | Xinxin Dong et.al. | 2507.14784 | translate | read | null |
| 2025-07-18 | MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training | Yuechen Xie et.al. | 2507.13673 | translate | read | null |
| 2025-07-17 | City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning | Penglei Sun et.al. | 2507.12795 | translate | read | null |
| 2025-07-17 | A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models | Weijieying Ren et.al. | 2507.12774 | translate | read | null |
| 2025-07-15 | Partitioner Guided Modal Learning Framework | Guimin Hu et.al. | 2507.11661 | translate | read | null |
| 2025-07-15 | A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition | Xinkui Zhao et.al. | 2507.11202 | translate | read | null |
| 2025-07-14 | Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language | Andrew C. Li et.al. | 2507.10741 | translate | read | null |
| 2025-07-14 | Boosting Multimodal Learning via Disentangled Gradient Learning | Shicai Wei et.al. | 2507.10213 | translate | read | null |
| 2025-07-21 | Improving Multimodal Learning via Imbalanced Learning | Shicai Wei et.al. | 2507.10203 | translate | read | link |
| 2025-07-13 | HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space | Changli Wang et.al. | 2507.09487 | translate | read | null |
| 2025-07-09 | Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors | Chunzhuo Wang et.al. | 2507.07261 | translate | read | null |
| 2025-07-09 | Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey | Getamesay Haile Dagnaw et.al. | 2507.07148 | translate | read | null |
| 2025-07-08 | Enhancing Synthetic CT from CBCT via Multimodal Fusion and End-To-End Registration | Maximilian Tschuchnig et.al. | 2507.06067 | translate | read | null |
| 2025-07-08 | Graph Learning | Feng Xia et.al. | 2507.05636 | translate | read | null |
| 2025-07-07 | Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models | Eunseop Yoon et.al. | 2507.04976 | translate | read | null |
| 2025-07-07 | From Vision To Language through Graph of Events in Space and Time: An Explainable Self-supervised Approach | Mihai Masala et.al. | 2507.04815 | translate | read | null |
| 2025-07-07 | MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding | Zhicheng Zhang et.al. | 2507.04635 | translate | read | null |
| 2025-07-10 | DMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth | Zheng Lian et.al. | 2507.04278 | translate | read | null |
| 2025-07-05 | Unlocking Compositional Control: Self-Supervision for LVLM-Based Image Generation | Fernando Gabriela Garcia et.al. | 2507.04151 | translate | read | null |
| 2025-07-03 | Intelligent Histology for Tumor Neurosurgery | Xinhai Hou et.al. | 2507.03037 | translate | read | null |
| 2025-07-01 | Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers | Yusuf Shihata et.al. | 2507.02985 | translate | read | null |
| 2025-07-02 | TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation | Yubeen Lee et.al. | 2507.02080 | translate | read | null |
