Multimodal - 2025-07
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-07-29 | Multimodal Video Emotion Recognition with Reliable Reasoning Priors | Zhepeng Wang et.al. | 2508.03722 | translate | read | null |
| 2025-07-29 | SmartCLIP: Modular Vision-language Alignment with Identification Guarantees | Shaoan Xie et.al. | 2507.22264 | translate | read | null |
| 2025-07-29 | MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces | Shaojun E et.al. | 2507.21741 | translate | read | link |
| 2025-07-29 | Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion | Zeyu Deng et.al. | 2507.21395 | translate | read | null |
| 2025-07-28 | On the Limits of Hierarchically Embedded Logic in Classical Neural Networks | Bill Cochran et.al. | 2507.20960 | translate | read | null |
| 2025-07-28 | TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model | Ao Li et.al. | 2507.20630 | translate | read | null |
| 2025-07-25 | Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization | Hsuan-Yu Wang et.al. | 2507.19356 | translate | read | null |
| 2025-07-25 | SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality | Sijie Li et.al. | 2507.19264 | translate | read | null |
| 2025-07-24 | Deep Learning for Blood-Brain Barrier Permeability Prediction | Zihan Yang et.al. | 2507.18557 | translate | read | null |
| 2025-07-23 | RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding | Xi Xiao et.al. | 2507.17353 | translate | read | null |
| 2025-07-22 | VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings | Ramin Giahi et.al. | 2507.17080 | translate | read | null |
| 2025-07-20 | TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning | Jie He et.al. | 2507.16844 | translate | read | null |
| 2025-07-21 | Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure | Alexandra Junell et.al. | 2507.16088 | translate | read | null |
| 2025-07-21 | MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations | Deyun Zhang et.al. | 2507.15255 | translate | read | null |
| 2025-07-20 | LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering | Xinxin Dong et.al. | 2507.14784 | translate | read | null |
| 2025-07-18 | MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training | Yuechen Xie et.al. | 2507.13673 | translate | read | null |
| 2025-07-17 | City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning | Penglei Sun et.al. | 2507.12795 | translate | read | null |
| 2025-07-17 | A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models | Weijieying Ren et.al. | 2507.12774 | translate | read | null |
| 2025-07-15 | Partitioner Guided Modal Learning Framework | Guimin Hu et.al. | 2507.11661 | translate | read | null |
| 2025-07-15 | A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition | Xinkui Zhao et.al. | 2507.11202 | translate | read | null |
| 2025-07-14 | Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language | Andrew C. Li et.al. | 2507.10741 | translate | read | null |
| 2025-07-14 | Boosting Multimodal Learning via Disentangled Gradient Learning | Shicai Wei et.al. | 2507.10213 | translate | read | null |
| 2025-07-21 | Improving Multimodal Learning via Imbalanced Learning | Shicai Wei et.al. | 2507.10203 | translate | read | link |
| 2025-07-13 | HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space | Changli Wang et.al. | 2507.09487 | translate | read | null |
| 2025-07-09 | Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors | Chunzhuo Wang et.al. | 2507.07261 | translate | read | null |
| 2025-07-09 | Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey | Getamesay Haile Dagnaw et.al. | 2507.07148 | translate | read | null |
| 2025-07-08 | Enhancing Synthetic CT from CBCT via Multimodal Fusion and End-To-End Registration | Maximilian Tschuchnig et.al. | 2507.06067 | translate | read | null |
| 2025-07-08 | Graph Learning | Feng Xia et.al. | 2507.05636 | translate | read | null |
| 2025-07-07 | Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models | Eunseop Yoon et.al. | 2507.04976 | translate | read | null |
| 2025-07-07 | From Vision To Language through Graph of Events in Space and Time: An Explainable Self-supervised Approach | Mihai Masala et.al. | 2507.04815 | translate | read | null |
| 2025-07-07 | MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding | Zhicheng Zhang et.al. | 2507.04635 | translate | read | null |
| 2025-07-10 | DMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth | Zheng Lian et.al. | 2507.04278 | translate | read | null |
| 2025-07-05 | Unlocking Compositional Control: Self-Supervision for LVLM-Based Image Generation | Fernando Gabriela Garcia et.al. | 2507.04151 | translate | read | null |
| 2025-07-03 | Intelligent Histology for Tumor Neurosurgery | Xinhai Hou et.al. | 2507.03037 | translate | read | null |
| 2025-07-01 | Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers | Yusuf Shihata et.al. | 2507.02985 | translate | read | null |
| 2025-07-02 | TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation | Yubeen Lee et.al. | 2507.02080 | translate | read | null |