Multimodal - 2025-07 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-07-29	Multimodal Video Emotion Recognition with Reliable Reasoning Priors	Zhepeng Wang et al.	2508.03722	translate	read	null
2025-07-29	SmartCLIP: Modular Vision-language Alignment with Identification Guarantees	Shaoan Xie et al.	2507.22264	translate	read	null
2025-07-29	MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces	Shaojun E et al.	2507.21741	translate	read	link
2025-07-29	Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion	Zeyu Deng et al.	2507.21395	translate	read	null
2025-07-28	On the Limits of Hierarchically Embedded Logic in Classical Neural Networks	Bill Cochran et al.	2507.20960	translate	read	null
2025-07-28	TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model	Ao Li et al.	2507.20630	translate	read	null
2025-07-25	Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization	Hsuan-Yu Wang et al.	2507.19356	translate	read	null
2025-07-25	SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality	Sijie Li et al.	2507.19264	translate	read	null
2025-07-24	Deep Learning for Blood-Brain Barrier Permeability Prediction	Zihan Yang et al.	2507.18557	translate	read	null
2025-07-23	RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding	Xi Xiao et al.	2507.17353	translate	read	null
2025-07-22	VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings	Ramin Giahi et al.	2507.17080	translate	read	null
2025-07-20	TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning	Jie He et al.	2507.16844	translate	read	null
2025-07-21	Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure	Alexandra Junell et al.	2507.16088	translate	read	null
2025-07-21	MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations	Deyun Zhang et al.	2507.15255	translate	read	null
2025-07-20	LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering	Xinxin Dong et al.	2507.14784	translate	read	null
2025-07-18	MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training	Yuechen Xie et al.	2507.13673	translate	read	null
2025-07-17	City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning	Penglei Sun et al.	2507.12795	translate	read	null
2025-07-17	A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models	Weijieying Ren et al.	2507.12774	translate	read	null
2025-07-15	Partitioner Guided Modal Learning Framework	Guimin Hu et al.	2507.11661	translate	read	null
2025-07-15	A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition	Xinkui Zhao et al.	2507.11202	translate	read	null
2025-07-14	Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language	Andrew C. Li et al.	2507.10741	translate	read	null
2025-07-14	Boosting Multimodal Learning via Disentangled Gradient Learning	Shicai Wei et al.	2507.10213	translate	read	null
2025-07-21	Improving Multimodal Learning via Imbalanced Learning	Shicai Wei et al.	2507.10203	translate	read	link
2025-07-13	HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space	Changli Wang et al.	2507.09487	translate	read	null
2025-07-09	Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors	Chunzhuo Wang et al.	2507.07261	translate	read	null
2025-07-09	Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey	Getamesay Haile Dagnaw et al.	2507.07148	translate	read	null
2025-07-08	Enhancing Synthetic CT from CBCT via Multimodal Fusion and End-To-End Registration	Maximilian Tschuchnig et al.	2507.06067	translate	read	null
2025-07-08	Graph Learning	Feng Xia et al.	2507.05636	translate	read	null
2025-07-07	Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models	Eunseop Yoon et al.	2507.04976	translate	read	null
2025-07-07	From Vision To Language through Graph of Events in Space and Time: An Explainable Self-supervised Approach	Mihai Masala et al.	2507.04815	translate	read	null
2025-07-07	MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding	Zhicheng Zhang et al.	2507.04635	translate	read	null
2025-07-10	DMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth	Zheng Lian et al.	2507.04278	translate	read	null
2025-07-05	Unlocking Compositional Control: Self-Supervision for LVLM-Based Image Generation	Fernando Gabriela Garcia et al.	2507.04151	translate	read	null
2025-07-03	Intelligent Histology for Tumor Neurosurgery	Xinhai Hou et al.	2507.03037	translate	read	null
2025-07-01	Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers	Yusuf Shihata et al.	2507.02985	translate	read	null
2025-07-02	TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation	Yubeen Lee et al.	2507.02080	translate	read	null