Multimodal - 2025-08 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2025-08-29	Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer	Daniël Boeke et al.	2508.21581	null
2025-08-27	Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement	Mohammed Rakibul Hasan et al.	2508.19887	null
2025-08-27	AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning	Shu Shen et al.	2508.19769	null
2025-08-25	BTW: A Non-Parametric Variance Stabilization Framework for Multimodal Model Integration	Jun Hou et al.	2508.18551	null
2025-08-22	Can VLMs Recall Factual Associations From Visual References?	Dhananjay Ashok et al.	2508.18297	null
2025-08-20	Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders	Yiming Tang et al.	2508.18236	null
2025-08-24	Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice	Hugo Bohy et al.	2508.17502	link
2025-08-24	Multimodal Representation Learning Conditioned on Semantic Relations	Yang Qiao et al.	2508.17497	null
2025-08-24	SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality	Yuzhi Lai et al.	2508.17255	null
2025-08-10	An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance	Hsuan-Kung Yang et al.	2508.16602	null
2025-08-22	Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization	Yupei Zhang et al.	2508.16479	null
2025-08-22	A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension	Mohammad Zia Ur Rehman et al.	2508.16300	null
2025-08-21	Lang2Lift: A Framework for Language-Guided Pallet Detection and Pose Estimation Integrated in Autonomous Outdoor Forklift Operation	Huy Hoang Nguyen et al.	2508.15427	null
2025-08-21	DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding	Zhu Wang et al.	2508.15297	null
2025-08-20	MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs	Ruyi Ding et al.	2508.15036	null
2025-08-19	Beyond Simple Edits: Composed Video Retrieval with Dense Modifications	Omkar Thawakar et al.	2508.14039	link
2025-08-19	CrafterDojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter	Junyeong Park et al.	2508.13530	null
2025-08-19	CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models	Catherine Glossop et al.	2508.13446	null
2025-08-18	SPANER: Shared Prompt Aligner for Multimodal Semantic Representation	Thye Shan Ng et al.	2508.13387	null
2025-08-18	Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation	Tanjim Islam Riju et al.	2508.13068	null
2025-08-17	Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping	Xuhui Zhan et al.	2508.12466	link
2025-08-16	MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization	Haochen You et al.	2508.12149	null
2025-08-16	ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models	Zhichen Lou et al.	2508.11918	null
2025-08-13	MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning	Thanh-Dat Truong et al.	2508.10133	null
2025-08-13	Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model	Sushrut Patwardhan et al.	2508.10110	null
2025-08-12	LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition	Zhining He et al.	2508.08925	null
2025-08-12	Multimodal learning enables instant ionizing radiation alerts on unmodified mobile phones for real-world emergency response	Yanfeng Xie et al.	2508.08541	null
2025-08-11	BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models	Maozhen Zhang et al.	2508.08040	null
2025-08-11	A Trustworthy Method for Multimodal Emotion Recognition	Junxiao Xue et al.	2508.07625	null
2025-08-10	Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks for Enhanced Action Understanding	Zhaoyu Chen et al.	2508.07388	null
2025-08-10	FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning	Van Duc Cuong et al.	2508.07264	null
2025-08-09	Can Multitask Learning Enhance Model Explainability?	Hiba Najjar et al.	2508.06966	null
2025-08-09	Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction	Hiba Najjar et al.	2508.06939	null
2025-08-09	Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities	Rui Liu et al.	2508.06800	null
2025-08-08	Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records	Mosbah Aouad et al.	2508.06627	null
2025-08-07	Surformer v1: Transformer-Based Surface Classification Using Tactile and Vision Features	Manish Kansana et al.	2508.06566	null
2025-08-06	Grounding Emotion Recognition with Visual Prototypes: VEGA – Revisiting CLIP in MERC	Guanyu Hu et al.	2508.06564	null
2025-08-08	Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning	Xiangyu Wu et al.	2508.06382	null
2025-08-08	ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge	Juewen Hu et al.	2508.05991	null
2025-08-07	Analyzing the Impact of Multimodal Perception on Sample Complexity and Optimization Landscapes in Imitation Learning	Luai Abuelsamen et al.	2508.05077	null
2025-08-07	MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding	Weifan Zhang et al.	2508.05021	null
2025-08-06	Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models	Md Raisul Kibria et al.	2508.04427	null
2025-08-06	Length Matters: Length-Aware Transformer for Temporal Sentence Grounding	Yifan Wang et al.	2508.04299	null
2025-08-06	SVC 2025: the First Multimodal Deception Detection Challenge	Xun Lin et al.	2508.04129	null
2025-08-05	T2UE: Generating Unlearnable Examples from Text Descriptions	Xingjun Ma et al.	2508.03091	null
2025-08-04	MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming	Shuo Wang et al.	2508.02549	null
2025-08-04	Hierarchical MoE: Continuous Multimodal Emotion Recognition with Incomplete and Asynchronous Inputs	Yitong Zhu et al.	2508.02133	null
2025-08-04	“Harmless to You, Hurtful to Me!”: Investigating the Detection of Toxic Languages Grounded in the Perspective of Youth	Yaqiong Li et al.	2508.02094	null
2025-08-03	DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition	Peiyuan Jiang et al.	2508.01644	null
2025-08-02	A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics	Rushin H. Gindra et al.	2508.01490	null
2025-08-02	AffectGPT-R1: Leveraging Reinforcement Learning for Open-Vocabulary Emotion Recognition	Zheng Lian et al.	2508.01318	null