Multimodal - 2025-06

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-06-27 | XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science | Jithendaraa Subramanian et al. | 2507.01054 | translate | read | null |
| 2025-06-27 | Test-Time Consistency in Vision Language Models | Shih-Han Chou et al. | 2506.22395 | translate | read | null |
| 2025-06-27 | Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems | Abdulmomen Ghalkha et al. | 2506.22374 | translate | read | null |
| 2025-06-26 | ImplicitQA: Going beyond frames towards Implicit Video Reasoning | Sirnam Swetha et al. | 2506.21742 | translate | read | link |
| 2025-06-28 | G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation | Mohammed Rakib et al. | 2506.21514 | translate | read | null |
| 2025-06-26 | V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling | Junwei You et al. | 2506.21041 | translate | read | null |
| 2025-06-26 | TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence | Feng Jiang et al. | 2506.21028 | translate | read | null |
| 2025-06-26 | Where is AIED Headed? Key Topics and Emerging Frontiers (2020-2024) | Shihui Feng et al. | 2506.20971 | translate | read | null |
| 2025-06-24 | Emergence of Text Readability in Vision Language Models | Jaeyoo Park et al. | 2506.19389 | translate | read | null |
| 2025-06-27 | Haptic-ACT – Pseudo Oocyte Manipulation by a Robot Using Multimodal Information and Action Chunking with Transformers | Pedro Miguel Uriguen Eljuri et al. | 2506.18212 | translate | read | null |
| 2025-06-21 | Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning? | Yuesheng Huang et al. | 2506.17623 | translate | read | null |
| 2025-06-24 | AI-based Multimodal Biometrics for Detecting Smartphone Distractions: Application to Online Learning | Alvaro Becerra et al. | 2506.17364 | translate | read | null |
| 2025-06-20 | With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You | Fabian Gröger et al. | 2506.16895 | translate | read | null |
| 2025-06-18 | A Strong View-Free Baseline Approach for Single-View Image Guided Point Cloud Completion | Fangzhou Lin et al. | 2506.15747 | translate | read | null |
| 2025-06-18 | Foundation of Affective Computing and Interaction | Changzeng Fu et al. | 2506.15497 | translate | read | null |
| 2025-06-18 | video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Changli Tang et al. | 2506.15220 | translate | read | link |
| 2025-06-17 | Can Pretrained Vision-Language Embeddings Alone Guide Robot Navigation? | Nitesh Subedi et al. | 2506.14507 | translate | read | link |
| 2025-06-16 | Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography | Yusdivia Molina-Román et al. | 2506.13964 | translate | read | null |
| 2025-06-16 | A Survey on World Models Grounded in Acoustic Physical Information | Xiaoliang Chen et al. | 2506.13833 | translate | read | link |
| 2025-06-16 | A Survey on Imitation Learning for Contact-Rich Tasks in Robotics | Toshiaki Tsuji et al. | 2506.13498 | translate | read | null |
| 2025-06-16 | Fatigue-Aware Adaptive Interfaces for Wearable Devices Using Deep Learning | Yikan Wang et al. | 2506.13203 | translate | read | null |
| 2025-06-15 | Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models | Liam Bennett et al. | 2506.12733 | translate | read | null |
| 2025-06-14 | Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics | Asifullah khan et al. | 2506.12365 | translate | read | null |
| 2025-06-14 | GSDNet: Revisiting Incomplete Multimodal-Diffusion from Graph Spectrum Perspective for Conversation Emotion Recognition | Yuntao Shou et al. | 2506.12325 | translate | read | null |
| 2025-06-16 | Improving Multimodal Learning Balance and Sufficiency through Data Remixing | Xiaoyu Ma et al. | 2506.11550 | translate | read | link |
| 2025-06-13 | RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer | Haotian Ni et al. | 2506.11465 | translate | read | null |
| 2025-06-12 | Combining Log Data and Collaborative Dialogue Features to Predict Project Quality in Middle School AI Education | Conrad Borchers et al. | 2506.11326 | translate | read | null |
| 2025-06-12 | Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction | Thanathai Lertpetchpun et al. | 2506.10930 | translate | read | null |
| 2025-06-12 | Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts | Guowei Zhong et al. | 2506.10452 | translate | read | link |
| 2025-06-09 | Segment Any Architectural Facades (SAAF): An automatic segmentation model for building facades, walls and windows based on multimodal semantics guidance | Peilin Li et al. | 2506.09071 | translate | read | null |
| 2025-06-10 | Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment | Maximilian Tschuchnig et al. | 2506.08716 | translate | read | null |
| 2025-06-10 | MOSAIC-F: A Framework for Enhancing Students’ Oral Presentation Skills through Personalized Feedback | Alvaro Becerra et al. | 2506.08634 | translate | read | null |
| 2025-06-09 | Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs | Jared Strader et al. | 2506.07454 | translate | read | null |
| 2025-06-08 | A Narrative Review on Large AI Models in Lung Cancer Screening, Diagnosis, and Treatment Planning | Jiachen Zhong et al. | 2506.07236 | translate | read | null |
| 2025-06-08 | Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Tianyi Bai et al. | 2506.07227 | translate | read | null |
| 2025-06-08 | A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge | Tarique Dahri et al. | 2506.07055 | translate | read | null |
| 2025-06-06 | Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning | Sheng Chen et al. | 2506.06205 | translate | read | null |
| 2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et al. | 2506.06196 | translate | read | null |
| 2025-06-06 | MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory | Ana Carolina Condez et al. | 2506.05696 | translate | read | null |
| 2025-06-03 | Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | Israa A. Albadarneh et al. | 2506.05399 | translate | read | null |
| 2025-06-05 | Towards Language-Augmented Multi-Agent Deep Reinforcement Learning | Maxime Toquebiau et al. | 2506.05236 | translate | read | null |
| 2025-06-05 | Quantifying Cross-Modality Memorization in Vision-Language Models | Yuxin Wen et al. | 2506.05198 | translate | read | null |
| 2025-06-05 | A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions | Anh Le et al. | 2506.05061 | translate | read | null |
| 2025-06-04 | EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation | Cheng Zhang et al. | 2506.03652 | translate | read | null |
| 2025-06-03 | Enriching Location Representation with Detailed Semantic Information | Junyuan Liu et al. | 2506.02744 | translate | read | null |
| 2025-06-02 | Entity Image and Mixed-Modal Image Retrieval Datasets | Cristian-Ioan Blaga et al. | 2506.02291 | translate | read | null |
| 2025-06-02 | Confidence-Aware Self-Distillation for Multimodal Sentiment Analysis with Incomplete Modalities | Yanxi Luo et al. | 2506.01490 | translate | read | null |
| 2025-06-02 | Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark | Shuyu Yang et al. | 2506.01466 | translate | read | null |
| 2025-06-02 | Agentic Episodic Control | Xidong Yang et al. | 2506.01442 | translate | read | null |
| 2025-06-01 | Leveraging CLIP Encoder for Multimodal Emotion Recognition | Yehun Song et al. | 2506.00903 | translate | read | null |
| 2025-06-01 | GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints | Jiajun He et al. | 2506.00865 | translate | read | null |
| 2025-06-01 | TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning | Jiaqi Luo et al. | 2506.00813 | translate | read | null |
| 2025-06-02 | Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles | Zifu Wang et al. | 2505.23590 | translate | read | link |

(<a href=../Multimodal.md>back to Multimodal</a>)