Multimodal - 2025-06
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-06-27 | XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science | Jithendaraa Subramanian et.al. | 2507.01054 | translate | read | null |
| 2025-06-27 | Test-Time Consistency in Vision Language Models | Shih-Han Chou et.al. | 2506.22395 | translate | read | null |
| 2025-06-27 | Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems | Abdulmomen Ghalkha et.al. | 2506.22374 | translate | read | null |
| 2025-06-26 | ImplicitQA: Going beyond frames towards Implicit Video Reasoning | Sirnam Swetha et.al. | 2506.21742 | translate | read | link |
| 2025-06-28 | G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation | Mohammed Rakib et.al. | 2506.21514 | translate | read | null |
| 2025-06-26 | V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling | Junwei You et.al. | 2506.21041 | translate | read | null |
| 2025-06-26 | TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence | Feng Jiang et.al. | 2506.21028 | translate | read | null |
| 2025-06-26 | Where is AIED Headed? Key Topics and Emerging Frontiers (2020-2024) | Shihui Feng et.al. | 2506.20971 | translate | read | null |
| 2025-06-24 | Emergence of Text Readability in Vision Language Models | Jaeyoo Park et.al. | 2506.19389 | translate | read | null |
| 2025-06-27 | Haptic-ACT – Pseudo Oocyte Manipulation by a Robot Using Multimodal Information and Action Chunking with Transformers | Pedro Miguel Uriguen Eljuri et.al. | 2506.18212 | translate | read | null |
| 2025-06-21 | Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning? | Yuesheng Huang et.al. | 2506.17623 | translate | read | null |
| 2025-06-24 | AI-based Multimodal Biometrics for Detecting Smartphone Distractions: Application to Online Learning | Alvaro Becerra et.al. | 2506.17364 | translate | read | null |
| 2025-06-20 | With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You | Fabian Gröger et.al. | 2506.16895 | translate | read | null |
| 2025-06-18 | A Strong View-Free Baseline Approach for Single-View Image Guided Point Cloud Completion | Fangzhou Lin et.al. | 2506.15747 | translate | read | null |
| 2025-06-18 | Foundation of Affective Computing and Interaction | Changzeng Fu et.al. | 2506.15497 | translate | read | null |
| 2025-06-18 | video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Changli Tang et.al. | 2506.15220 | translate | read | link |
| 2025-06-17 | Can Pretrained Vision-Language Embeddings Alone Guide Robot Navigation? | Nitesh Subedi et.al. | 2506.14507 | translate | read | link |
| 2025-06-16 | Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography | Yusdivia Molina-Román et.al. | 2506.13964 | translate | read | null |
| 2025-06-16 | A Survey on World Models Grounded in Acoustic Physical Information | Xiaoliang Chen et.al. | 2506.13833 | translate | read | link |
| 2025-06-16 | A Survey on Imitation Learning for Contact-Rich Tasks in Robotics | Toshiaki Tsuji et.al. | 2506.13498 | translate | read | null |
| 2025-06-16 | Fatigue-Aware Adaptive Interfaces for Wearable Devices Using Deep Learning | Yikan Wang et.al. | 2506.13203 | translate | read | null |
| 2025-06-15 | Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models | Liam Bennett et.al. | 2506.12733 | translate | read | null |
| 2025-06-14 | Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics | Asifullah Khan et.al. | 2506.12365 | translate | read | null |
| 2025-06-14 | GSDNet: Revisiting Incomplete Multimodal-Diffusion from Graph Spectrum Perspective for Conversation Emotion Recognition | Yuntao Shou et.al. | 2506.12325 | translate | read | null |
| 2025-06-16 | Improving Multimodal Learning Balance and Sufficiency through Data Remixing | Xiaoyu Ma et.al. | 2506.11550 | translate | read | link |
| 2025-06-13 | RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer | Haotian Ni et.al. | 2506.11465 | translate | read | null |
| 2025-06-12 | Combining Log Data and Collaborative Dialogue Features to Predict Project Quality in Middle School AI Education | Conrad Borchers et.al. | 2506.11326 | translate | read | null |
| 2025-06-12 | Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction | Thanathai Lertpetchpun et.al. | 2506.10930 | translate | read | null |
| 2025-06-12 | Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts | Guowei Zhong et.al. | 2506.10452 | translate | read | link |
| 2025-06-09 | Segment Any Architectural Facades (SAAF): An automatic segmentation model for building facades, walls and windows based on multimodal semantics guidance | Peilin Li et.al. | 2506.09071 | translate | read | null |
| 2025-06-10 | Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment | Maximilian Tschuchnig et.al. | 2506.08716 | translate | read | null |
| 2025-06-10 | MOSAIC-F: A Framework for Enhancing Students’ Oral Presentation Skills through Personalized Feedback | Alvaro Becerra et.al. | 2506.08634 | translate | read | null |
| 2025-06-09 | Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs | Jared Strader et.al. | 2506.07454 | translate | read | null |
| 2025-06-08 | A Narrative Review on Large AI Models in Lung Cancer Screening, Diagnosis, and Treatment Planning | Jiachen Zhong et.al. | 2506.07236 | translate | read | null |
| 2025-06-08 | Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Tianyi Bai et.al. | 2506.07227 | translate | read | null |
| 2025-06-08 | A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge | Tarique Dahri et.al. | 2506.07055 | translate | read | null |
| 2025-06-06 | Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning | Sheng Chen et.al. | 2506.06205 | translate | read | null |
| 2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | translate | read | null |
| 2025-06-06 | MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory | Ana Carolina Condez et.al. | 2506.05696 | translate | read | null |
| 2025-06-03 | Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | Israa A. Albadarneh et.al. | 2506.05399 | translate | read | null |
| 2025-06-05 | Towards Language-Augmented Multi-Agent Deep Reinforcement Learning | Maxime Toquebiau et.al. | 2506.05236 | translate | read | null |
| 2025-06-05 | Quantifying Cross-Modality Memorization in Vision-Language Models | Yuxin Wen et.al. | 2506.05198 | translate | read | null |
| 2025-06-05 | A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions | Anh Le et.al. | 2506.05061 | translate | read | null |
| 2025-06-04 | EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation | Cheng Zhang et.al. | 2506.03652 | translate | read | null |
| 2025-06-03 | Enriching Location Representation with Detailed Semantic Information | Junyuan Liu et.al. | 2506.02744 | translate | read | null |
| 2025-06-02 | Entity Image and Mixed-Modal Image Retrieval Datasets | Cristian-Ioan Blaga et.al. | 2506.02291 | translate | read | null |
| 2025-06-02 | Confidence-Aware Self-Distillation for Multimodal Sentiment Analysis with Incomplete Modalities | Yanxi Luo et.al. | 2506.01490 | translate | read | null |
| 2025-06-02 | Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark | Shuyu Yang et.al. | 2506.01466 | translate | read | null |
| 2025-06-02 | Agentic Episodic Control | Xidong Yang et.al. | 2506.01442 | translate | read | null |
| 2025-06-01 | Leveraging CLIP Encoder for Multimodal Emotion Recognition | Yehun Song et.al. | 2506.00903 | translate | read | null |
| 2025-06-01 | GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints | Jiajun He et.al. | 2506.00865 | translate | read | null |
| 2025-06-01 | TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning | Jiaqi Luo et.al. | 2506.00813 | translate | read | null |
| 2025-06-02 | Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles | Zifu Wang et.al. | 2505.23590 | translate | read | link |