Multimodal - 2025-08
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-08-29 | Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer | Daniël Boeke et.al. | 2508.21581 | translate | read | null |
| 2025-08-27 | Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement | Mohammed Rakibul Hasan et.al. | 2508.19887 | translate | read | null |
| 2025-08-27 | AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning | Shu Shen et.al. | 2508.19769 | translate | read | null |
| 2025-08-25 | BTW: A Non-Parametric Variance Stabilization Framework for Multimodal Model Integration | Jun Hou et.al. | 2508.18551 | translate | read | null |
| 2025-08-22 | Can VLMs Recall Factual Associations From Visual References? | Dhananjay Ashok et.al. | 2508.18297 | translate | read | null |
| 2025-08-20 | Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders | Yiming Tang et.al. | 2508.18236 | translate | read | null |
| 2025-08-24 | Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice | Hugo Bohy et.al. | 2508.17502 | translate | read | link |
| 2025-08-24 | Multimodal Representation Learning Conditioned on Semantic Relations | Yang Qiao et.al. | 2508.17497 | translate | read | null |
| 2025-08-24 | SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality | Yuzhi Lai et.al. | 2508.17255 | translate | read | null |
| 2025-08-10 | An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance | Hsuan-Kung Yang et.al. | 2508.16602 | translate | read | null |
| 2025-08-22 | Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization | Yupei Zhang et.al. | 2508.16479 | translate | read | null |
| 2025-08-22 | A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension | Mohammad Zia Ur Rehman et.al. | 2508.16300 | translate | read | null |
| 2025-08-21 | Lang2Lift: A Framework for Language-Guided Pallet Detection and Pose Estimation Integrated in Autonomous Outdoor Forklift Operation | Huy Hoang Nguyen et.al. | 2508.15427 | translate | read | null |
| 2025-08-21 | DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding | Zhu Wang et.al. | 2508.15297 | translate | read | null |
| 2025-08-20 | MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs | Ruyi Ding et.al. | 2508.15036 | translate | read | null |
| 2025-08-19 | Beyond Simple Edits: Composed Video Retrieval with Dense Modifications | Omkar Thawakar et.al. | 2508.14039 | translate | read | link |
| 2025-08-19 | CrafterDojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter | Junyeong Park et.al. | 2508.13530 | translate | read | null |
| 2025-08-19 | CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models | Catherine Glossop et.al. | 2508.13446 | translate | read | null |
| 2025-08-18 | SPANER: Shared Prompt Aligner for Multimodal Semantic Representation | Thye Shan Ng et.al. | 2508.13387 | translate | read | null |
| 2025-08-18 | Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation | Tanjim Islam Riju et.al. | 2508.13068 | translate | read | null |
| 2025-08-17 | Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping | Xuhui Zhan et.al. | 2508.12466 | translate | read | link |
| 2025-08-16 | MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization | Haochen You et.al. | 2508.12149 | translate | read | null |
| 2025-08-16 | ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models | Zhichen Lou et.al. | 2508.11918 | translate | read | null |
| 2025-08-13 | MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning | Thanh-Dat Truong et.al. | 2508.10133 | translate | read | null |
| 2025-08-13 | Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model | Sushrut Patwardhan et.al. | 2508.10110 | translate | read | null |
| 2025-08-12 | LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition | Zhining He et.al. | 2508.08925 | translate | read | null |
| 2025-08-12 | Multimodal learning enables instant ionizing radiation alerts on unmodified mobile phones for real-world emergency response | Yanfeng Xie et.al. | 2508.08541 | translate | read | null |
| 2025-08-11 | BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models | Maozhen Zhang et.al. | 2508.08040 | translate | read | null |
| 2025-08-11 | A Trustworthy Method for Multimodal Emotion Recognition | Junxiao Xue et.al. | 2508.07625 | translate | read | null |
| 2025-08-10 | Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks for Enhanced Action Understanding | Zhaoyu Chen et.al. | 2508.07388 | translate | read | null |
| 2025-08-10 | FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning | Van Duc Cuong et.al. | 2508.07264 | translate | read | null |
| 2025-08-09 | Can Multitask Learning Enhance Model Explainability? | Hiba Najjar et.al. | 2508.06966 | translate | read | null |
| 2025-08-09 | Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction | Hiba Najjar et.al. | 2508.06939 | translate | read | null |
| 2025-08-09 | Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities | Rui Liu et.al. | 2508.06800 | translate | read | null |
| 2025-08-08 | Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records | Mosbah Aouad et.al. | 2508.06627 | translate | read | null |
| 2025-08-07 | Surformer v1: Transformer-Based Surface Classification Using Tactile and Vision Features | Manish Kansana et.al. | 2508.06566 | translate | read | null |
| 2025-08-06 | Grounding Emotion Recognition with Visual Prototypes: VEGA – Revisiting CLIP in MERC | Guanyu Hu et.al. | 2508.06564 | translate | read | null |
| 2025-08-08 | Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning | Xiangyu Wu et.al. | 2508.06382 | translate | read | null |
| 2025-08-08 | ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge | Juewen Hu et.al. | 2508.05991 | translate | read | null |
| 2025-08-07 | Analyzing the Impact of Multimodal Perception on Sample Complexity and Optimization Landscapes in Imitation Learning | Luai Abuelsamen et.al. | 2508.05077 | translate | read | null |
| 2025-08-07 | MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding | Weifan Zhang et.al. | 2508.05021 | translate | read | null |
| 2025-08-06 | Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models | Md Raisul Kibria et.al. | 2508.04427 | translate | read | null |
| 2025-08-06 | Length Matters: Length-Aware Transformer for Temporal Sentence Grounding | Yifan Wang et.al. | 2508.04299 | translate | read | null |
| 2025-08-06 | SVC 2025: the First Multimodal Deception Detection Challenge | Xun Lin et.al. | 2508.04129 | translate | read | null |
| 2025-08-05 | T2UE: Generating Unlearnable Examples from Text Descriptions | Xingjun Ma et.al. | 2508.03091 | translate | read | null |
| 2025-08-04 | MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming | Shuo Wang et.al. | 2508.02549 | translate | read | null |
| 2025-08-04 | Hierarchical MoE: Continuous Multimodal Emotion Recognition with Incomplete and Asynchronous Inputs | Yitong Zhu et.al. | 2508.02133 | translate | read | null |
| 2025-08-04 | “Harmless to You, Hurtful to Me!”: Investigating the Detection of Toxic Languages Grounded in the Perspective of Youth | Yaqiong Li et.al. | 2508.02094 | translate | read | null |
| 2025-08-03 | DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition | Peiyuan Jiang et.al. | 2508.01644 | translate | read | null |
| 2025-08-02 | A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics | Rushin H. Gindra et.al. | 2508.01490 | translate | read | null |
| 2025-08-02 | AffectGPT-R1: Leveraging Reinforcement Learning for Open-Vocabulary Emotion Recognition | Zheng Lian et.al. | 2508.01318 | translate | read | null |