Multimodal - 2025-09
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-09-30 | MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation | Md Zubair et al. | 2510.07328 | translate | read | |
| 2025-09-25 | Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data | Jiancheng Zhang et al. | 2510.03247 | translate | read | |
| 2025-09-30 | MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning | Seong-Hyeon Hwang et al. | 2509.25831 | translate | read | |
| 2025-09-30 | ProbMed: A Probabilistic Framework for Medical Multimodal Binding | Yuan Gao et al. | 2509.25711 | translate | read | |
| 2025-09-30 | Massively Multimodal Foundation Models: A Framework for Capturing Dependencies with Specialized Mixture-of-Experts | Xing Han et al. | 2509.25678 | translate | read | |
| 2025-09-30 | Generalized Contrastive Learning for Universal Multimodal Retrieval | Jungsoo Lee et al. | 2509.25638 | translate | read | |
| 2025-09-29 | FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology | Faizan Farooq Khan et al. | 2509.25564 | translate | read | |
| 2025-09-29 | MAESTRO : Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series | Payal Mohapatra et al. | 2509.25278 | translate | read | |
| 2025-09-29 | A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity | Giordano Cicchetti et al. | 2509.24734 | translate | read | |
| 2025-09-29 | Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey | Yuntao Shou et al. | 2509.24322 | translate | read | |
| 2025-09-28 | Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics | Luxuan Zhang et al. | 2509.23543 | translate | read | |
| 2025-09-26 | RefAM: Attention Magnets for Zero-Shot Referral Segmentation | Anna Kukleva et al. | 2509.22650 | translate | read | |
| 2025-09-26 | HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes | Katrina Ashton et al. | 2509.22498 | translate | read | |
| 2025-09-26 | From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment | Ke Ye et al. | 2509.22205 | translate | read | |
| 2025-09-26 | WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM | Changli Tang et al. | 2509.21990 | translate | read | |
| 2025-09-26 | VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation | Huayi Zhou et al. | 2509.21723 | translate | read | |
| 2025-09-25 | Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations | Zhijian Yang et al. | 2509.21249 | translate | read | |
| 2025-09-25 | SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization | Jiehui Luo et al. | 2509.21033 | translate | read | |
| 2025-09-14 | LibEMER: A novel benchmark and algorithms library for EEG-based Multimodal Emotion Recognition | Zejun Liu et al. | 2509.19330 | translate | read | |
| 2025-09-10 | Advancing Few-Shot Pediatric Arrhythmia Classification with a Novel Contrastive Loss and Multimodal Learning | Yiqiao Chen et al. | 2509.19315 | translate | read | |
| 2025-09-23 | Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation | Christian Ganhör et al. | 2509.18807 | translate | read | |
| 2025-09-23 | M4SER: Multimodal, Multirepresentation, Multitask, and Multistrategy Learning for Speech Emotion Recognition | Jiajun He et al. | 2509.18706 | translate | read | |
| 2025-09-22 | Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction | Yi Gu et al. | 2509.18284 | translate | read | |
| 2025-09-22 | ClassMind: Scaling Classroom Observation and Instructional Feedback with Multimodal AI | Ao Qu et al. | 2509.18020 | translate | read | |
| 2025-09-22 | M3ET: Efficient Vision-Language Learning for Robotics based on Multimodal Mamba-Enhanced Transformer | Yanxin Zhang et al. | 2509.18005 | translate | read | |
| 2025-09-22 | Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training | Divya Mereddy et al. | 2509.17888 | translate | read | |
| 2025-09-22 | MLLM-Driven Semantic Identifier Generation for Generative Cross-Modal Retrieval | Tianyuan Li et al. | 2509.17359 | translate | read | |
| 2025-09-20 | Self-organized epithelial reticulum inhibits cell proliferation | Liav Daraf et al. | 2509.16661 | translate | read | |
| 2025-09-19 | Vision-Language Models as Differentiable Semantic and Spatial Rewards for Text-to-3D Generation | Weimin Bai et al. | 2509.15772 | translate | read | |
| 2025-09-19 | Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion | Shanghong Li et al. | 2509.15578 | translate | read | |
| 2025-09-19 | Beyond Words: Enhancing Desire, Emotion, and Sentiment Recognition with Non-Verbal Cues | Wei Chen et al. | 2509.15540 | translate | read | |
| 2025-09-17 | Exploring the Capabilities of LLM Encoders for Image-Text Retrieval in Chest X-rays | Hanbin Ko et al. | 2509.15234 | translate | read | |
| 2025-09-17 | VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI | Daiqi Liu et al. | 2509.13767 | translate | read | |
| 2025-09-15 | Evaluating Robustness of Vision-Language Models Under Noisy Conditions | Purushoth et al. | 2509.12492 | translate | read | |
| 2025-09-15 | OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling | Yang Zhou et al. | 2509.12201 | translate | read | link |
| 2025-09-15 | Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI | Bo Cao et al. | 2509.11924 | translate | read | |
| 2025-09-14 | GLaVE-Cap: Global-Local Aligned Video Captioning with Vision Expert Integration | Wan Xu et al. | 2509.11360 | translate | read | |
| 2025-09-14 | DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations | Doan Minh Trung et al. | 2509.11187 | translate | read | |
| 2025-09-14 | Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation | Nhi Kieu et al. | 2509.11102 | translate | read | |
| 2025-09-13 | Why Bonds Fail Differently? Explainable Multimodal Learning for Multi-Class Default Prediction | Yi Lu et al. | 2509.10802 | translate | read | |
| 2025-09-11 | Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training | Anthony P. Addison et al. | 2509.09290 | translate | read | |
| 2025-09-09 | Enhancing Online Learning by Integrating Biosensors and Multimodal Learning Analytics for Detecting and Predicting Student Behavior: A Review | Alvaro Becerra et al. | 2509.07742 | translate | read | |
| 2025-09-08 | Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding | Jiangnan Xie et al. | 2509.06291 | translate | read | |
| 2025-09-06 | GraMFedDHAR: Graph Based Multimodal Differentially Private Federated HAR | Labani Halder et al. | 2509.05671 | translate | read | |
| 2025-09-06 | Causal Debiasing Medical Multimodal Representation Learning with Missing Modalities | Xiaoguang Zhu et al. | 2509.05615 | translate | read | |
| 2025-09-04 | Vehicle-to-Infrastructure Collaborative Spatial Perception via Multimodal Large Language Models | Kimia Ehsani et al. | 2509.03837 | translate | read | |
| 2025-09-03 | Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support | Eduardo Davalos et al. | 2509.03741 | translate | read | |
| 2025-09-03 | Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning | Duy A. Nguyen et al. | 2509.03477 | translate | read | |
| 2025-09-03 | Multimodal learning of melt pool dynamics in laser powder bed fusion | Satyajit Mojumder et al. | 2509.03029 | translate | read | |
| 2025-09-03 | Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability | Shuai Jiang et al. | 2509.02962 | translate | read | |
| 2025-09-02 | Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception | Changshi Zhou et al. | 2509.02324 | translate | read | |
| 2025-09-02 | Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective | Shijie Wang et al. | 2509.02281 | translate | read | |
| 2025-09-02 | Content and Engagement Trends in COVID-19 YouTube Videos: Evidence from the Late Pandemic | Nirmalya Thakur et al. | 2509.01954 | translate | read | |
| 2025-09-01 | OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning | Yanqing Liu et al. | 2509.01644 | translate | read | link |
| 2025-09-01 | Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement | Jiayi Gao et al. | 2509.01362 | translate | read | |