Multimodal - 2025-10
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-10-24 | Multimodal Detection of Fake Reviews using BERT and ResNet-50 | Suhasnadh Reddy Veluru et.al. | 2511.00020 | translate | read | null |
| 2025-10-04 | Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment | Adrian-Dinu Urse et.al. | 2511.00004 | translate | read | null |
| 2025-10-31 | MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data | Yu-Chen Kuo et.al. | 2510.27321 | translate | read | null |
| 2025-10-30 | Evaluating Perspectival Biases in Cross-Modal Retrieval | Teerapol Saengsukhiran et.al. | 2510.26861 | translate | read | null |
| 2025-10-30 | Contribution-Guided Asymmetric Learning for Robust Multimodal Fusion under Imbalance and Noise | Zijing Xu et.al. | 2510.26289 | translate | read | null |
| 2025-10-29 | Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start | Kun Chen et.al. | 2510.25801 | translate | read | null |
| 2025-10-29 | LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation | Yang Miao et.al. | 2510.25263 | translate | read | null |
| 2025-10-29 | H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts | Peilin Tan et.al. | 2510.25091 | translate | read | null |
| 2025-10-29 | Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments | Manjunath Prasad Holenarasipura Rajiv et.al. | 2510.25070 | translate | read | null |
| 2025-10-28 | Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning | Hossein R. Nowdeh et.al. | 2510.24919 | translate | read | null |
| 2025-10-28 | MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition | Haoyang Zhang et.al. | 2510.24827 | translate | read | null |
| 2025-10-24 | Towards Fine-Grained Human Motion Video Captioning | Guorui Song et.al. | 2510.24767 | translate | read | null |
| 2025-10-27 | Toward Clinically Grounded Foundation Models in Pathology | Hamid R. Tizhoosh et.al. | 2510.23807 | translate | read | null |
| 2025-10-27 | Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier | Hyeongseop Rha et.al. | 2510.23506 | translate | read | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et.al. | 2510.23190 | translate | read | null |
| 2025-10-21 | Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation | Chanyoung Chung et.al. | 2510.21812 | translate | read | null |
| 2025-10-07 | Avi: Action from Volumetric Inference | Harris Song et.al. | 2510.21746 | translate | read | null |
| 2025-10-24 | CXR-LanIC: Language-Grounded Interpretable Classifier for Chest X-Ray Diagnosis | Yiming Tang et.al. | 2510.21464 | translate | read | null |
| 2025-10-24 | Bridging the gap to real-world language-grounded visual concept learning | Whie Jung et.al. | 2510.21412 | translate | read | null |
| 2025-10-23 | Multimodal Negative Learning | Baoquan Gong et.al. | 2510.20877 | translate | read | null |
| 2025-10-23 | Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process | Tsai Hor Chan et.al. | 2510.20736 | translate | read | null |
| 2025-10-23 | Calibrating Multimodal Consensus for Emotion Recognition | Guowei Zhong et.al. | 2510.20256 | translate | read | null |
| 2025-10-22 | Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment | Yuhang Liu et.al. | 2510.19384 | translate | read | null |
| 2025-10-22 | FrogDeepSDM: Improving Frog Counting and Occurrence Prediction Using Multimodal Data and Pseudo-Absence Imputation | Chirag Padubidri et.al. | 2510.19305 | translate | read | null |
| 2025-10-21 | Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents | Yiqi Lin et.al. | 2510.18703 | translate | read | null |
| 2025-10-21 | Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation | Yasser Hamidullah et.al. | 2510.18439 | translate | read | null |
| 2025-10-20 | Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware | Stavros Mitsis et.al. | 2510.18036 | translate | read | null |
| 2025-10-20 | MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning | Alejandro Guerra-Manzanares et.al. | 2510.17394 | translate | read | null |
| 2025-10-19 | Graph4MM: Weaving Multimodal Learning with Structural Information | Xuying Ning et.al. | 2510.16990 | translate | read | null |
| 2025-10-19 | ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning | Yingxu Wang et.al. | 2510.16824 | translate | read | null |
| 2025-10-19 | Pursuing Minimal Sufficiency in Spatial Reasoning | Yejie Guo et.al. | 2510.16688 | translate | read | null |
| 2025-10-18 | Safire: Similarity Framework for Visualization Retrieval | Huyen N. Nguyen et.al. | 2510.16662 | translate | read | null |
| 2025-10-18 | Structured Interfaces for Automated Reasoning with 3D Scene Graphs | Aaron Ray et.al. | 2510.16643 | translate | read | null |
| 2025-10-09 | Lyapunov-Stable Adaptive Control for Multimodal Concept Drift | Tianyu Bell Pan et.al. | 2510.15944 | translate | read | null |
| 2025-10-17 | Towards Relaxed Multimodal Inputs for Gait-based Parkinson’s Disease Assessment | Minlin Zeng et.al. | 2510.15748 | translate | read | null |
| 2025-10-17 | ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings | Prithwish Jana et.al. | 2510.15681 | translate | read | null |
| 2025-10-16 | ChangingGrounding: 3D Visual Grounding in Changing Scenes | Miao Hu et.al. | 2510.14965 | translate | read | null |
| 2025-10-16 | From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance | Zhe Li et.al. | 2510.14952 | translate | read | null |
| 2025-10-16 | Revisit Modality Imbalance at the Decision Layer | Xiaoyu Ma et.al. | 2510.14411 | translate | read | null |
| 2025-10-15 | A Multimodal Approach to Heritage Preservation in the Context of Climate Change | David Roqui et.al. | 2510.14136 | translate | read | null |
| 2025-10-15 | NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching | Run Luo et.al. | 2510.13721 | translate | read | null |
| 2025-10-15 | Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation | Jiamin Chen et.al. | 2510.13191 | translate | read | null |
| 2025-10-15 | Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning | Rongrong Xie et.al. | 2510.13182 | translate | read | null |
| 2025-10-15 | OS-HGAdapter: Open Semantic Hypergraph Adapter for Large Language Models Assisted Entropy-Enhanced Image-Text Alignment | Rongjun Chen et.al. | 2510.13131 | translate | read | null |
| 2025-10-14 | A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation | Shurong Chai et.al. | 2510.12482 | translate | read | null |
| 2025-10-14 | Ground Stratification for a Logic of Definitions with Induction | Nathan Guermond et.al. | 2510.12297 | translate | read | null |
| 2025-10-14 | IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation | Wenxu Zhou et.al. | 2510.12095 | translate | read | null |
| 2025-10-13 | Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and Analysis | Blessing Agyei Kyem et.al. | 2510.11907 | translate | read | null |
| 2025-10-11 | Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model | Chung-Soo Ahn et.al. | 2510.10078 | translate | read | null |
| 2025-10-10 | Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition | Huimin Liu et.al. | 2510.09203 | translate | read | null |
| 2025-10-09 | Provably Robust Adaptation for Language-Empowered Foundation Models | Yuni Lai et.al. | 2510.08659 | translate | read | null |
| 2025-10-07 | Centering Emotion Hotspots: Multimodal Local-Global Fusion and Cross-Modal Alignment for Emotion Recognition in Conversations | Yu Liu et.al. | 2510.08606 | translate | read | null |
| 2025-10-09 | Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling | Bianca-Mihaela Ganescu et.al. | 2510.08470 | translate | read | null |
| 2025-10-08 | FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams | Corban Rivera et.al. | 2510.07417 | translate | read | null |
| 2025-10-08 | TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation | Jiaben Chen et.al. | 2510.07249 | translate | read | null |
| 2025-10-08 | Expressive and Scalable Quantum Fusion for Multimodal Learning | Tuyen Nguyen et.al. | 2510.06938 | translate | read | null |
| 2025-10-08 | M3Retrieve: Benchmarking Multimodal Retrieval for Medicine | Arkadeep Acharya et.al. | 2510.06888 | translate | read | null |
| 2025-10-07 | Deforming Videos to Masks: Flow Matching for Referring Video Segmentation | Zanyi Wang et.al. | 2510.06139 | translate | read | null |
| 2025-10-05 | Learning-Based Hashing for ANN Search: Foundations and Early Advances | Sean Moran et.al. | 2510.04127 | translate | read | null |
| 2025-10-04 | UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG | Xiangyu Peng et.al. | 2510.03663 | translate | read | null |
| 2025-10-04 | Towards Unsupervised Speech Recognition at the Syllable-Level | Liming Wang et.al. | 2510.03639 | translate | read | null |
| 2025-10-02 | Latency-aware Multimodal Federated Learning over UAV Networks | Shaba Shaon et.al. | 2510.01717 | translate | read | null |
| 2025-10-01 | PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset | Thomas Campagnolo et.al. | 2510.00818 | translate | read | null |