Multimodal

Publish Date Title Authors PDF Code
2025-12-18 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future Tianshuai Hu et.al. 2512.16760 null
2025-12-18 Smile on the Face, Sadness in the Eyes: Bridging the Emotion Gap with a Multimodal Dataset of Eye and Facial Behaviors Kejun Liu et.al. 2512.16485 null
2025-12-17 GateFusion: Hierarchical Gated Cross-Modal Fusion for Active Speaker Detection Yu Wang et.al. 2512.15707 null
2025-12-17 An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain João Daniel Silva et.al. 2512.15531 null
2025-12-16 Visual-textual Dermatoglyphic Animal Biometrics: A First Case Study on Panthera tigris Wenshuo Li et.al. 2512.14878 null
2025-12-15 STAR: STacked AutoRegressive Scheme for Unified Multimodal Learning Jie Qin et.al. 2512.13752 null
2025-12-15 JoVA: Unified Multimodal Learning for Joint Video-Audio Generation Xiaohu Huang et.al. 2512.13677 null
2025-12-15 A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis Xianchao Guan et.al. 2512.13164 null
2025-12-13 EchoVLM: Measurement-Grounded Multimodal Learning for Echocardiography Yuheng Li et.al. 2512.12107 null
2025-12-12 VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing Emanuel Sánchez Aimar et.al. 2512.11490 null
2025-12-12 Exploring MLLM-Diffusion Information Transfer with MetaCanvas Han Lin et.al. 2512.11464 null
2025-12-12 AMBER: An Adaptive Multimodal Mask Transformer for Beam Prediction with Missing Modalities Chenyiming Wen et.al. 2512.11331 null
2025-12-02 Agent-Based Modular Learning for Multimodal Emotion Recognition in Human-Agent Systems Matvey Nepomnyaschiy et.al. 2512.10975 null
2025-12-11 Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval J. Xiao et.al. 2512.10596 null
2025-12-11 Cross-modal Retrieval Models for Stripped Binary Analysis Guoqiang Chen et.al. 2512.10393 null
2025-12-05 What Happens When: Learning Temporal Orders of Events in Videos Daechul Ahn et.al. 2512.08979 null
2025-12-09 Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval Tao Chen et.al. 2512.08410 null
2025-12-08 CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification Pingchuan Ma et.al. 2512.08071 null
2025-12-08 Unison: A Fully Automatic, Task-Universal, and Low-Cost Framework for Unified Understanding and Generation Shihao Zhao et.al. 2512.07747 null
2025-12-08 VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation Md Selim Sarowar et.al. 2512.07215 null
2025-12-07 A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations Waleed Razzaq et.al. 2512.06708 null
2025-12-06 Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion Jaewon Ahn et.al. 2512.06449 null
2025-12-05 Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures Amirkia Rafiei Oskooei et.al. 2512.05908 null
2025-12-04 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer Xianfeng Wu et.al. 2512.05060 null
2025-12-03 Cross-Space Synergy: A Unified Framework for Multimodal Emotion Recognition in Conversation Xiaosen Lyu et.al. 2512.03521 null
2025-12-03 Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation Xieji Li et.al. 2512.03445 null
2025-12-03 Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features Yuzhen Hu et.al. 2512.03430 null
2025-12-02 Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation Ziniu Zhang et.al. 2512.02920 null
2025-12-02 Real-Time Multimodal Data Collection Using Smartwatches and Its Visualization in Education Alvaro Becerra et.al. 2512.02651 null
2025-12-02 Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources Phuc Pham et.al. 2512.02438 null
2025-11-30 MM-ACT: Learn from Multimodal Parallel Generation to Act Haotian Liang et.al. 2512.00975 null
2025-11-29 Describe Anything Anywhere At Any Moment Nicolas Gorlo et.al. 2512.00565 null
2025-11-29 CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA Vsevolod Kovalev et.al. 2512.00360 null
2025-11-28 Buffer replay enhances the robustness of multimodal learning under missing-modality Hongye Zhu et.al. 2511.23070 null
2025-11-27 Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation Xinyi Che et.al. 2511.22463 null
2025-11-27 Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation Xinyi Che et.al. 2511.22447 null
2025-11-27 Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples Shuhei Yamashita et.al. 2511.22141 null
2025-11-26 WalkCLIP: Multimodal Learning for Urban Walkability Prediction Shilong Xiang et.al. 2511.21947 null
2025-11-26 Evaluating Strategies for Synthesizing Clinical Notes for Medical Multimodal AI Niccolo Marini et.al. 2511.21827 null
2025-11-26 Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling Mengran Li et.al. 2511.21120 null
2025-11-25 A review on data fusion in multimodal learning analytics and educational data mining Wilson Chango et.al. 2511.20871 null
2025-11-25 VibraVerse: A Large-Scale Geometry-Acoustics Alignment Dataset for Physically-Consistent Multimodal Learning Bo Pang et.al. 2511.20422 null
2025-11-25 MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts Zilong Huang et.al. 2511.20415 null
2025-11-25 ScenarioCLIP: Pretrained Transferable Visual Language Models and Action-Genome Dataset for Natural Scene Analysis Advik Sinha et.al. 2511.20274 null
2025-11-24 Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation Yingjia Shang et.al. 2511.19257 null
2025-11-24 IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes Carl Lindström et.al. 2511.19235 null
2025-11-24 Can Modern Vision Models Understand the Difference Between an Object and a Look-alike? Itay Cohen et.al. 2511.19200 null
2025-11-23 Breaking Forgetting: Training-Free Few-Shot Class-Incremental Learning via Conditional Diffusion Haidong Kang et.al. 2511.18516 null
2025-11-22 Vulnerability-Aware Robust Multimodal Adversarial Training Junrui Zhang et.al. 2511.18138 null
2025-11-22 Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning Xiaohong Liu et.al. 2511.18104 null
2025-11-17 Reconstruction-Driven Multimodal Representation Learning for Automated Media Understanding Yassir Benhammou et.al. 2511.17596 null
2025-11-21 MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment Huangbiao Xu et.al. 2511.17397 null
2025-11-21 UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation Chi Zhang et.al. 2511.16917 null
2025-11-20 LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs Doriand Petit et.al. 2511.16454 null
2025-11-20 Boosting Medical Visual Understanding From Multi-Granular Language Learning Zihan Li et.al. 2511.15943 null
2025-11-18 Uncertainty-Resilient Multimodal Learning via Consistency-Guided Cross-Modal Transfer Hyo-Jeong Jang et.al. 2511.15741 null
2025-11-19 SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome Dabin Jeong et.al. 2511.15464 null
2025-11-19 Reflexive Evidence-Based Multimodal Learning for Clean Energy Transitions: Causal Insights on Cooking Fuel Access, Urbanization, and Carbon Emissions Shan Shan et.al. 2511.15342 null
2025-11-19 Towards Unbiased Cross-Modal Representation Learning for Food Image-to-Recipe Retrieval Qing Wang et.al. 2511.15201 null
2025-11-19 TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition Wen Yin et.al. 2511.15085 null
2025-11-18 Quality-Controlled Multimodal Emotion Recognition in Conversations with Identity-Based Transfer Learning and MAMBA Fusion Zanxu Wang et.al. 2511.14969 null
2025-11-18 Toward Robust and Harmonious Adaptation for Cross-modal Retrieval Haobin Li et.al. 2511.14416 null
2025-11-18 Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation Weimin Bai et.al. 2511.14271 null
2025-11-18 Online Data Curation for Object Detection via Marginal Contributions to Dataset-level Average Precision Zitang Sun et.al. 2511.14197 null
2025-11-14 Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement Zhe Yang et.al. 2511.13755 null
2025-11-17 3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale Yijia Fan et.al. 2511.13211 null
2025-11-17 uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data Dahyun Chung et.al. 2511.13036 null
2025-11-17 Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks Minsoo Jo et.al. 2511.12985 null
2025-11-15 To Align or Not to Align: Strategic Multimodal Representation Alignment for Optimal Performance Wanlong Fang et.al. 2511.12121 null
2025-11-14 Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification Qinghao Gao et.al. 2511.11460 null
2025-11-14 AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery Yuqi Yin et.al. 2511.11257 null
2025-11-14 LEMUR: Large scale End-to-end MUltimodal Recommendation Xintian Han et.al. 2511.10962 null
2025-11-14 MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition Feng Li et.al. 2511.10892 null
2025-11-13 Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals Shruti Singh Baghel et.al. 2511.10615 null
2025-11-13 URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding Yongxin Shi et.al. 2511.10552 null
2025-11-13 GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval Hao Zou et.al. 2511.10154 null
2025-11-13 Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction Mingda Jia et.al. 2511.10134 null
2025-11-13 Towards Robust Multimodal Learning in the Open World Fushuo Huo et.al. 2511.09989 null
2025-11-12 Baby Sophia: A Developmental Approach to Self-Exploration through Self-Touch and Hand Regard Stelios Zarifis et.al. 2511.09727 null
2025-11-12 End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering Jiliang Hu et.al. 2511.09282 null
2025-11-11 Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding Da Li et.al. 2511.08480 null
2025-11-11 Boomda: Balanced Multi-objective Optimization for Multimodal Domain Adaptation Jun Sun et.al. 2511.08152 null
2025-11-11 Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval Likang Peng et.al. 2511.07780 null
2025-11-11 Cross Modal Fine-Grained Alignment via Granularity-Aware and Region-Uncertain Modeling Jiale Liu et.al. 2511.07710 null
2025-11-10 A Hybrid Multimodal Deep Learning Framework for Intelligent Fashion Recommendation Kamand Kalashi et.al. 2511.07573 null
2025-11-10 Integrating Epigenetic and Phenotypic Features for Biological Age Estimation in Cancer Patients via Multimodal Learning Shuyue Jiang et.al. 2511.07219 null
2025-11-10 Med-SORA: Symptom to Organ Reasoning in Abdomen CT Images You-Kyoung Na et.al. 2511.06752 null
2025-11-09 LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval Jian Zhang et.al. 2511.06268 null
2025-11-09 VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving Ruifei Zhang et.al. 2511.06256 null
2025-11-09 AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving Ruifei Zhang et.al. 2511.06253 null
2025-11-08 Referring Expressions as a Lens into Spatial Language Grounding in Vision-Language Models Akshar Tumu et.al. 2511.06146 null
2025-11-04 Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction An Vuong et.al. 2511.05577 null
2025-11-06 DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification Yujie Yang et.al. 2511.04281 null
2025-11-05 Cross-Modal Alignment via Variational Copula Modelling Feng Wu et.al. 2511.03196 null
2025-11-04 SLIP: Structural-aware Language-Image Pretraining for Vision-Language Alignment Wenbo Lu et.al. 2511.03019 null
2025-11-04 ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology Srikumar Sastry et.al. 2511.02946 null
2025-11-04 When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning Chenyu Zhang et.al. 2511.02794 null
2025-11-03 OmniFuser: Adaptive Multimodal Fusion for Service-Oriented Predictive Maintenance Ziqi Wang et.al. 2511.01320 null
2025-11-02 Balanced Multimodal Learning via Mutual Information Rongrong Xie et.al. 2511.00987 null
2025-11-01 LIR: The First Workshop on Late Interaction and Multi Vector Retrieval @ ECIR 2026 Benjamin Clavié et.al. 2511.00444 null
2025-11-01 Federated Dialogue-Semantic Diffusion for Emotion Recognition under Incomplete Modalities Xihang Qiu et.al. 2511.00344 null
2025-10-24 Multimodal Detection of Fake Reviews using BERT and ResNet-50 Suhasnadh Reddy Veluru et.al. 2511.00020 null
2025-10-04 Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment Adrian-Dinu Urse et.al. 2511.00004 null
2025-10-31 MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data Yu-Chen Kuo et.al. 2510.27321 null
2025-10-30 Evaluating Perspectival Biases in Cross-Modal Retrieval Teerapol Saengsukhiran et.al. 2510.26861 null
2025-10-30 Contribution-Guided Asymmetric Learning for Robust Multimodal Fusion under Imbalance and Noise Zijing Xu et.al. 2510.26289 null
2025-10-29 Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start Kun Chen et.al. 2510.25801 null
2025-10-29 LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation Yang Miao et.al. 2510.25263 null
2025-10-29 H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts Peilin Tan et.al. 2510.25091 null
2025-10-29 Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments Manjunath Prasad Holenarasipura Rajiv et.al. 2510.25070 null
2025-10-28 Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning Hossein R. Nowdeh et.al. 2510.24919 null
2025-10-28 MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition Haoyang Zhang et.al. 2510.24827 null
2025-10-24 Towards Fine-Grained Human Motion Video Captioning Guorui Song et.al. 2510.24767 null
2025-10-27 Toward Clinically Grounded Foundation Models in Pathology Hamid R. Tizhoosh et.al. 2510.23807 null
2025-10-27 Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier Hyeongseop Rha et.al. 2510.23506 null
2025-10-27 Evaluation of Vision-LLMs in Surveillance Video Pascal Benschop et.al. 2510.23190 null
2025-10-21 Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation Chanyoung Chung et.al. 2510.21812 null
2025-10-07 Avi: Action from Volumetric Inference Harris Song et.al. 2510.21746 null
2025-10-24 CXR-LanIC: Language-Grounded Interpretable Classifier for Chest X-Ray Diagnosis Yiming Tang et.al. 2510.21464 null
2025-10-24 Bridging the gap to real-world language-grounded visual concept learning Whie Jung et.al. 2510.21412 null
2025-10-23 Multimodal Negative Learning Baoquan Gong et.al. 2510.20877 null
2025-10-23 Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process Tsai Hor Chan et.al. 2510.20736 null
2025-10-23 Calibrating Multimodal Consensus for Emotion Recognition Guowei Zhong et.al. 2510.20256 null
2025-10-22 Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment Yuhang Liu et.al. 2510.19384 null
2025-10-22 FrogDeepSDM: Improving Frog Counting and Occurrence Prediction Using Multimodal Data and Pseudo-Absence Imputation Chirag Padubidri et.al. 2510.19305 null
2025-10-21 Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation Yasser Hamidullah et.al. 2510.18439 null
2025-10-20 Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware Stavros Mitsis et.al. 2510.18036 null
2025-10-20 MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning Alejandro Guerra-Manzanares et.al. 2510.17394 null
2025-10-19 Graph4MM: Weaving Multimodal Learning with Structural Information Xuying Ning et.al. 2510.16990 null
2025-10-19 ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning Yingxu Wang et.al. 2510.16824 null
2025-10-19 Pursuing Minimal Sufficiency in Spatial Reasoning Yejie Guo et.al. 2510.16688 null
2025-10-18 Safire: Similarity Framework for Visualization Retrieval Huyen N. Nguyen et.al. 2510.16662 null
2025-10-18 Structured Interfaces for Automated Reasoning with 3D Scene Graphs Aaron Ray et.al. 2510.16643 null
2025-10-09 Lyapunov-Stable Adaptive Control for Multimodal Concept Drift Tianyu Bell Pan et.al. 2510.15944 null
2025-10-17 Towards Relaxed Multimodal Inputs for Gait-based Parkinson’s Disease Assessment Minlin Zeng et.al. 2510.15748 null
2025-10-16 From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance Zhe Li et.al. 2510.14952 null
2025-10-16 Revisit Modality Imbalance at the Decision Layer Xiaoyu Ma et.al. 2510.14411 null
2025-10-15 A Multimodal Approach to Heritage Preservation in the Context of Climate Change David Roqui et.al. 2510.14136 null
2025-10-15 Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation Jiamin Chen et.al. 2510.13191 null
2025-10-15 Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning Rongrong Xie et.al. 2510.13182 null
2025-10-14 A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation Shurong Chai et.al. 2510.12482 null
2025-10-14 Ground Stratification for a Logic of Definitions with Induction Nathan Guermond et.al. 2510.12297 null
2025-10-14 IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation Wenxu Zhou et.al. 2510.12095 null
2025-10-13 Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and Analysis Blessing Agyei Kyem et.al. 2510.11907 null
2025-10-10 Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition Huimin Liu et.al. 2510.09203 null
2025-10-09 Provably Robust Adaptation for Language-Empowered Foundation Models Yuni Lai et.al. 2510.08659 null
2025-10-07 Centering Emotion Hotspots: Multimodal Local-Global Fusion and Cross-Modal Alignment for Emotion Recognition in Conversations Yu Liu et.al. 2510.08606 null
2025-10-09 Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling Bianca-Mihaela Ganescu et.al. 2510.08470 link
2025-10-08 FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams Corban Rivera et.al. 2510.07417 null
2025-09-30 MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation Md Zubair et.al. 2510.07328 null
2025-10-08 TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation Jiaben Chen et.al. 2510.07249 null
2025-10-08 Expressive and Scalable Quantum Fusion for Multimodal Learning Tuyen Nguyen et.al. 2510.06938 null
2025-10-07 Deforming Videos to Masks: Flow Matching for Referring Video Segmentation Zanyi Wang et.al. 2510.06139 link
2025-10-04 Towards Unsupervised Speech Recognition at the Syllable-Level Liming Wang et.al. 2510.03639 null
2025-09-25 Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data Jiancheng Zhang et.al. 2510.03247 null
2025-10-02 Latency-aware Multimodal Federated Learning over UAV Networks Shaba Shaon et.al. 2510.01717 null
2025-10-01 PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset Thomas Campagnolo et.al. 2510.00818 null
2025-09-30 MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning Seong-Hyeon Hwang et.al. 2509.25831 null
2025-09-29 FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology Faizan Farooq Khan et.al. 2509.25564 null
2025-09-29 MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series Payal Mohapatra et.al. 2509.25278 null
2025-09-29 A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity Giordano Cicchetti et.al. 2509.24734 null
2025-09-29 Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey Yuntao Shou et.al. 2509.24322 link
2025-09-28 Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics Luxuan Zhang et.al. 2509.23543 null
2025-09-26 RefAM: Attention Magnets for Zero-Shot Referral Segmentation Anna Kukleva et.al. 2509.22650 null
2025-09-26 HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes Katrina Ashton et.al. 2509.22498 null
2025-09-26 From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment Ke Ye et.al. 2509.22205 null
2025-09-26 VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation Huayi Zhou et.al. 2509.21723 null
2025-09-14 LibEMER: A novel benchmark and algorithms library for EEG-based Multimodal Emotion Recognition Zejun Liu et.al. 2509.19330 null
2025-09-10 Advancing Few-Shot Pediatric Arrhythmia Classification with a Novel Contrastive Loss and Multimodal Learning Yiqiao Chen et.al. 2509.19315 null
2025-09-23 Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation Christian Ganhör et.al. 2509.18807 null
2025-09-23 M4SER: Multimodal, Multirepresentation, Multitask, and Multistrategy Learning for Speech Emotion Recognition Jiajun He et.al. 2509.18706 null
2025-09-22 Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction Yi Gu et.al. 2509.18284 null
2025-09-22 ClassMind: Scaling Classroom Observation and Instructional Feedback with Multimodal AI Ao Qu et.al. 2509.18020 null
2025-09-22 M3ET: Efficient Vision-Language Learning for Robotics based on Multimodal Mamba-Enhanced Transformer Yanxin Zhang et.al. 2509.18005 null
2025-09-22 Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training Divya Mereddy et.al. 2509.17888 null
2025-09-20 Self-organized epithelial reticulum inhibits cell proliferation Liav Daraf et.al. 2509.16661 null
2025-09-19 Vision-Language Models as Differentiable Semantic and Spatial Rewards for Text-to-3D Generation Weimin Bai et.al. 2509.15772 null
2025-09-19 Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion Shanghong Li et.al. 2509.15578 null
2025-09-19 Beyond Words: Enhancing Desire, Emotion, and Sentiment Recognition with Non-Verbal Cues Wei Chen et.al. 2509.15540 null
2025-09-17 Exploring the Capabilities of LLM Encoders for Image-Text Retrieval in Chest X-rays Hanbin Ko et.al. 2509.15234 null
2025-09-17 VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI Daiqi Liu et.al. 2509.13767 null
2025-09-15 Evaluating Robustness of Vision-Language Models Under Noisy Conditions Purushoth et.al. 2509.12492 null
2025-09-15 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Yang Zhou et.al. 2509.12201 link
2025-09-15 Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI Bo Cao et.al. 2509.11924 null
2025-09-14 GLaVE-Cap: Global-Local Aligned Video Captioning with Vision Expert Integration Wan Xu et.al. 2509.11360 null
2025-09-14 DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations Doan Minh Trung et.al. 2509.11187 null
2025-09-14 Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation Nhi Kieu et.al. 2509.11102 null
2025-09-13 Why Bonds Fail Differently? Explainable Multimodal Learning for Multi-Class Default Prediction Yi Lu et.al. 2509.10802 null
2025-09-11 Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training Anthony P. Addison et.al. 2509.09290 null
2025-09-09 Enhancing Online Learning by Integrating Biosensors and Multimodal Learning Analytics for Detecting and Predicting Student Behavior: A Review Alvaro Becerra et.al. 2509.07742 null
2025-09-08 Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding Jiangnan Xie et.al. 2509.06291 null
2025-09-06 GraMFedDHAR: Graph Based Multimodal Differentially Private Federated HAR Labani Halder et.al. 2509.05671 null
2025-09-06 Causal Debiasing Medical Multimodal Representation Learning with Missing Modalities Xiaoguang Zhu et.al. 2509.05615 null
2025-09-04 Vehicle-to-Infrastructure Collaborative Spatial Perception via Multimodal Large Language Models Kimia Ehsani et.al. 2509.03837 null
2025-09-03 Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support Eduardo Davalos et.al. 2509.03741 null
2025-09-03 Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning Duy A. Nguyen et.al. 2509.03477 null
2025-09-03 Multimodal learning of melt pool dynamics in laser powder bed fusion Satyajit Mojumder et.al. 2509.03029 null
2025-09-03 Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability Shuai Jiang et.al. 2509.02962 null
2025-09-02 Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception Changshi Zhou et.al. 2509.02324 null
2025-09-02 Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective Shijie Wang et.al. 2509.02281 null
2025-09-02 Content and Engagement Trends in COVID-19 YouTube Videos: Evidence from the Late Pandemic Nirmalya Thakur et.al. 2509.01954 null
2025-09-01 OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Yanqing Liu et.al. 2509.01644 link
2025-09-01 Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement Jiayi Gao et.al. 2509.01362 null
2025-08-29 Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer Daniël Boeke et.al. 2508.21581 null
2025-08-27 Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement Mohammed Rakibul Hasan et.al. 2508.19887 null
2025-08-27 AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning Shu Shen et.al. 2508.19769 null
2025-08-25 BTW: A Non-Parametric Variance Stabilization Framework for Multimodal Model Integration Jun Hou et.al. 2508.18551 null
2025-08-22 Can VLMs Recall Factual Associations From Visual References? Dhananjay Ashok et.al. 2508.18297 null
2025-08-20 Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders Yiming Tang et.al. 2508.18236 null
2025-08-24 Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice Hugo Bohy et.al. 2508.17502 link
2025-08-24 Multimodal Representation Learning Conditioned on Semantic Relations Yang Qiao et.al. 2508.17497 null
2025-08-24 SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality Yuzhi Lai et.al. 2508.17255 null
2025-08-10 An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance Hsuan-Kung Yang et.al. 2508.16602 null
2025-08-22 Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization Yupei Zhang et.al. 2508.16479 null
2025-08-22 A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension Mohammad Zia Ur Rehman et.al. 2508.16300 null
2025-08-21 Lang2Lift: A Framework for Language-Guided Pallet Detection and Pose Estimation Integrated in Autonomous Outdoor Forklift Operation Huy Hoang Nguyen et.al. 2508.15427 null
2025-08-21 DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding Zhu Wang et.al. 2508.15297 null
2025-08-20 MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs Ruyi Ding et.al. 2508.15036 null
2025-08-19 Beyond Simple Edits: Composed Video Retrieval with Dense Modifications Omkar Thawakar et.al. 2508.14039 link
2025-08-19 CrafterDojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter Junyeong Park et.al. 2508.13530 null
2025-08-19 CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models Catherine Glossop et.al. 2508.13446 null
2025-08-18 SPANER: Shared Prompt Aligner for Multimodal Semantic Representation Thye Shan Ng et.al. 2508.13387 null
2025-08-18 Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation Tanjim Islam Riju et.al. 2508.13068 null
2025-08-17 Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping Xuhui Zhan et.al. 2508.12466 link
2025-08-16 MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization Haochen You et.al. 2508.12149 null
2025-08-16 ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models Zhichen Lou et.al. 2508.11918 null
2025-08-13 MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning Thanh-Dat Truong et.al. 2508.10133 null
2025-08-13 Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model Sushrut Patwardhan et.al. 2508.10110 null
2025-08-12 LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition Zhining He et.al. 2508.08925 null
2025-08-12 Multimodal learning enables instant ionizing radiation alerts on unmodified mobile phones for real-world emergency response Yanfeng Xie et.al. 2508.08541 null
2025-08-11 BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models Maozhen Zhang et.al. 2508.08040 null
2025-08-11 A Trustworthy Method for Multimodal Emotion Recognition Junxiao Xue et.al. 2508.07625 null
2025-08-10 Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks for Enhanced Action Understanding Zhaoyu Chen et.al. 2508.07388 null
2025-08-10 FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning Van Duc Cuong et.al. 2508.07264 null
2025-08-09 Can Multitask Learning Enhance Model Explainability? Hiba Najjar et.al. 2508.06966 null
2025-08-09 Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction Hiba Najjar et.al. 2508.06939 null
2025-08-09 Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities Rui Liu et.al. 2508.06800 null
2025-08-08 Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records Mosbah Aouad et.al. 2508.06627 null
2025-08-07 Surformer v1: Transformer-Based Surface Classification Using Tactile and Vision Features Manish Kansana et.al. 2508.06566 null
2025-08-06 Grounding Emotion Recognition with Visual Prototypes: VEGA – Revisiting CLIP in MERC Guanyu Hu et.al. 2508.06564 null
2025-08-08 Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning Xiangyu Wu et.al. 2508.06382 null
2025-08-08 ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge Juewen Hu et.al. 2508.05991 null
2025-08-07 Analyzing the Impact of Multimodal Perception on Sample Complexity and Optimization Landscapes in Imitation Learning Luai Abuelsamen et.al. 2508.05077 null
2025-08-07 MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding Weifan Zhang et.al. 2508.05021 null
2025-08-06 Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models Md Raisul Kibria et.al. 2508.04427 null
2025-08-06 Length Matters: Length-Aware Transformer for Temporal Sentence Grounding Yifan Wang et.al. 2508.04299 null
2025-08-06 SVC 2025: the First Multimodal Deception Detection Challenge Xun Lin et.al. 2508.04129 null
2025-07-29 Multimodal Video Emotion Recognition with Reliable Reasoning Priors Zhepeng Wang et.al. 2508.03722 null
2025-08-05 T2UE: Generating Unlearnable Examples from Text Descriptions Xingjun Ma et.al. 2508.03091 null
2025-08-04 MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming Shuo Wang et.al. 2508.02549 null
2025-08-04 Hierarchical MoE: Continuous Multimodal Emotion Recognition with Incomplete and Asynchronous Inputs Yitong Zhu et.al. 2508.02133 null
2025-08-04 “Harmless to You, Hurtful to Me!”: Investigating the Detection of Toxic Languages Grounded in the Perspective of Youth Yaqiong Li et.al. 2508.02094 null
2025-08-03 DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition Peiyuan Jiang et.al. 2508.01644 null
2025-08-02 A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics Rushin H. Gindra et.al. 2508.01490 null
2025-08-02 AffectGPT-R1: Leveraging Reinforcement Learning for Open-Vocabulary Emotion Recognition Zheng Lian et.al. 2508.01318 null
2025-07-29 SmartCLIP: Modular Vision-language Alignment with Identification Guarantees Shaoan Xie et.al. 2507.22264 null
2025-07-29 MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces Shaojun E et.al. 2507.21741 link
2025-07-29 Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion Zeyu Deng et.al. 2507.21395 null
2025-07-28 On the Limits of Hierarchically Embedded Logic in Classical Neural Networks Bill Cochran et.al. 2507.20960 null
2025-07-28 TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model Ao Li et.al. 2507.20630 null
2025-07-25 Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization Hsuan-Yu Wang et.al. 2507.19356 null
2025-07-25 SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality Sijie Li et.al. 2507.19264 null
2025-07-24 Deep Learning for Blood-Brain Barrier Permeability Prediction Zihan Yang et.al. 2507.18557 null
2025-07-23 RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding Xi Xiao et.al. 2507.17353 null
2025-07-22 VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings Ramin Giahi et.al. 2507.17080 null
2025-07-20 TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning Jie He et.al. 2507.16844 null
2025-07-21 Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure Alexandra Junell et.al. 2507.16088 null
2025-07-21 MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations Deyun Zhang et.al. 2507.15255 null
2025-07-20 LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering Xinxin Dong et.al. 2507.14784 null
2025-07-18 MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training Yuechen Xie et.al. 2507.13673 null
2025-07-17 City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning Penglei Sun et.al. 2507.12795 null
2025-07-17 A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models Weijieying Ren et.al. 2507.12774 null
2025-07-15 Partitioner Guided Modal Learning Framework Guimin Hu et.al. 2507.11661 null
2025-07-15 A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition Xinkui Zhao et.al. 2507.11202 null
2025-07-14 Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language Andrew C. Li et.al. 2507.10741 null
2025-07-14 Boosting Multimodal Learning via Disentangled Gradient Learning Shicai Wei et.al. 2507.10213 null
2025-07-21 Improving Multimodal Learning via Imbalanced Learning Shicai Wei et.al. 2507.10203 link
2025-07-13 HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space Changli Wang et.al. 2507.09487 null
2025-07-09 Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors Chunzhuo Wang et.al. 2507.07261 null
2025-07-09 Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey Getamesay Haile Dagnaw et.al. 2507.07148 null
2025-07-08 Enhancing Synthetic CT from CBCT via Multimodal Fusion and End-To-End Registration Maximilian Tschuchnig et.al. 2507.06067 null
2025-07-08 Graph Learning Feng Xia et.al. 2507.05636 null
2025-07-07 Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models Eunseop Yoon et.al. 2507.04976 null
2025-07-07 From Vision To Language through Graph of Events in Space and Time: An Explainable Self-supervised Approach Mihai Masala et.al. 2507.04815 null
2025-07-07 MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding Zhicheng Zhang et.al. 2507.04635 null
2025-07-10 DMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth Zheng Lian et.al. 2507.04278 null
2025-07-05 Unlocking Compositional Control: Self-Supervision for LVLM-Based Image Generation Fernando Gabriela Garcia et.al. 2507.04151 null
2025-07-03 Intelligent Histology for Tumor Neurosurgery Xinhai Hou et.al. 2507.03037 null
2025-07-01 Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers Yusuf Shihata et.al. 2507.02985 null
2025-07-02 TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation Yubeen Lee et.al. 2507.02080 null
2025-06-27 XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science Jithendaraa Subramanian et.al. 2507.01054 null
2025-06-27 Test-Time Consistency in Vision Language Models Shih-Han Chou et.al. 2506.22395 null
2025-06-27 Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems Abdulmomen Ghalkha et.al. 2506.22374 null
2025-06-26 ImplicitQA: Going beyond frames towards Implicit Video Reasoning Sirnam Swetha et.al. 2506.21742 link
2025-06-28 G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation Mohammed Rakib et.al. 2506.21514 null
2025-06-26 V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling Junwei You et.al. 2506.21041 null
2025-06-26 TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence Feng Jiang et.al. 2506.21028 null
2025-06-26 Where is AIED Headed? Key Topics and Emerging Frontiers (2020-2024) Shihui Feng et.al. 2506.20971 null
2025-06-24 Emergence of Text Readability in Vision Language Models Jaeyoo Park et.al. 2506.19389 null
2025-06-27 Haptic-ACT – Pseudo Oocyte Manipulation by a Robot Using Multimodal Information and Action Chunking with Transformers Pedro Miguel Uriguen Eljuri et.al. 2506.18212 null
2025-06-21 Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning? Yuesheng Huang et.al. 2506.17623 null
2025-06-24 AI-based Multimodal Biometrics for Detecting Smartphone Distractions: Application to Online Learning Alvaro Becerra et.al. 2506.17364 null
2025-06-20 With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You Fabian Gröger et.al. 2506.16895 null
2025-06-18 A Strong View-Free Baseline Approach for Single-View Image Guided Point Cloud Completion Fangzhou Lin et.al. 2506.15747 null
2025-06-18 Foundation of Affective Computing and Interaction Changzeng Fu et.al. 2506.15497 null
2025-06-18 video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models Changli Tang et.al. 2506.15220 link
2025-06-17 Can Pretrained Vision-Language Embeddings Alone Guide Robot Navigation? Nitesh Subedi et.al. 2506.14507 link
2025-06-16 Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography Yusdivia Molina-Román et.al. 2506.13964 null
2025-06-16 A Survey on World Models Grounded in Acoustic Physical Information Xiaoliang Chen et.al. 2506.13833 link
2025-06-16 A Survey on Imitation Learning for Contact-Rich Tasks in Robotics Toshiaki Tsuji et.al. 2506.13498 null
2025-06-16 Fatigue-Aware Adaptive Interfaces for Wearable Devices Using Deep Learning Yikan Wang et.al. 2506.13203 null
2025-06-15 Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models Liam Bennett et.al. 2506.12733 null
2025-06-14 Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics Asifullah Khan et.al. 2506.12365 null
2025-06-14 GSDNet: Revisiting Incomplete Multimodal-Diffusion from Graph Spectrum Perspective for Conversation Emotion Recognition Yuntao Shou et.al. 2506.12325 null
2025-06-16 Improving Multimodal Learning Balance and Sufficiency through Data Remixing Xiaoyu Ma et.al. 2506.11550 link
2025-06-13 RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer Haotian Ni et.al. 2506.11465 null
2025-06-12 Combining Log Data and Collaborative Dialogue Features to Predict Project Quality in Middle School AI Education Conrad Borchers et.al. 2506.11326 null
2025-06-12 Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction Thanathai Lertpetchpun et.al. 2506.10930 null
2025-06-12 Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts Guowei Zhong et.al. 2506.10452 link
2025-06-09 Segment Any Architectural Facades (SAAF): An automatic segmentation model for building facades, walls and windows based on multimodal semantics guidance Peilin Li et.al. 2506.09071 null
2025-06-10 Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment Maximilian Tschuchnig et.al. 2506.08716 null
2025-06-10 MOSAIC-F: A Framework for Enhancing Students’ Oral Presentation Skills through Personalized Feedback Alvaro Becerra et.al. 2506.08634 null
2025-06-09 Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs Jared Strader et.al. 2506.07454 null
2025-06-08 A Narrative Review on Large AI Models in Lung Cancer Screening, Diagnosis, and Treatment Planning Jiachen Zhong et.al. 2506.07236 null
2025-06-08 Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning Tianyi Bai et.al. 2506.07227 null
2025-06-08 A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge Tarique Dahri et.al. 2506.07055 null
2025-06-06 Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Sheng Chen et.al. 2506.06205 null
2025-06-06 Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization Jonathan Yang et.al. 2506.06196 null
2025-06-06 MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory Ana Carolina Condez et.al. 2506.05696 null
2025-06-03 Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation Israa A. Albadarneh et.al. 2506.05399 null
2025-06-05 Towards Language-Augmented Multi-Agent Deep Reinforcement Learning Maxime Toquebiau et.al. 2506.05236 null
2025-06-05 Quantifying Cross-Modality Memorization in Vision-Language Models Yuxin Wen et.al. 2506.05198 null
2025-06-05 A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions Anh Le et.al. 2506.05061 null
2025-06-04 EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation Cheng Zhang et.al. 2506.03652 null
2025-06-03 Enriching Location Representation with Detailed Semantic Information Junyuan Liu et.al. 2506.02744 null
2025-06-02 Entity Image and Mixed-Modal Image Retrieval Datasets Cristian-Ioan Blaga et.al. 2506.02291 null
2025-06-02 Confidence-Aware Self-Distillation for Multimodal Sentiment Analysis with Incomplete Modalities Yanxi Luo et.al. 2506.01490 null
2025-06-02 Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark Shuyu Yang et.al. 2506.01466 null
2025-06-02 Agentic Episodic Control Xidong Yang et.al. 2506.01442 null
2025-06-01 Leveraging CLIP Encoder for Multimodal Emotion Recognition Yehun Song et.al. 2506.00903 null
2025-06-01 GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints Jiajun He et.al. 2506.00865 null
2025-06-01 TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning Jiaqi Luo et.al. 2506.00813 null
2025-05-30 Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework Can Polat et.al. 2506.00302 null
2025-05-30 Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts Xin He et.al. 2505.24541 null
2025-05-29 Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition Sean Foley et.al. 2505.24059 null
2025-06-02 Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles Zifu Wang et.al. 2505.23590 link
2025-05-29 OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data Fengxiang Wang et.al. 2505.23522 null
2025-05-29 Bidirectional predictive coding Gaspard Oliviers et.al. 2505.23415 null
2025-05-29 Deep Modeling and Optimization of Medical Image Classification Yihang Wu et.al. 2505.23040 link
2025-05-30 EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations Haoqin Sun et.al. 2505.23018 link
2025-05-27 A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features Ihab Bendidi et.al. 2505.21317 null
2025-05-26 Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects Chengyan Wu et.al. 2505.20511 null
2025-05-25 PDFBench: A Benchmark for De novo Protein Design from Function Jiahao Kuang et.al. 2505.20346 null
2025-05-26 Learning Optimal Multimodal Information Bottleneck Representations Qilong Wu et.al. 2505.19996 null
2025-05-26 ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs Pooneh Mousavi et.al. 2505.19937 null
2025-05-26 Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning Sanghyuk Chun et.al. 2505.19614 null
2025-05-26 Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate Liangwei Nathan Zheng et.al. 2505.19525 null
2025-05-25 Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding Shiyue Wang et.al. 2505.19219 null
2025-05-25 I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts Jiayi Xin et.al. 2505.19190 link
2025-05-23 Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation Zhihua Liu et.al. 2505.17994 null
2025-05-23 HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning Chuhao Zhou et.al. 2505.17645 null
2025-05-23 RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition Yuehan Jin et.al. 2505.17501 null
2025-05-21 NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation Weiming Wu et.al. 2505.17121 null
2025-05-22 ICYM2I: The illusion of multimodal informativeness under missingness Young Sang Choi et.al. 2505.16953 link
2025-05-22 Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports Francesco Dalla Serra et.al. 2505.16624 null
2025-05-22 Multimodal Online Federated Learning with Modality Missing in Internet of Things Heqiang Wang et.al. 2505.16138 null
2025-05-21 Robust Multimodal Learning via Entropy-Gated Contrastive Fusion Leon Chlon et.al. 2505.15417 null
2025-05-21 EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy Chi Kit Ng et.al. 2505.15206 null
2025-05-21 Graph Foundation Models: A Comprehensive Survey Zehong Wang et.al. 2505.15116 link
2025-05-19 HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity Xuejun Sun et.al. 2505.14725 link
2025-05-20 Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning Jiangrong Shen et.al. 2505.14535 null
2025-05-20 Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition Shuo Zhang et.al. 2505.14143 null
2025-05-20 LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts Qifeng Cai et.al. 2505.13928 link
2025-05-17 Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering Hessa Alawwad et.al. 2505.13520 null
2025-05-19 AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning Kai Zhang et.al. 2505.12782 null
2025-05-19 PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI Yingchen He et.al. 2505.12707 null
2025-05-17 Understanding the Capabilities of Molecular Graph Neural Networks in Materials Science Through Multimodal Learning and Physical Context Encoding Can Polat et.al. 2505.12137 null
2025-05-17 SafeVid: Toward Safety Aligned Video Large Multimodal Models Yixu Wang et.al. 2505.11926 null
2025-05-16 GeoMM: On Geodesic Perspective for Multi-modal Learning Shibin Mei et.al. 2505.11216 null
2025-05-15 Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence Xiang He et.al. 2505.10176 link
2025-05-14 VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation Chaofan Zhang et.al. 2505.09577 null
2025-05-16 Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora Michael Majurski et.al. 2505.08905 link
2025-05-13 Decoupled Multimodal Prototypes for Visual Recognition with Missing Modalities Jueqing Lu et.al. 2505.08283 null
2025-05-11 MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning Lishan Yang et.al. 2505.06911 null
2025-05-10 Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning H M Dipu Kabir et.al. 2505.06592 link
2025-05-10 TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition Feng Liu et.al. 2505.06536 link
2025-05-09 NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines Chathurangi Shyalika et.al. 2505.06333 link
2025-05-09 Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models Jugal Gajjar et.al. 2505.06110 null
2025-05-09 Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects Tobias Preintner et.al. 2505.06030 link
2025-05-08 The Moon’s Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction Tom Sander et.al. 2505.05644 null
2025-05-07 OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Xianhang Li et.al. 2505.04601 null
2025-05-02 Mapping the Climate Change Landscape on TikTok Alessia Galdeman et.al. 2505.03813 null
2025-05-06 Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant Haonan Wang et.al. 2505.03380 null
2025-05-06 A Vision-Language Model for Focal Liver Lesion Classification Song Jian et.al. 2505.03350 null
2025-05-06 SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation Yu-Ren Guo et.al. 2505.03244 null
2025-05-05 The Multimodal Paradox: How Added and Missing Modalities Shape Bias and Performance in Multimodal AI Kishore Sampath et.al. 2505.03020 null
2025-05-02 Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders Rogelio A Mancisidor et.al. 2505.01134 null
2025-04-30 Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design Vasudev Sharma et.al. 2505.00134 null
2025-04-28 DEEMO: De-identity Multimodal Emotion Recognition and Reasoning Deng Li et.al. 2504.19549 null
2025-04-27 Platonic Grounding for Efficient Multimodal Language Models Moulik Choraria et.al. 2504.19327 null
2025-04-27 DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning Jialang Lu et.al. 2504.19127 null
2025-04-23 A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw Wenwen Li et.al. 2504.17822 null
2025-04-23 Monte Carlo Planning with Large Language Model for Text-Based Game Agents Zijing Shi et.al. 2504.16855 null
2025-04-23 Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation Lakshita Agarwal et.al. 2504.16788 null
2025-04-23 PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System Xianghe Liu et.al. 2504.16573 null
2025-04-22 CLIP-IT: CLIP-based Pairing for Histology Images Classification Banafsheh Karimian et.al. 2504.16181 null
2025-04-22 SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems Manjunath D et.al. 2504.15728 null
2025-04-21 Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Guo Chen et.al. 2504.15271 null
2025-04-21 IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification Fengyuan Nie et.al. 2504.14833 null
2025-04-19 Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction Li Yu et.al. 2504.14267 null
2025-04-19 PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models Nusrat Jahan Prottasha et.al. 2504.14117 null
2025-04-18 Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation Duy A. Nguyen et.al. 2504.13465 null
2025-04-17 A Survey on Cross-Modal Interaction Between Music and Multimodal Data Sifei Li et.al. 2504.12796 null
2025-04-16 An Algebraic Extension of Intuitionistic Linear Logic: The $L_!^S$-Calculus and Its Categorical Model Alejandro Díaz-Caro et.al. 2504.12128 null
2025-04-16 FedEPA: Enhancing Personalization and Modality Alignment in Multimodal Federated Learning Yu Zhang et.al. 2504.12025 null
2025-04-15 Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset Elisa Ancarani et.al. 2504.11232 null
2025-04-14 Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge Maria Tzelepi et.al. 2504.09914 null
2025-04-13 Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention Vasilii Korolkov et.al. 2504.09738 null
2025-04-13 Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation Yongchao Feng et.al. 2504.09480 link
2025-04-09 Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging Siyuan Dai et.al. 2504.07336 null
2025-04-07 Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework Yu Min Park et.al. 2504.05187 null
2025-04-07 Leveraging Label Potential for Enhanced Multimodal Emotion Recognition Xuechun Shao et.al. 2504.05158 null
2025-04-06 FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency Shiyan Liu et.al. 2504.04427 null
2025-04-04 Interpretable Multimodal Learning for Tumor Protein-Metal Binding: Progress, Challenges, and Perspectives Xiaokun Liu et.al. 2504.03847 null
2025-04-04 DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models Sathish Kumar et.al. 2504.03423 null
2025-04-02 Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities Jing Liu et.al. 2504.01954 null
2025-04-02 Deep Learning-Driven Protein Structure Prediction and Design: Key Model Developments by Nobel Laureates and Multi-Domain Applications Wanqing Yang et.al. 2504.01490 null
2025-03-31 Grounding Agent Reasoning in Image Schemas: A Neurosymbolic Approach to Embodied Cognition François Olivier et.al. 2503.24110 null
2025-03-31 DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description Adrienne Deganutti et.al. 2503.24096 null
2025-03-31 BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation Yumeng Fu et.al. 2503.23990 null
2025-03-31 Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion Jiagen Li et.al. 2503.23721 null
2025-03-31 HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation Kun Liu et.al. 2503.23715 null
2025-03-27 Graph-to-Vision: Multi-graph Understanding and Reasoning using Vision-Language Models Ruizhou Li et.al. 2503.21435 null
2025-03-27 UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning Hongxuan Tang et.al. 2503.21193 null
2025-03-27 AdaMHF: Adaptive Multimodal Hierarchical Fusion for Survival Prediction Shuaiyu Zhang et.al. 2503.21124 link
2025-03-26 GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations Yupei Li et.al. 2503.20919 null
2025-03-26 An Encoding of Interaction Nets in OCaml Nikolaus Huber et.al. 2503.20463 null
2025-03-27 RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models Mehdi Moshtaghi et.al. 2503.19654 null
2025-03-25 VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction Zizhi Chen et.al. 2503.19367 link
2025-03-25 LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text Weizhi Chen et.al. 2503.19311 link
2025-03-24 Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition Chengxiang Huang et.al. 2503.18595 link
2025-03-21 Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition Ran Liu et.al. 2503.17453 link
2025-03-21 MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering Jialin Chen et.al. 2503.16858 null
2025-03-20 EVA-MED: An Enhanced Valence-Arousal Multimodal Emotion Dataset for Emotion Recognition Xin Huang et.al. 2503.16584 null
2025-03-18 Do Multimodal Large Language Models Understand Welding? Grigorii Khvatskii et.al. 2503.16537 null
2025-03-19 EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis Matthew Massey et.al. 2503.15625 link
2025-03-19 Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification Zhong Ji et.al. 2503.14938 null
2025-03-18 HySurvPred: Multimodal Hyperbolic Embedding with Angle-Aware Hierarchical Contrastive Learning and Uncertainty Constraints for Survival Prediction Jiaqi Yang et.al. 2503.13862 null
2025-03-17 Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning Xueying Jiang et.al. 2503.12974 null
2025-03-16 BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries Tianle Li et.al. 2503.12446 null
2025-03-15 Handling Weak Complementary Relationships for Audio-Visual Emotion Recognition R. Gnana Praveen et.al. 2503.12261 null
2025-03-14 Cross-Modal Learning for Music-to-Music-Video Description Generation Zhuoyuan Mao et.al. 2503.11190 null
2025-03-20 Unifying 2D and 3D Vision-Language Understanding Ayush Jain et.al. 2503.10745 null
2025-03-11 TLA: Tactile-Language-Action Model for Contact-Rich Manipulation Peng Hao et.al. 2503.08548 null
2025-03-10 Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency Duy Phuong Nguyen et.al. 2503.07552 link
2025-03-10 A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis Xiang Liu et.al. 2503.06973 link
2025-03-10 HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation Xingzu Zhan et.al. 2503.06897 null
2025-03-10 Towards Generalization of Tactile Image Generation: Reference-Free Evaluation in a Leakage-Free Setting Cagri Gungor et.al. 2503.06860 null
2025-03-09 Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts Aref Farhadipour et.al. 2503.06805 null
2025-03-13 DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning Chengxuan Qian et.al. 2503.06456 link
2025-03-05 Beyond H&E: Unlocking Pathological Insights with Polarization via Self-supervised Learning Yao Du et.al. 2503.05933 null
2025-03-10 R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning Jiaxing Zhao et.al. 2503.05379 null
2025-03-07 Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation Xinkun Wang et.al. 2503.05319 null
2025-03-06 Large Language Models in Bioinformatics: A Survey Zhenyu Wang et.al. 2503.04490 null
2025-03-05 Rebalanced Multimodal Learning with Data-aware Unimodal Sampling Qingyuan Jiang et.al. 2503.03792 null
2025-03-04 Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data Amin Honarmandi Shandiz et.al. 2503.02849 null
2025-03-04 Multimodal AI predicts clinical outcomes of drug combinations from preclinical data Yepeng Huang et.al. 2503.02781 null
2025-03-03 Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA Zhusi Zhong et.al. 2503.02034 null
2025-03-03 DeepSuM: Deep Sufficient Modality Learning Framework Zhe Gao et.al. 2503.01728 null
2025-03-03 Dementia Insights: A Context-Based MultiModal Approach Sahar Sinene Mehdoui et.al. 2503.01226 null
2025-03-03 HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation Hongye Cheng et.al. 2503.01175 null
2025-02-28 Foundation-Model-Boosted Multimodal Learning for fMRI-based Neuropathic Pain Drug Response Prediction Wenrui Fan et.al. 2503.00210 null
2025-02-28 PathVG: A New Benchmark and Dataset for Pathology Visual Grounding Chunlin Zhong et.al. 2502.20869 null
2025-02-28 Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems Faisal Mohammad et.al. 2502.20806 null
2025-02-27 VideoA11y: Method and Dataset for Accessible Video Description Chaoyu Li et.al. 2502.20480 null
2025-02-27 LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding Ang Cao et.al. 2502.20389 null
2025-02-27 Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion QingYuan Jiang et.al. 2502.20120 null
2025-02-27 MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification Tong Zhang et.al. 2502.19674 null
2025-02-25 CPVis: Evidence-based Multimodal Learning Analytics for Evaluation in Collaborative Programming Gefei Zhang et.al. 2502.17835 null
2025-02-24 Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI Syed Abdul Gaffar Shakhadri et.al. 2502.17092 null
2025-02-24 DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications Ibrahim Fayad et.al. 2502.17066 null
2025-02-23 Category-Selective Neurons in Deep Networks: Comparing Purely Visual and Visual-Language Models Zitong Lu et.al. 2502.16456 null
2025-02-23 A Survey on Industrial Anomalies Synthesis Xichen Xu et.al. 2502.16412 link
2025-02-22 Understanding the Emergence of Multimodal Representation Alignment Megan Tjandrasuwita et.al. 2502.16282 link
2025-02-21 M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards Alvaro Becerra et.al. 2502.15363 null
2025-02-20 FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis Fadillah Maani et.al. 2502.14807 link
2025-02-21 AVD2: Accident Video Diffusion for Accident Video Description Cheng Li et.al. 2502.14801 null
2025-02-19 Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition Jingwang Huang et.al. 2502.13954 link
2025-02-22 Grounding LLM Reasoning with Knowledge Graphs Alfonso Amayuelas et.al. 2502.13247 null
2025-02-18 SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation Zekun Qi et.al. 2502.13143 null
2025-02-18 Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning Mengshi Qi et.al. 2502.12425 link
2025-02-16 AudioSpa: Spatializing Sound Events with Text Linfeng Feng et.al. 2502.11219 null
2025-02-18 BalanceBenchmark: A Survey for Imbalanced Learning Shaoxuan Xu et.al. 2502.10816 link
2025-02-17 Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation Mohammad Mahdi Abootorabi et.al. 2502.08826 link
2025-02-12 A Novel Approach to for Multimodal Emotion Recognition : Multimodal semantic information fusion Wei Dai et.al. 2502.08573 null
2025-02-17 What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations Dongqi Liu et.al. 2502.08279 null
2025-02-11 Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis Amir Hosein Fadaei et.al. 2502.07277 null
2025-02-10 Generative Distribution Prediction: A Unified Approach to Multimodal Learning Xinyu Tian et.al. 2502.07090 null
2025-02-06 CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction Jaewan Lee et.al. 2502.06836 null
2025-02-10 Learning Musical Representations for Music Performance Question Answering Xingjian Diao et.al. 2502.06710 null
2025-02-04 Exploring Spatial Language Grounding Through Referring Expressions Akshar Tumu et.al. 2502.04359 null
2025-02-03 Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective Xiaorui Ma et.al. 2502.01524 null
2025-02-03 MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks Alejandro Guerra-Manzanares et.al. 2502.01158 null
2025-02-01 Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition Zaitian Wang et.al. 2502.00547 link
2025-01-29 U2A: Unified Unimodal Adaptation for Robust and Efficient Multimodal Learning Md Kaykobad Reza et.al. 2501.17823 null
2025-01-28 Molecular-driven Foundation Model for Oncologic Pathology Anurag Vaidya et.al. 2501.16652 null
2025-01-27 AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models Zheng Lian et.al. 2501.16566 null
2025-01-25 Inductive Biases for Zero-shot Systematic Generalization in Language-informed Reinforcement Learning Negin Hashemi Dijujin et.al. 2501.15270 null
2025-01-25 Deep Multimodal Learning for Real-Time DDoS Attacks Detection in Internet of Vehicles Mohamed Ababsa et.al. 2501.15252 link
2025-01-25 Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition Junwei Feng et.al. 2501.15063 null
2025-01-23 Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge Haomiao Xiong et.al. 2501.13468 link
2025-01-22 EmoTech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information with Hybrid Recurrent Network Shamin Bin Habib Avro et.al. 2501.12674 null
2025-01-21 Compositional Instruction Following with Language Models and Reinforcement Learning Vanya Cohen et.al. 2501.12539 null
2025-01-21 Multi-stage intermediate fusion for multimodal learning to classify non-small cell lung cancer subtypes from CT and PET Fatih Aksu et.al. 2501.12425 null
2025-01-20 LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations Soumya Dutta et.al. 2501.11468 null
2025-01-20 ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction Xiangyang Hu et.al. 2501.11276 link
2025-01-18 Fake Advertisements Detection Using Automated Multimodal Learning: A Case Study for Vietnamese Real Estate Data Duy Nguyen et.al. 2501.10848 null
2025-01-17 A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features Enes Karanfil et.al. 2501.10144 null
2025-01-17 TeamVision: An AI-powered Learning Analytics System for Supporting Reflection in Team-based Healthcare Simulation Vanessa Echeverria et.al. 2501.09930 null
2025-01-19 IDEA: Image Description Enhanced CLIP-Adapter Zhipeng Ye et.al. 2501.08816 link
2025-01-14 Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time Mihai Masala et.al. 2501.08460 null
2025-01-12 SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval Bhavin Jawade et.al. 2501.08347 null
2025-01-17 Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding Liping Yuan et.al. 2501.07888 null
2025-01-13 Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis Andrzej D. Dobrzycki et.al. 2501.07221 null
2025-01-12 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes Mahmoud Ahmed et.al. 2501.06785 link
2025-01-14 Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding Joshua Jones et.al. 2501.04693 null
2025-01-06 CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets Tanay Agrawal et.al. 2501.03332 null
2025-01-06 MVP: Multimodal Emotion Recognition based on Video and Physiological Signals Valeriya Strizhkova et.al. 2501.03103 null
2025-01-02 Asymmetric Reinforcing against Multi-modal Representation Bias Xiyuan Gao et.al. 2501.01240 link
2025-01-02 Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning Jian Lang et.al. 2501.01120 link
2024-12-30 Aviary: training language agents on challenging scientific tasks Siddharth Narayanan et.al. 2412.21154 null
2024-12-30 Hierarchical Banzhaf Interaction for General Video-Language Representation Learning Peng Jin et.al. 2412.20964 link
2024-12-30 Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment Xuechen Wang et.al. 2412.20821 null
2024-12-29 Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment Shiyun Chen et.al. 2412.20418 null
2024-12-26 Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching Wenjing Chen et.al. 2412.19184 null
2024-12-26 CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting Siyu Jiao et.al. 2412.19142 null
2024-12-24 MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning Abdelmadjid Chergui et.al. 2412.18437 link
2024-12-23 Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion Grigor Bezirganyan et.al. 2412.18024 link
2024-12-23 A Multimodal Emotion Recognition System: Integrating Facial Expressions, Body Movement, Speech, and Spoken Language Kris Kraack et.al. 2412.17907 null
2024-12-18 Constraint-Based Model in Multimodal Learning to Improve Ventricular Arrhythmia Prediction Evariste Njomgue Fotso et.al. 2412.17840 null
2024-12-23 Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy Priyaranjan Pattnayak et.al. 2412.17759 null
2024-12-23 EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities Zhe Chen et.al. 2412.17677 link
2024-12-23 V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy Long Bai et.al. 2412.17595 null
2024-12-22 COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations Vanessa Su et.al. 2412.17180 null
2024-12-17 DoPTA: Improving Document Layout Analysis using Patch-Text Alignment Nikitha SR et.al. 2412.12902 null
2024-12-17 Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning Shiping Ge et.al. 2412.12791 link
2024-12-17 PBVS 2024 Solution: Self-Supervised Learning and Sampling Strategies for SAR Classification in Extreme Long-Tail Distribution Yuhyun Kim et.al. 2412.12565 null
2024-12-16 Gramian Multimodal Representation Learning and Alignment Giordano Cicchetti et.al. 2412.11959 null
2024-12-10 Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning Can Yaras et.al. 2412.07909 null
2024-12-07 WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition Feng Li et.al. 2412.05558 null
2024-12-05 Lattice Lingo: Effect of Textual Detail on Multimodal Learning for Property Prediction of Crystals Mrigi Munjal et.al. 2412.04670 null
2024-12-04 Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning Neale Ratzlaff et.al. 2412.03467 null
2024-12-04 Grounded Language Design for Lightweight Diagramming for Formal Methods Siddhartha Prasad et.al. 2412.03310 null
2024-12-04 Dynamic Graph Neural Ordinary Differential Equation Network for Multi-modal Emotion Recognition in Conversation Yuntao Shou et.al. 2412.02935 null
2024-12-03 Initial Study On Improving Segmentation By Combining Preoperative CT And Intraoperative CBCT Using Synthetic Data Maximilian E. Tschuchnig et.al. 2412.02294 null
2024-12-02 Occam’s LGS: A Simple Approach for Language Gaussian Splatting Jiahuan Cheng et.al. 2412.01807 null
2024-11-30 Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment Dongfang Zhao et.al. 2412.00373 null
2024-11-29 SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition Fangze Fu et.al. 2411.19822 null
2024-11-26 Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment Zheng Chen et.al. 2411.17237 link
2024-11-26 Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation Xu Zheng et.al. 2411.17141 link
2024-11-26 Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models Colin Conwell et.al. 2411.17066 link
2024-11-26 Multimodal Alignment and Fusion: A Survey Songtao Li et.al. 2411.17040 null
2024-11-25 Language Driven Occupancy Prediction Zhu Yu et.al. 2411.16072 link
2024-11-23 From Complexity to Parsimony: Integrating Latent Class Analysis to Uncover Multimodal Learning Patterns in Collaborative Learning Lixiang Yan et.al. 2411.15590 null
2024-11-23 Botfip-LLM: An Enhanced Multimodal Scientific Computing Framework Leveraging Knowledge Distillation from Large Language Models Tianhao Chen et.al. 2411.15525 null
2024-11-22 PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision Arnav M. Das et.al. 2411.15127 null
2024-11-21 Generative AI for Music and Audio Hao-Wen Dong et.al. 2411.14627 null
2024-11-21 Multimodal 3D Reasoning Segmentation with Complex Scenes Xueying Jiang et.al. 2411.13927 null
2024-11-12 Public Health Advocacy Dataset: A Dataset of Tobacco Usage Videos from Social Media Naga VS Raviteja Chappa et.al. 2411.13572 null
2024-11-20 I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences Zihan Wang et.al. 2411.12960 null
2024-11-18 MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT Xiaomin Ouyang et.al. 2411.12126 null
2024-11-19 SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach Ruoxi Sun et.al. 2411.11195 null
2024-11-15 Everything is a Video: Unifying Modalities through Next-Frame Prediction G. Thomas Hudson et.al. 2411.10503 null
2024-11-15 Weakly-Supervised Multimodal Learning on MIMIC-CXR Andrea Agostini et.al. 2411.10356 null
2024-11-15 CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation Xiaofei Zhu et.al. 2411.10060 null
2024-11-21 Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era Thanh Tam Nguyen et.al. 2411.09955 link
2024-11-14 SmartInv: Multimodal Learning for Smart Contract Invariant Inference Sally Junsong Wang et.al. 2411.09217 null
2024-11-12 NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN Sonia Raychaudhuri et.al. 2411.07848 null
2024-11-11 Multimodal Fusion Balancing Through Game-Theoretic Regularization Konstantinos Kontras et.al. 2411.07335 null
2024-11-11 StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification Yichen He et.al. 2411.07076 link
2024-11-08 Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors Yuanyuan Liu et.al. 2411.05879 null
2024-11-06 AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool Zhongliang Tang et.al. 2411.03709 null
2024-11-05 STEER: Flexible Robotic Manipulation via Dense Language Grounding Laura Smith et.al. 2411.03409 null
2024-11-05 Grounding Natural Language to SQL Translation with Data-Based Self-Explanations Yuankai Fan et.al. 2411.02948 link
2024-11-04 Grounding Emotional Descriptions to Electrovibration Haptic Signals Guimin Hu et.al. 2411.02118 null
2024-11-03 Classifier-guided Gradient Modulation for Enhanced Multimodal Learning Zirun Guo et.al. 2411.01409 link
2024-11-01 Text2Freq: Learning Series Patterns from Text via Frequency Domain Ming-Chih Lo et.al. 2411.00929 null
2024-10-29 EEG-based Multimodal Representation Learning for Emotion Recognition Kang Yin et.al. 2411.00822 null
2024-11-01 Analyzing Multimodal Integration in the Variational Autoencoder from an Information-Theoretic Perspective Carlotta Langer et.al. 2411.00522 null
2024-10-30 PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation Ryozo Masukawa et.al. 2410.22623 null
2024-10-28 IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks Manjunath D et.al. 2410.20953 link
2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Xiangyu Zeng et.al. 2410.19702 null
2024-10-24 UGotMe: An Embodied System for Affective Human-Robot Interaction Peizhen Li et.al. 2410.18373 link
2024-10-22 EVC-MF: End-to-end Video Captioning Network with Multi-scale Features Tian-Zi Niu et.al. 2410.16624 null
2024-10-22 MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report Samrajya Thapa et.al. 2410.16239 link
2024-10-21 Multimodal Learning for Embryo Viability Prediction in Clinical IVF Junsik Kim et.al. 2410.15581 null
2024-10-20 Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison Shiyu Hu et.al. 2410.15270 null
2024-10-15 CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning Qingqing Cao et.al. 2410.11963 null
2024-10-15 Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers Davide Celestini et.al. 2410.11723 null
2024-10-15 On-the-fly Modulation for Balanced Multimodal Learning Yake Wei et.al. 2410.11582 link
2024-10-14 MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Peng Xia et.al. 2410.10139 link
2024-10-10 Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts Sukwon Yun et.al. 2410.08245 link
2024-10-11 Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization Changli Tang et.al. 2410.06682 null
2024-10-08 Multimodal Representation Learning using Adaptive Graph Construction Weichen Huang et.al. 2410.06395 null
2024-10-07 Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models Dehong Kong et.al. 2410.04884 null
2024-10-07 MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection Niki Nezakati et.al. 2410.03010 null
2024-10-02 Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations Minoh Jeong et.al. 2410.02086 null
2024-10-02 Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark Zheng Lian et.al. 2410.01495 null
2024-10-04 VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models Jiapeng Wang et.al. 2410.00741 null
2024-09-30 Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning Weitai Kang et.al. 2410.00255 link
2024-09-30 Towards Robust Multimodal Sentiment Analysis with Incomplete Data Haoyu Zhang et.al. 2409.20012 link
2024-10-02 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Jihai Zhang et.al. 2409.19291 link
2024-09-26 Infer Human’s Intentions Before Following Natural Language Instructions Yanming Wan et.al. 2409.18073 link
2024-09-26 A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios Christian Ganhör et.al. 2409.17864 null
2024-09-26 Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification Raja Kumar et.al. 2409.17777 null
2024-09-25 Language Grounded Multi-agent Communication for Ad-hoc Teamwork Huao Li et.al. 2409.17348 null
2024-09-24 CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation Fuxian Huang et.al. 2409.15806 null
2024-09-18 All-in-one foundational models learning across quantum chemical levels Yuxinxin Chen et.al. 2409.12015 link
2024-09-13 Hierarchical Hypercomplex Network for Multimodal Emotion Recognition Eleonora Lopez et.al. 2409.09194 link
2024-09-13 Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing Minh-Duc Vu et.al. 2409.08885 null
2024-09-13 A Multimodal Approach for Fluid Overload Prediction: Integrating Lung Ultrasound and Clinical Data Tianqi Yang et.al. 2409.08790 null
2024-09-13 A Comprehensive Survey on Deep Multimodal Learning with Missing Modality Renjie Wu et.al. 2409.07825 null
2024-09-11 What to align in multimodal contrastive learning? Benoit Dufumier et.al. 2409.07402 null
2024-09-11 Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective Guimin Hu et.al. 2409.07388 link
2024-09-11 Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout Anbin QI et.al. 2409.07078 null
2024-09-11 A Survey of Multimodal Composite Editing and Retrieval Suyan Li et.al. 2409.05405 link
2024-09-09 Diagnostic Reasoning in Natural Language: Computational Model and Application Nils Dycke et.al. 2409.05367 null
2024-09-10 Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment Zhixian Zhao et.al. 2409.05015 null
2024-08-31 Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification Aref Farhadipour et.al. 2409.00562 null
2024-08-29 Toward Robust Early Detection of Alzheimer’s Disease via an Integrated Multimodal Learning Approach Yifei Chen et.al. 2408.16343 link
2024-08-28 Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis Sijie Mai et.al. 2408.16029 null
2024-08-28 ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation Tiantian Feng et.al. 2408.15803 null
2024-08-28 Visual Prompt Engineering for Medical Vision Language Models in Radiology Stefan Denner et.al. 2408.15802 null
2024-08-27 The Benefits of Balance: From Information Projections to Variance Reduction Lang Liu et.al. 2408.15065 null
2024-08-27 NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework Shuangchen Zhao et.al. 2408.14950 null
2024-09-03 Foundation Models for Music: A Survey Yinghao Ma et.al. 2408.14340 link
2024-09-06 Quantum Multimodal Contrastive Learning Framework Chi-Sheng Chen et.al. 2408.13919 null
2024-08-25 Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples Jayakanth Kunhoth et.al. 2408.13754 null
2024-08-24 R2G: Reasoning to Ground in 3D Scenes Yixuan Li et.al. 2408.13499 null
2024-08-23 Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition Cam-Van Thi Nguyen et.al. 2408.12895 null
2024-08-23 Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey Qika Lin et.al. 2408.12880 link
2024-08-23 Grounding Fallacies Misrepresenting Scientific Publications in Evidence Max Glockner et.al. 2408.12812 null
2024-08-22 Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models Jean Park et.al. 2408.12763 null
2024-08-22 Mental-Perceiver: Audio-Textual Multimodal Learning for Mental Health Assessment Jinghui Qin et.al. 2408.12088 null
2024-08-22 Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model Mengying Ge et.al. 2408.11286 null
2024-08-21 SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition Zebang Cheng et.al. 2408.10500 link
2024-08-19 Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation Liu He et.al. 2408.10453 null
2024-08-18 Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition Qifei Li et.al. 2408.09438 link
2024-08-16 Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition Muhammad Haseeb Aslam et.al. 2408.09035 link
2024-08-14 Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach Muhammad Saad Saeed et.al. 2408.07445 null
2024-08-14 Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration Xiaogen Zhon et.al. 2408.07341 link
2024-08-14 Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion Peiyuan Chen et.al. 2408.07303 null
2024-08-13 Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning Jieming Bian et.al. 2408.06549 null
2024-08-04 Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion Shaoxu Cheng et.al. 2408.02695 null
2024-08-06 Infusing Environmental Captions for Long-Form Video Language Grounding Hyogun Lee et.al. 2408.02336 null
2024-08-05 REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models Agneet Chatterjee et.al. 2408.02231 null
2024-08-04 CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization Xiang He et.al. 2408.01952 link
2024-08-02 Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation Zijian Yi et.al. 2408.00970 link
2024-08-01 The Monetisation of Toxicity: Analysing YouTube Content Creators and Controversy-Driven Engagement Thales Bertaglia et.al. 2408.00534 null
2024-07-31 Tracing Intricate Cues in Dialogue: Joint Graph Structure and Sentiment Dynamics for Multimodal Emotion Recognition Jiang Li et.al. 2407.21536 null
2024-07-31 DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations Dongwon Son et.al. 2407.21267 null
2024-07-30 HyperMM: Robust Multimodal Learning with Varying-sized Inputs Hava Chaptoukaev et.al. 2407.20768 null
2024-07-29 ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 Wenjun Huang et.al. 2407.19832 null
2024-08-02 XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training Biao Wu et.al. 2407.19546 link
2024-07-28 Detached and Interactive Multimodal Learning Yunfeng Fan et.al. 2407.19514 link
2024-07-26 Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment Yuze Zheng et.al. 2407.18854 null
2024-07-26 Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention Joe Dhanith P R et.al. 2407.18552 null
2024-07-25 $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs Vlad Sobal et.al. 2407.18134 null
2024-07-25 Cross-Vendor Reproducibility of Radiomics-based Machine Learning Models for Computer-aided Diagnosis Jatin Chaudhary et.al. 2407.18060 null
2024-07-23 Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation Tao Meng et.al. 2407.16714 null
2024-07-24 MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues Liyun Zhang et.al. 2407.16552 null
2024-07-23 Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities Muhammad Irzam Liaqat et.al. 2407.16243 null
2024-07-22 Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training Ye Lin Tun et.al. 2407.15426 null
2024-07-17 Text- and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild Nicolas Richet et.al. 2407.12927 link
2024-07-17 Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models Donggeun Kim et.al. 2407.12616 null
2024-07-12 Diagnosing and Re-learning for Balanced Multimodal Learning Yake Wei et.al. 2407.09705 link
2024-07-12 Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework Haoqin Sun et.al. 2407.09029 null
2024-07-10 AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition Zheng Lian et.al. 2407.07653 link
2024-07-06 Completed Feature Disentanglement Learning for Multimodal MRIs Analysis Tianling Liu et.al. 2407.04916 null
2024-07-05 Multimodal Classification via Modal-Aware Interactive Enhancement Qing-Yuan Jiang et.al. 2407.04587 null
2024-07-05 Robust Multimodal Learning via Representation Decoupling Shicai Wei et.al. 2407.04458 null
2024-07-05 Smart Vision-Language Reasoners Denisa Roberts et.al. 2407.04212 link
2024-07-04 ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities Julie Mordacq et.al. 2407.03836 link
2024-07-02 Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties Srivathsan Badrinarayanan et.al. 2407.03380 link
2024-07-05 Multi-Task Domain Adaptation for Language Grounding with 3D Objects Penglei Sun et.al. 2407.02846 null
2024-07-01 Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation Sirui Xia et.al. 2407.01796 null
2024-06-30 Tarsier: Recipes for Training and Evaluating Large Video Description Models Jiawei Wang et.al. 2407.00634 link
2024-06-28 Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction Akash Awasthi et.al. 2407.00129 null
2024-06-27 From Efficient Multimodal Models to World Models: A Survey Xinji Mai et.al. 2407.00118 null
2024-06-27 Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment Hao Fei et.al. 2406.19255 null
2024-06-27 RAVEN: Multitask Retrieval Augmented Vision-Language Learning Varun Nagaraj Rao et.al. 2406.19150 null
2024-06-26 Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs Uttaran Bhattacharya et.al. 2406.18068 null
2024-06-25 Data curation via joint example selection further accelerates multimodal learning Talfan Evans et.al. 2406.17711 null
2024-06-23 LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control Delin Qu et.al. 2406.16038 null
2024-06-20 Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning Yupei Zhang et.al. 2406.13979 link
2024-06-19 VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models Haowen Hou et.al. 2406.13362 link
2024-06-18 Language and Multimodal Models in Sports: A Survey of Datasets and Applications Haotian Xia et.al. 2406.12252 null
2024-07-01 Multimodal Learning With Intraoperative CBCT & Variably Aligned Preoperative CT Data To Improve Segmentation Maximilian E. Tschuchnig et.al. 2406.11650 null
2024-06-17 Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective Yang Chen et.al. 2406.11249 null
2024-06-17 Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning Zebang Cheng et.al. 2406.11161 link
2024-06-13 Explore the Limits of Omni-modal Pretraining at Scale Yiyuan Zhang et.al. 2406.09412 link
2024-06-13 OpenVLA: An Open-Source Vision-Language-Action Model Moo Jin Kim et.al. 2406.09246 link
2024-06-13 Zoom and Shift are All You Need Jiahao Qin et.al. 2406.08866 null
2024-06-11 Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes Asim Waqas et.al. 2406.08521 null
2024-06-16 A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles Nirmalya Thakur et.al. 2406.07693 null
2024-06-11 Situational Awareness Matters in 3D Vision Language Reasoning Yunze Man et.al. 2406.07544 link
2024-06-11 Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology Huahui Yi et.al. 2406.07078 link
2024-06-10 NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative Asmar Nadeem et.al. 2406.06499 null
2024-06-10 Vript: A Video Is Worth Thousands of Words Dongjie Yang et.al. 2406.06040 link
2024-06-09 Stealthy Targeted Backdoor Attacks against Image Captioning Wenshu Fan et.al. 2406.05874 null
2024-06-07 Predictive Dynamic Fusion Bing Cao et.al. 2406.04802 link
2024-06-07 AICoderEval: Improving AI Domain Code Generation of Large Language Models Yinghui Xia et.al. 2406.04712 null
2024-06-02 Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications David Restrepo et.al. 2406.02601 null
2024-06-04 Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization Yunpeng Zhao et.al. 2406.01987 null
2024-06-03 Automatic Fused Multimodal Deep Learning for Plant Identification Alfreds Lapkovskis et.al. 2406.01455 link
2024-06-05 Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data Zhusi Zhong et.al. 2406.01302 null
2024-06-02 Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient Zechu Li et.al. 2406.00681 null
2024-05-31 Ovis: Structural Embedding Alignment for Multimodal Large Language Model Shiyin Lu et.al. 2405.20797 null
2024-05-31 Visual Attention Analysis in Online Learning Miriam Navarro et.al. 2405.20091 null
2024-05-29 Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining Blake R. Duschatko et.al. 2405.19386 null
2024-05-29 LLMs Meet Multimodal Generation and Editing: A Survey Yingqing He et.al. 2405.19334 link
2024-05-29 Exploring Exotic Decays of the Higgs Boson to Multi-Photons at the LHC via Multimodal Learning Approaches A. Hammad et.al. 2405.18834 null
2024-05-28 RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives Jaehong Yoon et.al. 2405.18406 link
2024-05-28 MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance Yake Wei et.al. 2405.17730 link
2024-05-27 Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning Zihua Zhao et.al. 2405.16996 null
2024-05-27 Multilingual Diversity Improves Vision-Language Representations Thao Nguyen et.al. 2405.16915 null
2024-05-27 Hawk: Learning to Understand Open-World Video Anomalies Jiaqi Tang et.al. 2405.16886 link
2024-05-24 Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search Marie Al Ghossein et.al. 2405.15190 link
2024-05-23 TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing Teng Xu et.al. 2405.14455 null
2024-05-22 Grounding Toxicity in Real-World Events across Languages Wondimagegnhue Tsegaye Tufa et.al. 2405.13754 link
2024-05-21 A Survey of Robotic Language Grounding: Tradeoffs Between Symbols and Embeddings Vanya Cohen et.al. 2405.13245 null
2024-05-21 Inconsistency-Aware Cross-Attention for Audio-Visual Fusion in Dimensional Emotion Recognition R. Gnana Praveen et.al. 2405.12853 null
2024-05-21 Scientific discourse on YouTube: Motivations for citing research in comments Sören Striewski et.al. 2405.12798 null
2024-05-21 Amplifying Academic Research through YouTube: Engagement Metrics as Predictors of Citation Impact Olga Zagovora et.al. 2405.12734 null
2024-05-21 A Multimodal Learning-based Approach for Autonomous Landing of UAV Francisco Neves et.al. 2405.12681 null
2024-05-21 Mutual Information Analysis in Multimodal Learning Systems Hadi Hadizadeh et.al. 2405.12456 null
2024-05-16 Grounded 3D-LLM with Referent Tokens Yilun Chen et.al. 2405.10370 link
2024-05-13 Improving Multimodal Learning with Multi-Loss Gradient Modulation Konstantinos Kontras et.al. 2405.07930 link
2024-05-13 Generating Human Motion in 3D Scenes from Text Descriptions Zhi Cen et.al. 2405.07784 null
2024-05-13 An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-Attention Junichiro Niimi et.al. 2405.07435 null
2024-05-10 A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments Joyce Fonteles et.al. 2405.06203 null
2024-05-09 Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training Sheng Yan et.al. 2405.05523 null
2024-05-08 Empathy Through Multimodality in Conversational Interfaces Mahyar Abbasian et.al. 2405.04777 null
2024-05-08 All in One Framework for Multimodal Re-identification in the Wild He Li et.al. 2405.04741 null
2024-05-07 Interpretable Tensor Fusion Saurabh Varshneya et.al. 2405.04671 null
2024-04-27 MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning Nadia Saeed et.al. 2405.01583 null
2024-04-29 3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset Xinyu Ma et.al. 2404.18413 link
2024-04-28 LEGENT: Open Platform for Embodied Agents Zhili Cheng et.al. 2404.18243 null
2024-05-03 Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum Tao Meng et.al. 2404.17862 null
2024-04-29 MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition Zheng Lian et.al. 2404.17113 link
2024-04-30 AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models Zhiqiang Tang et.al. 2404.16233 null
2024-04-23 Hidden in Plain Sight: Exploring the Intersections of Mental Health, Eating Disorders, and Content Moderation on TikTok Charles Bickham et.al. 2404.15457 null
2024-04-14 A Survey on Multimodal Wearable Sensor-based Human Action Recognition Jianyuan Ni et.al. 2404.15349 null
2024-04-23 Between Flat-Earthers and Fitness Coaches: Who is Citing Scientific Publications in YouTube Video Descriptions? Olga Zagovora et.al. 2404.15083 null
2024-04-19 Cooperative Sentiment Agents for Multimodal Sentiment Analysis Shanmin Wang et.al. 2404.12642 link
2024-04-18 Dynamic Modality and View Selection for Multimodal Emotion Recognition with Missing Modalities Luciana Trinkaus Menon et.al. 2404.12251 null
2024-04-19 TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content Avinash Anand et.al. 2404.10305 null
2024-04-15 AIGeN: An Adversarial Approach for Instruction Generation in VLN Niyati Rawal et.al. 2404.10054 null
2024-04-22 Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning Xiongye Xiao et.al. 2404.09403 link
2024-04-14 TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning Quang Minh Dinh et.al. 2404.09275 link
2024-04-13 MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild Kateryna Chumachenko et.al. 2404.09010 link
2024-04-12 OmniSat: Self-Supervised Modality Fusion for Earth Observation Guillaume Astruc et.al. 2404.08351 link
2024-04-11 Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios Yuan Zhang et.al. 2404.07484 null
2024-04-07 X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model Jan Held et.al. 2404.06332 null
2024-04-07 A Data-to-Product Multimodal Conceptual Framework to Achieve Automated Software Evolution for Context-rich Intelligent Applications Songhui Yue et.al. 2404.04821 null
2024-04-06 Interpretable Multimodal Learning for Cardiovascular Hemodynamics Assessment Prasun C Tripathi et.al. 2404.04718 link
2024-04-05 Mitigating Heterogeneity in Federated Multimodal Learning with Biomedical Vision-Language Pre-training Zitao Shuai et.al. 2404.03854 null
2024-04-02 On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning Ari Karchmer et.al. 2404.02254 null
2024-04-01 iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer Fengtao Zhou et.al. 2404.01192 link
2024-04-11 MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models Zebang Cheng et.al. 2404.00511 link
2024-03-30 UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause Guimin Hu et.al. 2404.00403 null
2024-03-28 IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation Jiacui Huang et.al. 2403.19336 null
2024-03-26 Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation Abdelrhman Werby et.al. 2403.17846 null
2024-03-26 Project MOSLA: Recording Every Moment of Second Language Acquisition Masato Hagiwara et.al. 2403.17314 null
2024-03-17 A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition Abhi Kamboj et.al. 2403.15444 null
2024-03-22 Contrastive Learning on Multimodal Analysis of Electronic Health Records Tianxi Cai et.al. 2403.14926 null
2024-03-20 Grounding Spatial Relations in Text-Only Language Models Gorka Azkune et.al. 2403.13666 link
2024-04-02 Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition R. Gnana Praveen et.al. 2403.13659 null
2024-03-20 VL-Mamba: Exploring State Space Models for Multimodal Learning Yanyuan Qiao et.al. 2403.13600 null
2024-03-17 From Pixels to Predictions: Spectrogram and Vision Transformer for Better Time Series Forecasting Zhen Zeng et.al. 2403.11047 null
2024-03-26 Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity Zhuo Zhi et.al. 2403.09428 link
2024-03-14 Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation Daniel Honerkamp et.al. 2403.08605 link
2024-03-12 A Multimodal Intermediate Fusion Network with Manifold Learning for Stress Detection Morteza Bodaghi et.al. 2403.08077 null
2024-03-10 WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs Deshun Yang et.al. 2403.07944 null
2024-03-25 FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks Muhammad Saif Ullah Khan et.al. 2403.06904 null
2024-03-11 DiaLoc: An Iterative Approach to Embodied Dialog Localization Chao Zhang et.al. 2403.06846 null
2024-03-11 Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement Che Liu et.al. 2403.06659 link
2024-03-07 A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data Marco D'Alessandro et.al. 2403.04866 link
2024-03-05 JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models Arefa et.al. 2403.04798 link
2024-03-07 CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? Ibrahim Alabdulmohsin et.al. 2403.04547 null
2024-03-04 Reactive Programming without Functions Bjarno Oeyen et.al. 2403.02296 null
2024-03-03 Hyperspectral Image Analysis in Single-Modal and Multimodal setting using Deep Learning Techniques Shivam Pande et.al. 2403.01546 null
2024-03-02 ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation Moran Yanuka et.al. 2403.01306 link
2024-03-02 Adversarial Testing for Visual Grounding via Image-Aware Property Reduction Zhiyuan Chang et.al. 2403.01118 null
2024-02-29 Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Tsai-Shien Chen et.al. 2402.19479 null
2024-02-29 FATE in MMLA: A Student-Centred Exploration of Fairness, Accountability, Transparency, and Ethics in Multimodal Learning Analytics Yueqiao Jin et.al. 2402.19071 null
2024-02-28 Grounding Language Models for Visual Entity Recognition Zilin Xiao et.al. 2402.18695 link
2024-02-28 Multimodal Learning To Improve Cardiac Late Mechanical Activation Detection From Cine MR Images Jiarui Xing et.al. 2402.18507 null
2024-02-28 DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning Jianxiong Li et.al. 2402.18137 null
2024-02-27 Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control Thong Nguyen et.al. 2402.17535 link
2024-02-27 Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition Cam-Van Thi Nguyen et.al. 2402.17269 null
2024-02-26 GROUNDHOG: Grounding Large Language Models to Holistic Segmentation Yichi Zhang et.al. 2402.16846 null
2024-02-26 Gradient-Guided Modality Decoupling for Missing-Modality Robustness Hao Wang et.al. 2402.16318 null
2024-02-24 FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in Computational Pathology Yuanzhe Peng et.al. 2402.15858 null
2024-02-20 GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models Sayantan Adak et.al. 2402.12881 link
2024-02-19 Multimodal Emotion Recognition from Raw Audio with Sinc-convolution Xiaohui Zhang et.al. 2402.11954 null
2024-02-18 Efficient Multimodal Learning from Data-centric Perspective Muyang He et.al. 2402.11530 link

(<a href=../README.md>back to main</a>)