Multimodal - 2024-08

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2024-08-31 | Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification | Aref Farhadipour et al. | 2409.00562 | translate | read | null |
| 2024-08-29 | Toward Robust Early Detection of Alzheimer’s Disease via an Integrated Multimodal Learning Approach | Yifei Chen et al. | 2408.16343 | translate | read | link |
| 2024-08-28 | Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis | Sijie Mai et al. | 2408.16029 | translate | read | null |
| 2024-08-28 | ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation | Tiantian Feng et al. | 2408.15803 | translate | read | null |
| 2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et al. | 2408.15802 | translate | read | null |
| 2024-08-27 | The Benefits of Balance: From Information Projections to Variance Reduction | Lang Liu et al. | 2408.15065 | translate | read | null |
| 2024-08-27 | NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework | Shuangchen Zhao et al. | 2408.14950 | translate | read | null |
| 2024-08-25 | Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples | Jayakanth Kunhoth et al. | 2408.13754 | translate | read | null |
| 2024-08-24 | R2G: Reasoning to Ground in 3D Scenes | Yixuan Li et al. | 2408.13499 | translate | read | null |
| 2024-08-23 | Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition | Cam-Van Thi Nguyen et al. | 2408.12895 | translate | read | null |
| 2024-08-23 | Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey | Qika Lin et al. | 2408.12880 | translate | read | link |
| 2024-08-23 | Grounding Fallacies Misrepresenting Scientific Publications in Evidence | Max Glockner et al. | 2408.12812 | translate | read | null |
| 2024-08-22 | Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models | Jean Park et al. | 2408.12763 | translate | read | null |
| 2024-08-22 | Mental-Perceiver: Audio-Textual Multimodal Learning for Mental Health Assessment | Jinghui Qin et al. | 2408.12088 | translate | read | null |
| 2024-08-22 | Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Mengying Ge et al. | 2408.11286 | translate | read | null |
| 2024-08-21 | SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Zebang Cheng et al. | 2408.10500 | translate | read | link |
| 2024-08-19 | Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation | Liu He et al. | 2408.10453 | translate | read | null |
| 2024-08-18 | Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition | Qifei Li et al. | 2408.09438 | translate | read | link |
| 2024-08-16 | Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition | Muhammad Haseeb Aslam et al. | 2408.09035 | translate | read | link |
| 2024-08-14 | Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach | Muhammad Saad Saeed et al. | 2408.07445 | translate | read | null |
| 2024-08-14 | Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration | Xiaogen Zhon et al. | 2408.07341 | translate | read | link |
| 2024-08-14 | Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Peiyuan Chen et al. | 2408.07303 | translate | read | null |
| 2024-08-13 | Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning | Jieming Bian et al. | 2408.06549 | translate | read | null |
| 2024-08-04 | Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion | Shaoxu Cheng et al. | 2408.02695 | translate | read | null |
| 2024-08-06 | Infusing Environmental Captions for Long-Form Video Language Grounding | Hyogun Lee et al. | 2408.02336 | translate | read | null |
| 2024-08-05 | REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Agneet Chatterjee et al. | 2408.02231 | translate | read | null |
| 2024-08-04 | CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization | Xiang He et al. | 2408.01952 | translate | read | link |
| 2024-08-02 | Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation | Zijian Yi et al. | 2408.00970 | translate | read | link |
| 2024-08-01 | The Monetisation of Toxicity: Analysing YouTube Content Creators and Controversy-Driven Engagement | Thales Bertaglia et al. | 2408.00534 | translate | read | null |
| 2024-08-02 | XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training | Biao Wu et al. | 2407.19546 | translate | read | link |

(<a href=../Multimodal.md>back to Multimodal</a>)