Multimodal - 2024-08
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-08-31 | Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification | Aref Farhadipour et al. | 2409.00562 | translate | read | null |
| 2024-08-29 | Toward Robust Early Detection of Alzheimer’s Disease via an Integrated Multimodal Learning Approach | Yifei Chen et al. | 2408.16343 | translate | read | link |
| 2024-08-28 | Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis | Sijie Mai et al. | 2408.16029 | translate | read | null |
| 2024-08-28 | ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation | Tiantian Feng et al. | 2408.15803 | translate | read | null |
| 2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et al. | 2408.15802 | translate | read | null |
| 2024-08-27 | The Benefits of Balance: From Information Projections to Variance Reduction | Lang Liu et al. | 2408.15065 | translate | read | null |
| 2024-08-27 | NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework | Shuangchen Zhao et al. | 2408.14950 | translate | read | null |
| 2024-08-25 | Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples | Jayakanth Kunhoth et al. | 2408.13754 | translate | read | null |
| 2024-08-24 | R2G: Reasoning to Ground in 3D Scenes | Yixuan Li et al. | 2408.13499 | translate | read | null |
| 2024-08-23 | Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition | Cam-Van Thi Nguyen et al. | 2408.12895 | translate | read | null |
| 2024-08-23 | Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey | Qika Lin et al. | 2408.12880 | translate | read | link |
| 2024-08-23 | Grounding Fallacies Misrepresenting Scientific Publications in Evidence | Max Glockner et al. | 2408.12812 | translate | read | null |
| 2024-08-22 | Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models | Jean Park et al. | 2408.12763 | translate | read | null |
| 2024-08-22 | Mental-Perceiver: Audio-Textual Multimodal Learning for Mental Health Assessment | Jinghui Qin et al. | 2408.12088 | translate | read | null |
| 2024-08-22 | Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Mengying Ge et al. | 2408.11286 | translate | read | null |
| 2024-08-21 | SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Zebang Cheng et al. | 2408.10500 | translate | read | link |
| 2024-08-19 | Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation | Liu He et al. | 2408.10453 | translate | read | null |
| 2024-08-18 | Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition | Qifei Li et al. | 2408.09438 | translate | read | link |
| 2024-08-16 | Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition | Muhammad Haseeb Aslam et al. | 2408.09035 | translate | read | link |
| 2024-08-14 | Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach | Muhammad Saad Saeed et al. | 2408.07445 | translate | read | null |
| 2024-08-14 | Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration | Xiaogen Zhou et al. | 2408.07341 | translate | read | link |
| 2024-08-14 | Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Peiyuan Chen et al. | 2408.07303 | translate | read | null |
| 2024-08-13 | Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning | Jieming Bian et al. | 2408.06549 | translate | read | null |
| 2024-08-04 | Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion | Shaoxu Cheng et al. | 2408.02695 | translate | read | null |
| 2024-08-06 | Infusing Environmental Captions for Long-Form Video Language Grounding | Hyogun Lee et al. | 2408.02336 | translate | read | null |
| 2024-08-05 | REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Agneet Chatterjee et al. | 2408.02231 | translate | read | null |
| 2024-08-04 | CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization | Xiang He et al. | 2408.01952 | translate | read | link |
| 2024-08-02 | Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation | Zijian Yi et al. | 2408.00970 | translate | read | link |
| 2024-08-01 | The Monetisation of Toxicity: Analysing YouTube Content Creators and Controversy-Driven Engagement | Thales Bertaglia et al. | 2408.00534 | translate | read | null |
| 2024-08-02 | XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training | Biao Wu et al. | 2407.19546 | translate | read | link |
(<a href=../Multimodal.md>back to Multimodal</a>)