Multimodal - 2024-12

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-12-30 | Aviary: training language agents on challenging scientific tasks | Siddharth Narayanan et al. | 2412.21154 | translate | read | null |
| 2024-12-30 | Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Peng Jin et al. | 2412.20964 | translate | read | link |
| 2024-12-30 | Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | Xuechen Wang et al. | 2412.20821 | translate | read | null |
| 2024-12-29 | Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment | Shiyun Chen et al. | 2412.20418 | translate | read | null |
| 2024-12-26 | Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching | Wenjing Chen et al. | 2412.19184 | translate | read | null |
| 2024-12-26 | CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting | Siyu Jiao et al. | 2412.19142 | translate | read | null |
| 2024-12-24 | MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning | Abdelmadjid Chergui et al. | 2412.18437 | translate | read | link |
| 2024-12-23 | Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion | Grigor Bezirganyan et al. | 2412.18024 | translate | read | link |
| 2024-12-23 | A Multimodal Emotion Recognition System: Integrating Facial Expressions, Body Movement, Speech, and Spoken Language | Kris Kraack et al. | 2412.17907 | translate | read | null |
| 2024-12-18 | Constraint-Based Model in Multimodal Learning to Improve Ventricular Arrhythmia Prediction | Evariste Njomgue Fotso et al. | 2412.17840 | translate | read | null |
| 2024-12-23 | Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Priyaranjan Pattnayak et al. | 2412.17759 | translate | read | null |
| 2024-12-23 | EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities | Zhe Chen et al. | 2412.17677 | translate | read | link |
| 2024-12-23 | V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy | Long Bai et al. | 2412.17595 | translate | read | null |
| 2024-12-22 | COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations | Vanessa Su et al. | 2412.17180 | translate | read | null |
| 2024-12-17 | DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | Nikitha SR et al. | 2412.12902 | translate | read | null |
| 2024-12-17 | Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Shiping Ge et al. | 2412.12791 | translate | read | link |
| 2024-12-17 | PBVS 2024 Solution: Self-Supervised Learning and Sampling Strategies for SAR Classification in Extreme Long-Tail Distribution | Yuhyun Kim et al. | 2412.12565 | translate | read | null |
| 2024-12-16 | Gramian Multimodal Representation Learning and Alignment | Giordano Cicchetti et al. | 2412.11959 | translate | read | null |
| 2024-12-10 | Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning | Can Yaras et al. | 2412.07909 | translate | read | null |
| 2024-12-07 | WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition | Feng Li et al. | 2412.05558 | translate | read | null |
| 2024-12-05 | Lattice Lingo: Effect of Textual Detail on Multimodal Learning for Property Prediction of Crystals | Mrigi Munjal et al. | 2412.04670 | translate | read | null |
| 2024-12-04 | Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Neale Ratzlaff et al. | 2412.03467 | translate | read | null |
| 2024-12-04 | Grounded Language Design for Lightweight Diagramming for Formal Methods | Siddhartha Prasad et al. | 2412.03310 | translate | read | null |
| 2024-12-04 | Dynamic Graph Neural Ordinary Differential Equation Network for Multi-modal Emotion Recognition in Conversation | Yuntao Shou et al. | 2412.02935 | translate | read | null |
| 2024-12-03 | Initial Study On Improving Segmentation By Combining Preoperative CT And Intraoperative CBCT Using Synthetic Data | Maximilian E. Tschuchnig et al. | 2412.02294 | translate | read | null |
| 2024-12-02 | Occam's LGS: A Simple Approach for Language Gaussian Splatting | Jiahuan Cheng et al. | 2412.01807 | translate | read | null |

(<a href="../Multimodal.md">back to Multimodal</a>)