Multimodal - 2024-12
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-12-30 | Aviary: training language agents on challenging scientific tasks | Siddharth Narayanan et al. | 2412.21154 | translate | read | N/A |
| 2024-12-30 | Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Peng Jin et al. | 2412.20964 | translate | read | link |
| 2024-12-30 | Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | Xuechen Wang et al. | 2412.20821 | translate | read | N/A |
| 2024-12-29 | Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment | Shiyun Chen et al. | 2412.20418 | translate | read | N/A |
| 2024-12-26 | Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching | Wenjing Chen et al. | 2412.19184 | translate | read | N/A |
| 2024-12-26 | CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting | Siyu Jiao et al. | 2412.19142 | translate | read | N/A |
| 2024-12-24 | MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning | Abdelmadjid Chergui et al. | 2412.18437 | translate | read | link |
| 2024-12-23 | Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion | Grigor Bezirganyan et al. | 2412.18024 | translate | read | link |
| 2024-12-23 | A Multimodal Emotion Recognition System: Integrating Facial Expressions, Body Movement, Speech, and Spoken Language | Kris Kraack et al. | 2412.17907 | translate | read | N/A |
| 2024-12-18 | Constraint-Based Model in Multimodal Learning to Improve Ventricular Arrhythmia Prediction | Evariste Njomgue Fotso et al. | 2412.17840 | translate | read | N/A |
| 2024-12-23 | Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Priyaranjan Pattnayak et al. | 2412.17759 | translate | read | N/A |
| 2024-12-23 | EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities | Zhe Chen et al. | 2412.17677 | translate | read | link |
| 2024-12-23 | V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy | Long Bai et al. | 2412.17595 | translate | read | N/A |
| 2024-12-22 | COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations | Vanessa Su et al. | 2412.17180 | translate | read | N/A |
| 2024-12-17 | DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | Nikitha SR et al. | 2412.12902 | translate | read | N/A |
| 2024-12-17 | Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Shiping Ge et al. | 2412.12791 | translate | read | link |
| 2024-12-17 | PBVS 2024 Solution: Self-Supervised Learning and Sampling Strategies for SAR Classification in Extreme Long-Tail Distribution | Yuhyun Kim et al. | 2412.12565 | translate | read | N/A |
| 2024-12-16 | Gramian Multimodal Representation Learning and Alignment | Giordano Cicchetti et al. | 2412.11959 | translate | read | N/A |
| 2024-12-10 | Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning | Can Yaras et al. | 2412.07909 | translate | read | N/A |
| 2024-12-07 | WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition | Feng Li et al. | 2412.05558 | translate | read | N/A |
| 2024-12-05 | Lattice Lingo: Effect of Textual Detail on Multimodal Learning for Property Prediction of Crystals | Mrigi Munjal et al. | 2412.04670 | translate | read | N/A |
| 2024-12-04 | Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Neale Ratzlaff et al. | 2412.03467 | translate | read | N/A |
| 2024-12-04 | Grounded Language Design for Lightweight Diagramming for Formal Methods | Siddhartha Prasad et al. | 2412.03310 | translate | read | N/A |
| 2024-12-04 | Dynamic Graph Neural Ordinary Differential Equation Network for Multi-modal Emotion Recognition in Conversation | Yuntao Shou et al. | 2412.02935 | translate | read | N/A |
| 2024-12-03 | Initial Study On Improving Segmentation By Combining Preoperative CT And Intraoperative CBCT Using Synthetic Data | Maximilian E. Tschuchnig et al. | 2412.02294 | translate | read | N/A |
| 2024-12-02 | Occam’s LGS: A Simple Approach for Language Gaussian Splatting | Jiahuan Cheng et al. | 2412.01807 | translate | read | N/A |
(<a href=../Multimodal.md>back to Multimodal</a>)