Multimodal - 2024-06

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-06-30 | Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jiawei Wang et.al. | 2407.00634 | translate | read | link |
| 2024-06-28 | Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction | Akash Awasthi et.al. | 2407.00129 | translate | read | null |
| 2024-06-27 | From Efficient Multimodal Models to World Models: A Survey | Xinji Mai et.al. | 2407.00118 | translate | read | null |
| 2024-06-27 | Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | Hao Fei et.al. | 2406.19255 | translate | read | null |
| 2024-06-27 | RAVEN: Multitask Retrieval Augmented Vision-Language Learning | Varun Nagaraj Rao et.al. | 2406.19150 | translate | read | null |
| 2024-06-26 | Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs | Uttaran Bhattacharya et.al. | 2406.18068 | translate | read | null |
| 2024-06-25 | Data curation via joint example selection further accelerates multimodal learning | Talfan Evans et.al. | 2406.17711 | translate | read | null |
| 2024-06-23 | LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control | Delin Qu et.al. | 2406.16038 | translate | read | null |
| 2024-06-20 | Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning | Yupei Zhang et.al. | 2406.13979 | translate | read | link |
| 2024-06-19 | VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Haowen Hou et.al. | 2406.13362 | translate | read | link |
| 2024-06-18 | Language and Multimodal Models in Sports: A Survey of Datasets and Applications | Haotian Xia et.al. | 2406.12252 | translate | read | null |
| 2024-06-17 | Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective | Yang Chen et.al. | 2406.11249 | translate | read | null |
| 2024-06-17 | Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning | Zebang Cheng et.al. | 2406.11161 | translate | read | link |
| 2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | translate | read | link |
| 2024-06-13 | OpenVLA: An Open-Source Vision-Language-Action Model | Moo Jin Kim et.al. | 2406.09246 | translate | read | link |
| 2024-06-13 | Zoom and Shift are All You Need | Jiahao Qin et.al. | 2406.08866 | translate | read | null |
| 2024-06-11 | Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes | Asim Waqas et.al. | 2406.08521 | translate | read | null |
| 2024-06-16 | A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles | Nirmalya Thakur et.al. | 2406.07693 | translate | read | null |
| 2024-06-11 | Situational Awareness Matters in 3D Vision Language Reasoning | Yunze Man et.al. | 2406.07544 | translate | read | link |
| 2024-06-11 | Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology | Huahui Yi et.al. | 2406.07078 | translate | read | link |
| 2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499 | translate | read | null |
| 2024-06-10 | Vript: A Video Is Worth Thousands of Words | Dongjie Yang et.al. | 2406.06040 | translate | read | link |
| 2024-06-09 | Stealthy Targeted Backdoor Attacks against Image Captioning | Wenshu Fan et.al. | 2406.05874 | translate | read | null |
| 2024-06-07 | Predictive Dynamic Fusion | Bing Cao et.al. | 2406.04802 | translate | read | link |
| 2024-06-07 | AICoderEval: Improving AI Domain Code Generation of Large Language Models | Yinghui Xia et.al. | 2406.04712 | translate | read | null |
| 2024-06-02 | Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications | David Restrepo et.al. | 2406.02601 | translate | read | null |
| 2024-06-04 | Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization | Yunpeng Zhao et.al. | 2406.01987 | translate | read | null |
| 2024-06-03 | Automatic Fused Multimodal Deep Learning for Plant Identification | Alfreds Lapkovskis et.al. | 2406.01455 | translate | read | link |
| 2024-06-05 | Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data | Zhusi Zhong et.al. | 2406.01302 | translate | read | null |
| 2024-06-02 | Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient | Zechu Li et.al. | 2406.00681 | translate | read | null |
