Multimodal - 2024-06 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Translate	Read	Code
2024-06-30	Tarsier: Recipes for Training and Evaluating Large Video Description Models	Jiawei Wang et al.	2407.00634	translate	read	link
2024-06-28	Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction	Akash Awasthi et al.	2407.00129	translate	read	null
2024-06-27	From Efficient Multimodal Models to World Models: A Survey	Xinji Mai et al.	2407.00118	translate	read	null
2024-06-27	Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment	Hao Fei et al.	2406.19255	translate	read	null
2024-06-27	RAVEN: Multitask Retrieval Augmented Vision-Language Learning	Varun Nagaraj Rao et al.	2406.19150	translate	read	null
2024-06-26	Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs	Uttaran Bhattacharya et al.	2406.18068	translate	read	null
2024-06-25	Data curation via joint example selection further accelerates multimodal learning	Talfan Evans et al.	2406.17711	translate	read	null
2024-06-23	LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control	Delin Qu et al.	2406.16038	translate	read	null
2024-06-20	Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning	Yupei Zhang et al.	2406.13979	translate	read	link
2024-06-19	VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models	Haowen Hou et al.	2406.13362	translate	read	link
2024-06-18	Language and Multimodal Models in Sports: A Survey of Datasets and Applications	Haotian Xia et al.	2406.12252	translate	read	null
2024-06-17	Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective	Yang Chen et al.	2406.11249	translate	read	null
2024-06-17	Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning	Zebang Cheng et al.	2406.11161	translate	read	link
2024-06-13	Explore the Limits of Omni-modal Pretraining at Scale	Yiyuan Zhang et al.	2406.09412	translate	read	link
2024-06-13	OpenVLA: An Open-Source Vision-Language-Action Model	Moo Jin Kim et al.	2406.09246	translate	read	link
2024-06-13	Zoom and Shift are All You Need	Jiahao Qin et al.	2406.08866	translate	read	null
2024-06-11	Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes	Asim Waqas et al.	2406.08521	translate	read	null
2024-06-16	A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles	Nirmalya Thakur et al.	2406.07693	translate	read	null
2024-06-11	Situational Awareness Matters in 3D Vision Language Reasoning	Yunze Man et al.	2406.07544	translate	read	link
2024-06-11	Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology	Huahui Yi et al.	2406.07078	translate	read	link
2024-06-10	NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative	Asmar Nadeem et al.	2406.06499	translate	read	null
2024-06-10	Vript: A Video Is Worth Thousands of Words	Dongjie Yang et al.	2406.06040	translate	read	link
2024-06-09	Stealthy Targeted Backdoor Attacks against Image Captioning	Wenshu Fan et al.	2406.05874	translate	read	null
2024-06-07	Predictive Dynamic Fusion	Bing Cao et al.	2406.04802	translate	read	link
2024-06-07	AICoderEval: Improving AI Domain Code Generation of Large Language Models	Yinghui Xia et al.	2406.04712	translate	read	null
2024-06-02	Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications	David Restrepo et al.	2406.02601	translate	read	null
2024-06-04	Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization	Yunpeng Zhao et al.	2406.01987	translate	read	null
2024-06-03	Automatic Fused Multimodal Deep Learning for Plant Identification	Alfreds Lapkovskis et al.	2406.01455	translate	read	link
2024-06-05	Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data	Zhusi Zhong et al.	2406.01302	translate	read	null
2024-06-02	Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient	Zechu Li et al.	2406.00681	translate	read	null
