Multimodal - 2024-06
Multimodal - 2024-06
| Publish Date | Title | Authors | Translate | Read | Code | |
|---|---|---|---|---|---|---|
| 2024-06-30 | Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jiawei Wang et.al. | 2407.00634 | translate | read | link |
| 2024-06-28 | Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction | Akash Awasthi et.al. | 2407.00129 | translate | read | null |
| 2024-06-27 | From Efficient Multimodal Models to World Models: A Survey | Xinji Mai et.al. | 2407.00118 | translate | read | null |
| 2024-06-27 | Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | Hao Fei et.al. | 2406.19255 | translate | read | null |
| 2024-06-27 | RAVEN: Multitask Retrieval Augmented Vision-Language Learning | Varun Nagaraj Rao et.al. | 2406.19150 | translate | read | null |
| 2024-06-26 | Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs | Uttaran Bhattacharya et.al. | 2406.18068 | translate | read | null |
| 2024-06-25 | Data curation via joint example selection further accelerates multimodal learning | Talfan Evans et.al. | 2406.17711 | translate | read | null |
| 2024-06-23 | LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control | Delin Qu et.al. | 2406.16038 | translate | read | null |
| 2024-06-20 | Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning | Yupei Zhang et.al. | 2406.13979 | translate | read | link |
| 2024-06-19 | VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Haowen Hou et.al. | 2406.13362 | translate | read | link |
| 2024-06-18 | Language and Multimodal Models in Sports: A Survey of Datasets and Applications | Haotian Xia et.al. | 2406.12252 | translate | read | null |
| 2024-06-17 | Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective | Yang Chen et.al. | 2406.11249 | translate | read | null |
| 2024-06-17 | Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning | Zebang Cheng et.al. | 2406.11161 | translate | read | link |
| 2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | translate | read | link |
| 2024-06-13 | OpenVLA: An Open-Source Vision-Language-Action Model | Moo Jin Kim et.al. | 2406.09246 | translate | read | link |
| 2024-06-13 | Zoom and Shift are All You Need | Jiahao Qin et.al. | 2406.08866 | translate | read | null |
| 2024-06-11 | Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes | Asim Waqas et.al. | 2406.08521 | translate | read | null |
| 2024-06-16 | A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles | Nirmalya Thakur et.al. | 2406.07693 | translate | read | null |
| 2024-06-11 | Situational Awareness Matters in 3D Vision Language Reasoning | Yunze Man et.al. | 2406.07544 | translate | read | link |
| 2024-06-11 | Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology | Huahui Yi et.al. | 2406.07078 | translate | read | link |
| 2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499 | translate | read | null |
| 2024-06-10 | Vript: A Video Is Worth Thousands of Words | Dongjie Yang et.al. | 2406.06040 | translate | read | link |
| 2024-06-09 | Stealthy Targeted Backdoor Attacks against Image Captioning | Wenshu Fan et.al. | 2406.05874 | translate | read | null |
| 2024-06-07 | Predictive Dynamic Fusion | Bing Cao et.al. | 2406.04802 | translate | read | link |
| 2024-06-07 | AICoderEval: Improving AI Domain Code Generation of Large Language Models | Yinghui Xia et.al. | 2406.04712 | translate | read | null |
| 2024-06-02 | Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications | David Restrepo et.al. | 2406.02601 | translate | read | null |
| 2024-06-04 | Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization | Yunpeng Zhao et.al. | 2406.01987 | translate | read | null |
| 2024-06-03 | Automatic Fused Multimodal Deep Learning for Plant Identification | Alfreds Lapkovskis et.al. | 2406.01455 | translate | read | link |
| 2024-06-05 | Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data | Zhusi Zhong et.al. | 2406.01302 | translate | read | null |
| 2024-06-02 | Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient | Zechu Li et.al. | 2406.00681 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)