Multimodal - 2024-03

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-03-30 | UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause | Guimin Hu et.al. | 2404.00403 | translate | read | null |
| 2024-03-28 | IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation | Jiacui Huang et.al. | 2403.19336 | translate | read | null |
| 2024-03-26 | Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation | Abdelrhman Werby et.al. | 2403.17846 | translate | read | null |
| 2024-03-26 | Project MOSLA: Recording Every Moment of Second Language Acquisition | Masato Hagiwara et.al. | 2403.17314 | translate | read | null |
| 2024-03-17 | A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition | Abhi Kamboj et.al. | 2403.15444 | translate | read | null |
| 2024-03-22 | Contrastive Learning on Multimodal Analysis of Electronic Health Records | Tianxi Cai et.al. | 2403.14926 | translate | read | null |
| 2024-03-20 | Grounding Spatial Relations in Text-Only Language Models | Gorka Azkune et.al. | 2403.13666 | translate | read | link |
| 2024-03-20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | Yanyuan Qiao et.al. | 2403.13600 | translate | read | null |
| 2024-03-17 | From Pixels to Predictions: Spectrogram and Vision Transformer for Better Time Series Forecasting | Zhen Zeng et.al. | 2403.11047 | translate | read | null |
| 2024-03-26 | Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity | Zhuo Zhi et.al. | 2403.09428 | translate | read | link |
| 2024-03-14 | Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation | Daniel Honerkamp et.al. | 2403.08605 | translate | read | link |
| 2024-03-12 | A Multimodal Intermediate Fusion Network with Manifold Learning for Stress Detection | Morteza Bodaghi et.al. | 2403.08077 | translate | read | null |
| 2024-03-10 | WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs | Deshun Yang et.al. | 2403.07944 | translate | read | null |
| 2024-03-25 | FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks | Muhammad Saif Ullah Khan et.al. | 2403.06904 | translate | read | null |
| 2024-03-11 | DiaLoc: An Iterative Approach to Embodied Dialog Localization | Chao Zhang et.al. | 2403.06846 | translate | read | null |
| 2024-03-11 | Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement | Che Liu et.al. | 2403.06659 | translate | read | link |
| 2024-03-07 | A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data | Marco D Alessandro et.al. | 2403.04866 | translate | read | link |
| 2024-03-05 | JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models | Arefa et.al. | 2403.04798 | translate | read | link |
| 2024-03-07 | CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | Ibrahim Alabdulmohsin et.al. | 2403.04547 | translate | read | null |
| 2024-03-04 | Reactive Programming without Functions | Bjarno Oeyen et.al. | 2403.02296 | translate | read | null |
| 2024-03-03 | Hyperspectral Image Analysis in Single-Modal and Multimodal setting using Deep Learning Techniques | Shivam Pande et.al. | 2403.01546 | translate | read | null |
| 2024-03-02 | ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation | Moran Yanuka et.al. | 2403.01306 | translate | read | link |
| 2024-03-02 | Adversarial Testing for Visual Grounding via Image-Aware Property Reduction | Zhiyuan Chang et.al. | 2403.01118 | translate | read | null |

(<a href=../Multimodal.md>back to Multimodal</a>)