Multimodal - 2024-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-03-30 | UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause | Guimin Hu et.al. | 2404.00403 | translate | read | null |
| 2024-03-28 | IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation | Jiacui Huang et.al. | 2403.19336 | translate | read | null |
| 2024-03-26 | Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation | Abdelrhman Werby et.al. | 2403.17846 | translate | read | null |
| 2024-03-26 | Project MOSLA: Recording Every Moment of Second Language Acquisition | Masato Hagiwara et.al. | 2403.17314 | translate | read | null |
| 2024-03-17 | A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition | Abhi Kamboj et.al. | 2403.15444 | translate | read | null |
| 2024-03-22 | Contrastive Learning on Multimodal Analysis of Electronic Health Records | Tianxi Cai et.al. | 2403.14926 | translate | read | null |
| 2024-03-20 | Grounding Spatial Relations in Text-Only Language Models | Gorka Azkune et.al. | 2403.13666 | translate | read | link |
| 2024-03-20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | Yanyuan Qiao et.al. | 2403.13600 | translate | read | null |
| 2024-03-17 | From Pixels to Predictions: Spectrogram and Vision Transformer for Better Time Series Forecasting | Zhen Zeng et.al. | 2403.11047 | translate | read | null |
| 2024-03-26 | Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity | Zhuo Zhi et.al. | 2403.09428 | translate | read | link |
| 2024-03-14 | Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation | Daniel Honerkamp et.al. | 2403.08605 | translate | read | link |
| 2024-03-12 | A Multimodal Intermediate Fusion Network with Manifold Learning for Stress Detection | Morteza Bodaghi et.al. | 2403.08077 | translate | read | null |
| 2024-03-10 | WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs | Deshun Yang et.al. | 2403.07944 | translate | read | null |
| 2024-03-25 | FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks | Muhammad Saif Ullah Khan et.al. | 2403.06904 | translate | read | null |
| 2024-03-11 | DiaLoc: An Iterative Approach to Embodied Dialog Localization | Chao Zhang et.al. | 2403.06846 | translate | read | null |
| 2024-03-11 | Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement | Che Liu et.al. | 2403.06659 | translate | read | link |
| 2024-03-07 | A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data | Marco D Alessandro et.al. | 2403.04866 | translate | read | link |
| 2024-03-05 | JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models | Arefa et.al. | 2403.04798 | translate | read | link |
| 2024-03-07 | CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | Ibrahim Alabdulmohsin et.al. | 2403.04547 | translate | read | null |
| 2024-03-04 | Reactive Programming without Functions | Bjarno Oeyen et.al. | 2403.02296 | translate | read | null |
| 2024-03-03 | Hyperspectral Image Analysis in Single-Modal and Multimodal setting using Deep Learning Techniques | Shivam Pande et.al. | 2403.01546 | translate | read | null |
| 2024-03-02 | ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation | Moran Yanuka et.al. | 2403.01306 | translate | read | link |
| 2024-03-02 | Adversarial Testing for Visual Grounding via Image-Aware Property Reduction | Zhiyuan Chang et.al. | 2403.01118 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)