Multimodal - 2024-10

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2024-10-29 | EEG-based Multimodal Representation Learning for Emotion Recognition | Kang Yin et al. | 2411.00822 | translate | read | null |
| 2024-10-30 | PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | Ryozo Masukawa et al. | 2410.22623 | translate | read | null |
| 2024-10-28 | IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks | Manjunath D et al. | 2410.20953 | translate | read | link |
| 2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et al. | 2410.19702 | translate | read | null |
| 2024-10-24 | UGotMe: An Embodied System for Affective Human-Robot Interaction | Peizhen Li et al. | 2410.18373 | translate | read | link |
| 2024-10-22 | EVC-MF: End-to-end Video Captioning Network with Multi-scale Features | Tian-Zi Niu et al. | 2410.16624 | translate | read | null |
| 2024-10-22 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et al. | 2410.16239 | translate | read | link |
| 2024-10-21 | Multimodal Learning for Embryo Viability Prediction in Clinical IVF | Junsik Kim et al. | 2410.15581 | translate | read | null |
| 2024-10-20 | Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison | Shiyu Hu et al. | 2410.15270 | translate | read | null |
| 2024-10-15 | CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Qingqing Cao et al. | 2410.11963 | translate | read | null |
| 2024-10-15 | Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers | Davide Celestini et al. | 2410.11723 | translate | read | null |
| 2024-10-15 | On-the-fly Modulation for Balanced Multimodal Learning | Yake Wei et al. | 2410.11582 | translate | read | link |
| 2024-10-14 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Peng Xia et al. | 2410.10139 | translate | read | link |
| 2024-10-10 | Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Sukwon Yun et al. | 2410.08245 | translate | read | link |
| 2024-10-11 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Changli Tang et al. | 2410.06682 | translate | read | null |
| 2024-10-08 | Multimodal Representation Learning using Adaptive Graph Construction | Weichen Huang et al. | 2410.06395 | translate | read | null |
| 2024-10-07 | Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models | Dehong Kong et al. | 2410.04884 | translate | read | null |
| 2024-10-07 | MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection | Niki Nezakati et al. | 2410.03010 | translate | read | null |
| 2024-10-02 | Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations | Minoh Jeong et al. | 2410.02086 | translate | read | null |
| 2024-10-02 | Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark | Zheng Lian et al. | 2410.01495 | translate | read | null |
| 2024-10-04 | VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Jiapeng Wang et al. | 2410.00741 | translate | read | null |
| 2024-10-02 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Jihai Zhang et al. | 2409.19291 | translate | read | link |

(<a href=../Multimodal.md>back to Multimodal</a>)