Multimodal - 2024-10
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-10-29 | EEG-based Multimodal Representation Learning for Emotion Recognition | Kang Yin et al. | 2411.00822 | translate | read | null |
| 2024-10-30 | PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | Ryozo Masukawa et al. | 2410.22623 | translate | read | null |
| 2024-10-28 | IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks | Manjunath D et al. | 2410.20953 | translate | read | link |
| 2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et al. | 2410.19702 | translate | read | null |
| 2024-10-24 | UGotMe: An Embodied System for Affective Human-Robot Interaction | Peizhen Li et al. | 2410.18373 | translate | read | link |
| 2024-10-22 | EVC-MF: End-to-end Video Captioning Network with Multi-scale Features | Tian-Zi Niu et al. | 2410.16624 | translate | read | null |
| 2024-10-22 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et al. | 2410.16239 | translate | read | link |
| 2024-10-21 | Multimodal Learning for Embryo Viability Prediction in Clinical IVF | Junsik Kim et al. | 2410.15581 | translate | read | null |
| 2024-10-20 | Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison | Shiyu Hu et al. | 2410.15270 | translate | read | null |
| 2024-10-15 | CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Qingqing Cao et al. | 2410.11963 | translate | read | null |
| 2024-10-15 | Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers | Davide Celestini et al. | 2410.11723 | translate | read | null |
| 2024-10-15 | On-the-fly Modulation for Balanced Multimodal Learning | Yake Wei et al. | 2410.11582 | translate | read | link |
| 2024-10-14 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Peng Xia et al. | 2410.10139 | translate | read | link |
| 2024-10-10 | Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Sukwon Yun et al. | 2410.08245 | translate | read | link |
| 2024-10-11 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Changli Tang et al. | 2410.06682 | translate | read | null |
| 2024-10-08 | Multimodal Representation Learning using Adaptive Graph Construction | Weichen Huang et al. | 2410.06395 | translate | read | null |
| 2024-10-07 | Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models | Dehong Kong et al. | 2410.04884 | translate | read | null |
| 2024-10-07 | MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection | Niki Nezakati et al. | 2410.03010 | translate | read | null |
| 2024-10-02 | Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations | Minoh Jeong et al. | 2410.02086 | translate | read | null |
| 2024-10-02 | Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark | Zheng Lian et al. | 2410.01495 | translate | read | null |
| 2024-10-04 | VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Jiapeng Wang et al. | 2410.00741 | translate | read | null |
| 2024-10-02 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Jihai Zhang et al. | 2409.19291 | translate | read | link |
([back to Multimodal](../Multimodal.md))