Multimodal - 2024-10
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-10-29 | EEG-based Multimodal Representation Learning for Emotion Recognition | Kang Yin et al. | 2411.00822 | translate | read | null |
| 2024-10-30 | PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | Ryozo Masukawa et al. | 2410.22623 | translate | read | null |
| 2024-10-28 | IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks | Manjunath D et al. | 2410.20953 | translate | read | link |
| 2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et al. | 2410.19702 | translate | read | null |
| 2024-10-24 | UGotMe: An Embodied System for Affective Human-Robot Interaction | Peizhen Li et al. | 2410.18373 | translate | read | link |
| 2024-10-22 | EVC-MF: End-to-end Video Captioning Network with Multi-scale Features | Tian-Zi Niu et al. | 2410.16624 | translate | read | null |
| 2024-10-22 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et al. | 2410.16239 | translate | read | link |
| 2024-10-21 | Multimodal Learning for Embryo Viability Prediction in Clinical IVF | Junsik Kim et al. | 2410.15581 | translate | read | null |
| 2024-10-20 | Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison | Shiyu Hu et al. | 2410.15270 | translate | read | null |
| 2024-10-15 | CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Qingqing Cao et al. | 2410.11963 | translate | read | null |
| 2024-10-15 | Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers | Davide Celestini et al. | 2410.11723 | translate | read | null |
| 2024-10-15 | On-the-fly Modulation for Balanced Multimodal Learning | Yake Wei et al. | 2410.11582 | translate | read | link |
| 2024-10-14 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Peng Xia et al. | 2410.10139 | translate | read | link |
| 2024-10-10 | Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Sukwon Yun et al. | 2410.08245 | translate | read | link |
| 2024-10-11 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Changli Tang et al. | 2410.06682 | translate | read | null |
| 2024-10-08 | Multimodal Representation Learning using Adaptive Graph Construction | Weichen Huang et al. | 2410.06395 | translate | read | null |
| 2024-10-07 | Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models | Dehong Kong et al. | 2410.04884 | translate | read | null |
| 2024-10-07 | MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection | Niki Nezakati et al. | 2410.03010 | translate | read | null |
| 2024-10-02 | Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations | Minoh Jeong et al. | 2410.02086 | translate | read | null |
| 2024-10-02 | Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark | Zheng Lian et al. | 2410.01495 | translate | read | null |
| 2024-10-04 | VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Jiapeng Wang et al. | 2410.00741 | translate | read | null |
| 2024-10-02 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Jihai Zhang et al. | 2409.19291 | translate | read | link |
([back to Multimodal](../Multimodal.md))