Multimodal - 2024-05
| Publish Date | Title | Authors | Paper ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-05-31 | Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Shiyin Lu et.al. | 2405.20797 | translate | read | null |
| 2024-05-31 | Visual Attention Analysis in Online Learning | Miriam Navarro et.al. | 2405.20091 | translate | read | null |
| 2024-05-29 | Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining | Blake R. Duschatko et.al. | 2405.19386 | translate | read | null |
| 2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | translate | read | link |
| 2024-05-29 | Exploring Exotic Decays of the Higgs Boson to Multi-Photons at the LHC via Multimodal Learning Approaches | A. Hammad et.al. | 2405.18834 | translate | read | null |
| 2024-05-28 | RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives | Jaehong Yoon et.al. | 2405.18406 | translate | read | link |
| 2024-05-28 | MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance | Yake Wei et.al. | 2405.17730 | translate | read | link |
| 2024-05-27 | Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning | Zihua Zhao et.al. | 2405.16996 | translate | read | null |
| 2024-05-27 | Multilingual Diversity Improves Vision-Language Representations | Thao Nguyen et.al. | 2405.16915 | translate | read | null |
| 2024-05-27 | Hawk: Learning to Understand Open-World Video Anomalies | Jiaqi Tang et.al. | 2405.16886 | translate | read | link |
| 2024-05-24 | Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search | Marie Al Ghossein et.al. | 2405.15190 | translate | read | link |
| 2024-05-23 | TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing | Teng Xu et.al. | 2405.14455 | translate | read | null |
| 2024-05-22 | Grounding Toxicity in Real-World Events across Languages | Wondimagegnhue Tsegaye Tufa et.al. | 2405.13754 | translate | read | link |
| 2024-05-21 | A Survey of Robotic Language Grounding: Tradeoffs Between Symbols and Embeddings | Vanya Cohen et.al. | 2405.13245 | translate | read | null |
| 2024-05-21 | Inconsistency-Aware Cross-Attention for Audio-Visual Fusion in Dimensional Emotion Recognition | R Gnana Praveen et.al. | 2405.12853 | translate | read | null |
| 2024-05-21 | Scientific discourse on YouTube: Motivations for citing research in comments | Sören Striewski et.al. | 2405.12798 | translate | read | null |
| 2024-05-21 | Amplifying Academic Research through YouTube: Engagement Metrics as Predictors of Citation Impact | Olga Zagovora et.al. | 2405.12734 | translate | read | null |
| 2024-05-21 | A Multimodal Learning-based Approach for Autonomous Landing of UAV | Francisco Neves et.al. | 2405.12681 | translate | read | null |
| 2024-05-21 | Mutual Information Analysis in Multimodal Learning Systems | Hadi Hadizadeh et.al. | 2405.12456 | translate | read | null |
| 2024-05-16 | Grounded 3D-LLM with Referent Tokens | Yilun Chen et.al. | 2405.10370 | translate | read | link |
| 2024-05-13 | Improving Multimodal Learning with Multi-Loss Gradient Modulation | Konstantinos Kontras et.al. | 2405.07930 | translate | read | link |
| 2024-05-13 | Generating Human Motion in 3D Scenes from Text Descriptions | Zhi Cen et.al. | 2405.07784 | translate | read | null |
| 2024-05-13 | An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-Attention | Junichiro Niimi et.al. | 2405.07435 | translate | read | null |
| 2024-05-10 | A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments | Joyce Fonteles et.al. | 2405.06203 | translate | read | null |
| 2024-05-09 | Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training | Sheng Yan et.al. | 2405.05523 | translate | read | null |
| 2024-05-08 | Empathy Through Multimodality in Conversational Interfaces | Mahyar Abbasian et.al. | 2405.04777 | translate | read | null |
| 2024-05-08 | All in One Framework for Multimodal Re-identification in the Wild | He Li et.al. | 2405.04741 | translate | read | null |
| 2024-05-07 | Interpretable Tensor Fusion | Saurabh Varshneya et.al. | 2405.04671 | translate | read | null |
| 2024-05-03 | Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum | Tao Meng et.al. | 2404.17862 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)