Multimodal - 2024-11 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2024-11-30	Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment	Dongfang Zhao et al.	2412.00373	none
2024-11-29	SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition	Fangze Fu et al.	2411.19822	none
2024-11-26	Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment	Zheng Chen et al.	2411.17237	available
2024-11-26	Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation	Xu Zheng et al.	2411.17141	available
2024-11-26	Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models	Colin Conwell et al.	2411.17066	available
2024-11-26	Multimodal Alignment and Fusion: A Survey	Songtao Li et al.	2411.17040	none
2024-11-25	Language Driven Occupancy Prediction	Zhu Yu et al.	2411.16072	available
2024-11-23	From Complexity to Parsimony: Integrating Latent Class Analysis to Uncover Multimodal Learning Patterns in Collaborative Learning	Lixiang Yan et al.	2411.15590	none
2024-11-23	Botfip-LLM: An Enhanced Multimodal Scientific Computing Framework Leveraging Knowledge Distillation from Large Language Models	Tianhao Chen et al.	2411.15525	none
2024-11-22	PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision	Arnav M. Das et al.	2411.15127	none
2024-11-21	Generative AI for Music and Audio	Hao-Wen Dong et al.	2411.14627	none
2024-11-21	Multimodal 3D Reasoning Segmentation with Complex Scenes	Xueying Jiang et al.	2411.13927	none
2024-11-12	Public Health Advocacy Dataset: A Dataset of Tobacco Usage Videos from Social Media	Naga VS Raviteja Chappa et al.	2411.13572	none
2024-11-20	I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences	Zihan Wang et al.	2411.12960	none
2024-11-18	MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT	Xiaomin Ouyang et al.	2411.12126	none
2024-11-19	SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach	Ruoxi Sun et al.	2411.11195	none
2024-11-15	Everything is a Video: Unifying Modalities through Next-Frame Prediction	G. Thomas Hudson et al.	2411.10503	none
2024-11-15	Weakly-Supervised Multimodal Learning on MIMIC-CXR	Andrea Agostini et al.	2411.10356	none
2024-11-15	CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation	Xiaofei Zhu et al.	2411.10060	none
2024-11-21	Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era	Thanh Tam Nguyen et al.	2411.09955	available
2024-11-14	SmartInv: Multimodal Learning for Smart Contract Invariant Inference	Sally Junsong Wang et al.	2411.09217	none
2024-11-12	NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN	Sonia Raychaudhuri et al.	2411.07848	none
2024-11-11	Multimodal Fusion Balancing Through Game-Theoretic Regularization	Konstantinos Kontras et al.	2411.07335	none
2024-11-11	StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification	Yichen He et al.	2411.07076	available
2024-11-08	Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors	Yuanyuan Liu et al.	2411.05879	none
2024-11-06	AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool	Zhongliang Tang et al.	2411.03709	none
2024-11-05	STEER: Flexible Robotic Manipulation via Dense Language Grounding	Laura Smith et al.	2411.03409	none
2024-11-05	Grounding Natural Language to SQL Translation with Data-Based Self-Explanations	Yuankai Fan et al.	2411.02948	available
2024-11-04	Grounding Emotional Descriptions to Electrovibration Haptic Signals	Guimin Hu et al.	2411.02118	none
2024-11-03	Classifier-guided Gradient Modulation for Enhanced Multimodal Learning	Zirun Guo et al.	2411.01409	available
2024-11-01	Text2Freq: Learning Series Patterns from Text via Frequency Domain	Ming-Chih Lo et al.	2411.00929	none
2024-11-01	Analyzing Multimodal Integration in the Variational Autoencoder from an Information-Theoretic Perspective	Carlotta Langer et al.	2411.00522	none