Multimodal - 2024-11
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-11-30 | Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment | Dongfang Zhao et al. | 2412.00373 | translate | read | null |
| 2024-11-29 | SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition | Fangze Fu et al. | 2411.19822 | translate | read | null |
| 2024-11-26 | Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment | Zheng Chen et al. | 2411.17237 | translate | read | link |
| 2024-11-26 | Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation | Xu Zheng et al. | 2411.17141 | translate | read | link |
| 2024-11-26 | Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models | Colin Conwell et al. | 2411.17066 | translate | read | link |
| 2024-11-26 | Multimodal Alignment and Fusion: A Survey | Songtao Li et al. | 2411.17040 | translate | read | null |
| 2024-11-25 | Language Driven Occupancy Prediction | Zhu Yu et al. | 2411.16072 | translate | read | link |
| 2024-11-23 | From Complexity to Parsimony: Integrating Latent Class Analysis to Uncover Multimodal Learning Patterns in Collaborative Learning | Lixiang Yan et al. | 2411.15590 | translate | read | null |
| 2024-11-23 | Botfip-LLM: An Enhanced Multimodal Scientific Computing Framework Leveraging Knowledge Distillation from Large Language Models | Tianhao Chen et al. | 2411.15525 | translate | read | null |
| 2024-11-22 | PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision | Arnav M. Das et al. | 2411.15127 | translate | read | null |
| 2024-11-21 | Generative AI for Music and Audio | Hao-Wen Dong et al. | 2411.14627 | translate | read | null |
| 2024-11-21 | Multimodal 3D Reasoning Segmentation with Complex Scenes | Xueying Jiang et al. | 2411.13927 | translate | read | null |
| 2024-11-12 | Public Health Advocacy Dataset: A Dataset of Tobacco Usage Videos from Social Media | Naga VS Raviteja Chappa et al. | 2411.13572 | translate | read | null |
| 2024-11-20 | I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences | Zihan Wang et al. | 2411.12960 | translate | read | null |
| 2024-11-18 | MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT | Xiaomin Ouyang et al. | 2411.12126 | translate | read | null |
| 2024-11-19 | SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach | Ruoxi Sun et al. | 2411.11195 | translate | read | null |
| 2024-11-15 | Everything is a Video: Unifying Modalities through Next-Frame Prediction | G. Thomas Hudson et al. | 2411.10503 | translate | read | null |
| 2024-11-15 | Weakly-Supervised Multimodal Learning on MIMIC-CXR | Andrea Agostini et al. | 2411.10356 | translate | read | null |
| 2024-11-15 | CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | Xiaofei Zhu et al. | 2411.10060 | translate | read | null |
| 2024-11-21 | Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era | Thanh Tam Nguyen et al. | 2411.09955 | translate | read | link |
| 2024-11-14 | SmartInv: Multimodal Learning for Smart Contract Invariant Inference | Sally Junsong Wang et al. | 2411.09217 | translate | read | null |
| 2024-11-12 | NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN | Sonia Raychaudhuri et al. | 2411.07848 | translate | read | null |
| 2024-11-11 | Multimodal Fusion Balancing Through Game-Theoretic Regularization | Konstantinos Kontras et al. | 2411.07335 | translate | read | null |
| 2024-11-11 | StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Yichen He et al. | 2411.07076 | translate | read | link |
| 2024-11-08 | Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors | Yuanyuan Liu et al. | 2411.05879 | translate | read | null |
| 2024-11-06 | AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool | Zhongliang Tang et al. | 2411.03709 | translate | read | null |
| 2024-11-05 | STEER: Flexible Robotic Manipulation via Dense Language Grounding | Laura Smith et al. | 2411.03409 | translate | read | null |
| 2024-11-05 | Grounding Natural Language to SQL Translation with Data-Based Self-Explanations | Yuankai Fan et al. | 2411.02948 | translate | read | link |
| 2024-11-04 | Grounding Emotional Descriptions to Electrovibration Haptic Signals | Guimin Hu et al. | 2411.02118 | translate | read | null |
| 2024-11-03 | Classifier-guided Gradient Modulation for Enhanced Multimodal Learning | Zirun Guo et al. | 2411.01409 | translate | read | link |
| 2024-11-01 | Text2Freq: Learning Series Patterns from Text via Frequency Domain | Ming-Chih Lo et al. | 2411.00929 | translate | read | null |
| 2024-11-01 | Analyzing Multimodal Integration in the Variational Autoencoder from an Information-Theoretic Perspective | Carlotta Langer et al. | 2411.00522 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)