Multimodal - 2024-05
| Publish Date | Title | Authors | Paper ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-05-31 | Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Shiyin Lu et.al. | 2405.20797 | translate | read | null |
| 2024-05-31 | Visual Attention Analysis in Online Learning | Miriam Navarro et.al. | 2405.20091 | translate | read | null |
| 2024-05-29 | Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining | Blake R. Duschatko et.al. | 2405.19386 | translate | read | null |
| 2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | translate | read | link |
| 2024-05-29 | Exploring Exotic Decays of the Higgs Boson to Multi-Photons at the LHC via Multimodal Learning Approaches | A. Hammad et.al. | 2405.18834 | translate | read | null |
| 2024-05-28 | RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives | Jaehong Yoon et.al. | 2405.18406 | translate | read | link |
| 2024-05-28 | MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance | Yake Wei et.al. | 2405.17730 | translate | read | link |
| 2024-05-27 | Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning | Zihua Zhao et.al. | 2405.16996 | translate | read | null |
| 2024-05-27 | Multilingual Diversity Improves Vision-Language Representations | Thao Nguyen et.al. | 2405.16915 | translate | read | null |
| 2024-05-27 | Hawk: Learning to Understand Open-World Video Anomalies | Jiaqi Tang et.al. | 2405.16886 | translate | read | link |
| 2024-05-24 | Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search | Marie Al Ghossein et.al. | 2405.15190 | translate | read | link |
| 2024-05-23 | TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing | Teng Xu et.al. | 2405.14455 | translate | read | null |
| 2024-05-22 | Grounding Toxicity in Real-World Events across Languages | Wondimagegnhue Tsegaye Tufa et.al. | 2405.13754 | translate | read | link |
| 2024-05-21 | A Survey of Robotic Language Grounding: Tradeoffs Between Symbols and Embeddings | Vanya Cohen et.al. | 2405.13245 | translate | read | null |
| 2024-05-21 | Inconsistency-Aware Cross-Attention for Audio-Visual Fusion in Dimensional Emotion Recognition | R Gnana Praveen et.al. | 2405.12853 | translate | read | null |
| 2024-05-21 | Scientific discourse on YouTube: Motivations for citing research in comments | Sören Striewski et.al. | 2405.12798 | translate | read | null |
| 2024-05-21 | Amplifying Academic Research through YouTube: Engagement Metrics as Predictors of Citation Impact | Olga Zagovora et.al. | 2405.12734 | translate | read | null |
| 2024-05-21 | A Multimodal Learning-based Approach for Autonomous Landing of UAV | Francisco Neves et.al. | 2405.12681 | translate | read | null |
| 2024-05-21 | Mutual Information Analysis in Multimodal Learning Systems | Hadi Hadizadeh et.al. | 2405.12456 | translate | read | null |
| 2024-05-16 | Grounded 3D-LLM with Referent Tokens | Yilun Chen et.al. | 2405.10370 | translate | read | link |
| 2024-05-13 | Improving Multimodal Learning with Multi-Loss Gradient Modulation | Konstantinos Kontras et.al. | 2405.07930 | translate | read | link |
| 2024-05-13 | Generating Human Motion in 3D Scenes from Text Descriptions | Zhi Cen et.al. | 2405.07784 | translate | read | null |
| 2024-05-13 | An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-Attention | Junichiro Niimi et.al. | 2405.07435 | translate | read | null |
| 2024-05-10 | A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments | Joyce Fonteles et.al. | 2405.06203 | translate | read | null |
| 2024-05-09 | Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training | Sheng Yan et.al. | 2405.05523 | translate | read | null |
| 2024-05-08 | Empathy Through Multimodality in Conversational Interfaces | Mahyar Abbasian et.al. | 2405.04777 | translate | read | null |
| 2024-05-08 | All in One Framework for Multimodal Re-identification in the Wild | He Li et.al. | 2405.04741 | translate | read | null |
| 2024-05-07 | Interpretable Tensor Fusion | Saurabh Varshneya et.al. | 2405.04671 | translate | read | null |
| 2024-05-03 | Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum | Tao Meng et.al. | 2404.17862 | translate | read | null |
(<a href=../Multimodal.md>back to Multimodal</a>)