Scene Understanding - 2025-02

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-02-28 | Vibrotactile information coding strategies for a body-worn vest to aid robot-human collaboration | Adrian Vecina Tercero et.al. | 2502.21056 | translate | read | null |
| 2025-02-27 | Towards Statistical Factuality Guarantee for Large Vision-Language Models | Zhuohang Li et.al. | 2502.20560 | translate | read | null |
| 2025-02-26 | Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Xiankang He et.al. | 2502.19204 | translate | read | link |
| 2025-02-25 | VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion | Pei Liu et.al. | 2502.18042 | translate | read | null |
| 2025-02-24 | AAD-LLM: Neural Attention-Driven Auditory Scene Understanding | Xilin Jiang et.al. | 2502.16794 | translate | read | link |
| 2025-02-28 | Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model | Yaxuan Huang et.al. | 2502.16779 | translate | read | link |
| 2025-02-23 | Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration | Kim Jun-Seong et.al. | 2502.16652 | translate | read | null |
| 2025-02-21 | Weakly Supervised Video Scene Graph Generation via Natural Language Supervision | Kibum Kim et.al. | 2502.15370 | translate | read | link |
| 2025-02-21 | DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation | Luzhou Ge et.al. | 2502.15309 | translate | read | link |
| 2025-02-21 | Hierarchical Context Transformer for Multi-level Semantic Scene Understanding | Luoying Hao et.al. | 2502.15184 | translate | read | link |
| 2025-02-20 | CrossOver: 3D Scene Cross-Modal Alignment | Sayan Deb Sarkar et.al. | 2502.15011 | translate | read | link |
| 2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931 | translate | read | null |
| 2025-02-19 | Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning | Rui Zhao et.al. | 2502.14917 | translate | read | null |
| 2025-02-16 | Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review | Ufaq Khan et.al. | 2502.14886 | translate | read | null |
| 2025-02-21 | AVD2: Accident Video Diffusion for Accident Video Description | Cheng Li et.al. | 2502.14801 | translate | read | null |
| 2025-02-18 | Spiking Vision Transformer with Saccadic Attention | Shuai Wang et.al. | 2502.12677 | translate | read | null |
| 2025-02-16 | NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Zihan Wang et.al. | 2502.11142 | translate | read | link |
| 2025-02-15 | Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy | Mingyang Zhao et.al. | 2502.10704 | translate | read | link |
| 2025-02-14 | Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation | Gamal Elghazaly et.al. | 2502.10127 | translate | read | null |
| 2025-02-13 | FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation | Bin Yang et.al. | 2502.09274 | translate | read | null |
| 2025-02-13 | Billet Number Recognition Based on Test-Time Adaptation | Yuan Wei et.al. | 2502.09026 | translate | read | null |
| 2025-02-13 | EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition | Xiao Wang et.al. | 2502.09020 | translate | read | link |
| 2025-02-13 | 3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning | Guoqin Tang et.al. | 2502.08903 | translate | read | null |
| 2025-02-10 | Fully Exploiting Vision Foundation Model’s Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing | Sicen Guo et.al. | 2502.06219 | translate | read | null |
| 2025-02-08 | Content-based Video Retrieval in Traffic Videos using Latent Dirichlet Allocation Topic Model | Mohammad Kianpisheh et.al. | 2502.05457 | translate | read | null |
| 2025-02-06 | sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views | Eyvaz Najafli et.al. | 2502.04318 | translate | read | null |
| 2025-02-06 | Taking A Closer Look at Interacting Objects: Interaction-Aware Open Vocabulary Scene Graph Generation | Lin Li et.al. | 2502.03856 | translate | read | null |
| 2025-02-05 | EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality | Junlong Chen et.al. | 2502.03564 | translate | read | null |
| 2025-02-04 | Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation | Junha Lee et.al. | 2502.02548 | translate | read | null |
| 2025-02-04 | Event-aided Semantic Scene Completion | Shangwei Guo et.al. | 2502.02334 | translate | read | link |
| 2025-02-03 | AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis | Basit Alawode et.al. | 2502.01785 | translate | read | null |
| 2025-02-04 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)