Scene Understanding - 2025-02
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-02-28 | Vibrotactile information coding strategies for a body-worn vest to aid robot-human collaboration | Adrian Vecina Tercero et.al. | 2502.21056 | translate | read | null |
| 2025-02-27 | Towards Statistical Factuality Guarantee for Large Vision-Language Models | Zhuohang Li et.al. | 2502.20560 | translate | read | null |
| 2025-02-26 | Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Xiankang He et.al. | 2502.19204 | translate | read | link |
| 2025-02-25 | VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion | Pei Liu et.al. | 2502.18042 | translate | read | null |
| 2025-02-24 | AAD-LLM: Neural Attention-Driven Auditory Scene Understanding | Xilin Jiang et.al. | 2502.16794 | translate | read | link |
| 2025-02-28 | Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model | Yaxuan Huang et.al. | 2502.16779 | translate | read | link |
| 2025-02-23 | Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration | Kim Jun-Seong et.al. | 2502.16652 | translate | read | null |
| 2025-02-21 | Weakly Supervised Video Scene Graph Generation via Natural Language Supervision | Kibum Kim et.al. | 2502.15370 | translate | read | link |
| 2025-02-21 | DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation | Luzhou Ge et.al. | 2502.15309 | translate | read | link |
| 2025-02-21 | Hierarchical Context Transformer for Multi-level Semantic Scene Understanding | Luoying Hao et.al. | 2502.15184 | translate | read | link |
| 2025-02-20 | CrossOver: 3D Scene Cross-Modal Alignment | Sayan Deb Sarkar et.al. | 2502.15011 | translate | read | link |
| 2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931 | translate | read | null |
| 2025-02-19 | Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning | Rui Zhao et.al. | 2502.14917 | translate | read | null |
| 2025-02-16 | Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review | Ufaq Khan et.al. | 2502.14886 | translate | read | null |
| 2025-02-21 | AVD2: Accident Video Diffusion for Accident Video Description | Cheng Li et.al. | 2502.14801 | translate | read | null |
| 2025-02-18 | Spiking Vision Transformer with Saccadic Attention | Shuai Wang et.al. | 2502.12677 | translate | read | null |
| 2025-02-16 | NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Zihan Wang et.al. | 2502.11142 | translate | read | link |
| 2025-02-15 | Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy | Mingyang Zhao et.al. | 2502.10704 | translate | read | link |
| 2025-02-14 | Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation | Gamal Elghazaly et.al. | 2502.10127 | translate | read | null |
| 2025-02-13 | FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation | Bin Yang et.al. | 2502.09274 | translate | read | null |
| 2025-02-13 | Billet Number Recognition Based on Test-Time Adaptation | Yuan Wei et.al. | 2502.09026 | translate | read | null |
| 2025-02-13 | EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition | Xiao Wang et.al. | 2502.09020 | translate | read | link |
| 2025-02-13 | 3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning | Guoqin Tang et.al. | 2502.08903 | translate | read | null |
| 2025-02-10 | Fully Exploiting Vision Foundation Model’s Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing | Sicen Guo et.al. | 2502.06219 | translate | read | null |
| 2025-02-08 | Content-based Video Retrieval in Traffic Videos using Latent Dirichlet Allocation Topic Model | Mohammad Kianpisheh et.al. | 2502.05457 | translate | read | null |
| 2025-02-06 | sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views | Eyvaz Najafli et.al. | 2502.04318 | translate | read | null |
| 2025-02-06 | Taking A Closer Look at Interacting Objects: Interaction-Aware Open Vocabulary Scene Graph Generation | Lin Li et.al. | 2502.03856 | translate | read | null |
| 2025-02-05 | EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality | Junlong Chen et.al. | 2502.03564 | translate | read | null |
| 2025-02-04 | Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation | Junha Lee et.al. | 2502.02548 | translate | read | null |
| 2025-02-04 | Event-aided Semantic Scene Completion | Shangwei Guo et.al. | 2502.02334 | translate | read | link |
| 2025-02-03 | AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis | Basit Alawode et.al. | 2502.01785 | translate | read | null |
| 2025-02-04 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053 | translate | read | null |