Scene Understanding - 2026-01
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-01-31 | VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning | Vivek Madhavaram et al. | 2602.00637 | translate | read | null |
| 2026-01-30 | Segment Any Events with Language | Seungjun Lee et al. | 2601.23159 | translate | read | link |
| 2026-01-30 | Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation | Di Zhang et al. | 2601.22988 | translate | read | null |
| 2026-01-29 | FlexMap: Generalized HD Map Construction from Flexible Camera Configurations | Run Wang et al. | 2601.22376 | translate | read | null |
| 2026-01-29 | Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving | Linhan Wang et al. | 2601.22032 | translate | read | link |
| 2026-01-29 | LLM-Driven Scenario-Aware Planning for Autonomous Driving | He Li et al. | 2601.21876 | translate | read | null |
| 2026-01-29 | From Implicit Ambiguity to Explicit Solidity: Diagnosing Interior Geometric Degradation in Neural Radiance Fields for Dense 3D Scene Understanding | Jiangsan Zhao et al. | 2601.21421 | translate | read | null |
| 2026-01-29 | DSCD-Nav: Dual-Stance Cooperative Debate for Object Navigation | Weitao An et al. | 2601.21409 | translate | read | null |
| 2026-01-29 | InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios | Zeyi Liu et al. | 2601.21173 | translate | read | null |
| 2026-01-28 | CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization | Yue Liang et al. | 2601.20355 | translate | read | null |
| 2026-01-27 | ScenePilot-Bench: A Large-Scale Dataset and Benchmark for Evaluation of Vision-Language Models in Autonomous Driving | Yujin Wang et al. | 2601.19582 | translate | read | null |
| 2026-01-26 | On the Role of Depth in Surgical Vision Foundation Models: An Empirical Study of RGB-D Pre-training | John J. Han et al. | 2601.18929 | translate | read | null |
| 2026-01-26 | Towards Safety-Compliant Transformer Architectures for Automotive Systems | Sven Kirchner et al. | 2601.18850 | translate | read | null |
| 2026-01-23 | GPA-VGGT:Adapting VGGT to Large scale Localization by self-Supervised learning with Geometry and Physics Aware loss | Yangfan Xu et al. | 2601.16885 | translate | read | null |
| 2026-01-21 | ExPrIS: Knowledge-Level Expectations as Priors for Object Interpretation from Sensor Data | Marian Renz et al. | 2601.15025 | translate | read | null |
| 2026-01-20 | Vision-Based Natural Language Scene Understanding for Autonomous Driving: An Extended Dataset and a New Model for Traffic Scene Description Generation | Danial Sadrian Zadeh et al. | 2601.14438 | translate | read | null |
| 2026-01-19 | CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting | Yu-Jen Tseng et al. | 2601.12814 | translate | read | null |
| 2026-01-19 | AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation | Xuecheng Chen et al. | 2601.12742 | translate | read | null |
| 2026-01-16 | SUG-Occ: An Explicit Semantics and Uncertainty Guided Sparse Learning Framework for Real-Time 3D Occupancy Prediction | Hanlin Wu et al. | 2601.11396 | translate | read | null |
| 2026-01-15 | CHORAL: Traversal-Aware Planning for Safe and Efficient Heterogeneous Multi-Robot Routing | David Morilla-Cabello et al. | 2601.10340 | translate | read | null |
| 2026-01-14 | OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding | Sheng-Yu Huang et al. | 2601.09575 | translate | read | null |
| 2026-01-13 | Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation | Xuetao Li et al. | 2601.09031 | translate | read | null |
| 2026-01-13 | Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation | Runfeng Qu et al. | 2601.08728 | translate | read | null |
| 2026-01-13 | CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval | Feiran Wang et al. | 2601.08175 | translate | read | null |
| 2026-01-12 | Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model | Siwen Jiao et al. | 2601.07695 | translate | read | null |
| 2026-01-12 | FlyCo: Foundation Model-Empowered Drones for Autonomous 3D Structure Scanning in Open-World Environments | Chen Feng et al. | 2601.07558 | translate | read | null |
| 2026-01-12 | OSCAR: Open-Set CAD Retrieval from a Language Prompt and a Single Image | Tessa Pulli et al. | 2601.07333 | translate | read | null |
| 2026-01-10 | 3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence | Hao Tang et al. | 2601.06496 | translate | read | link |
| 2026-01-10 | SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning | Chenxu Dang et al. | 2601.06474 | translate | read | null |
| 2026-01-10 | Semantic Enrichment of CAD-Based Industrial Environments via Scene Graphs for Simulation and Reasoning | Nathan Pascal Walus et al. | 2601.06415 | translate | read | null |
| 2026-01-09 | GeoSurDepth: Spatial Geometry-Consistent Self-Supervised Depth Estimation for Surround-View Cameras | Weimin Liu et al. | 2601.05839 | translate | read | null |
| 2026-01-08 | ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting | Yen-Jen Chiou et al. | 2601.04754 | translate | read | link |
| 2026-01-07 | UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving | Zhexiao Xiong et al. | 2601.04453 | translate | read | null |
| 2026-01-07 | Bayesian Monocular Depth Refinement via Neural Radiance Fields | Arun Muthukkumar et al. | 2601.03869 | translate | read | null |
| 2026-01-07 | G2P: Gaussian-to-Point Attribute Alignment for Boundary-Aware 3D Semantic Segmentation | Hojun Song et al. | 2601.03510 | translate | read | null |
| 2026-01-06 | EarthVL: A Progressive Earth Vision-Language Understanding and Generation Framework | Junjue Wang et al. | 2601.02783 | translate | read | null |
| 2026-01-05 | InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation | Junhao Cai et al. | 2601.02456 | translate | read | link |
| 2026-01-05 | Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding | Toshihiko Nishimura et al. | 2601.02029 | translate | read | null |
| 2026-01-04 | LabelAny3D: Label Any Object 3D in the Wild | Jin Yao et al. | 2601.01676 | translate | read | null |