Scene Understanding - 2026-01

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-01-31 | VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning | Vivek Madhavaram et al. | 2602.00637 | translate | read | null |
| 2026-01-30 | Segment Any Events with Language | Seungjun Lee et al. | 2601.23159 | translate | read | link |
| 2026-01-30 | Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation | Di Zhang et al. | 2601.22988 | translate | read | null |
| 2026-01-29 | FlexMap: Generalized HD Map Construction from Flexible Camera Configurations | Run Wang et al. | 2601.22376 | translate | read | null |
| 2026-01-29 | Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving | Linhan Wang et al. | 2601.22032 | translate | read | link |
| 2026-01-29 | LLM-Driven Scenario-Aware Planning for Autonomous Driving | He Li et al. | 2601.21876 | translate | read | null |
| 2026-01-29 | From Implicit Ambiguity to Explicit Solidity: Diagnosing Interior Geometric Degradation in Neural Radiance Fields for Dense 3D Scene Understanding | Jiangsan Zhao et al. | 2601.21421 | translate | read | null |
| 2026-01-29 | DSCD-Nav: Dual-Stance Cooperative Debate for Object Navigation | Weitao An et al. | 2601.21409 | translate | read | null |
| 2026-01-29 | InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios | Zeyi Liu et al. | 2601.21173 | translate | read | null |
| 2026-01-28 | CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization | Yue Liang et al. | 2601.20355 | translate | read | null |
| 2026-01-27 | ScenePilot-Bench: A Large-Scale Dataset and Benchmark for Evaluation of Vision-Language Models in Autonomous Driving | Yujin Wang et al. | 2601.19582 | translate | read | null |
| 2026-01-26 | On the Role of Depth in Surgical Vision Foundation Models: An Empirical Study of RGB-D Pre-training | John J. Han et al. | 2601.18929 | translate | read | null |
| 2026-01-26 | Towards Safety-Compliant Transformer Architectures for Automotive Systems | Sven Kirchner et al. | 2601.18850 | translate | read | null |
| 2026-01-23 | GPA-VGGT:Adapting VGGT to Large scale Localization by self-Supervised learning with Geometry and Physics Aware loss | Yangfan Xu et al. | 2601.16885 | translate | read | null |
| 2026-01-21 | ExPrIS: Knowledge-Level Expectations as Priors for Object Interpretation from Sensor Data | Marian Renz et al. | 2601.15025 | translate | read | null |
| 2026-01-20 | Vision-Based Natural Language Scene Understanding for Autonomous Driving: An Extended Dataset and a New Model for Traffic Scene Description Generation | Danial Sadrian Zadeh et al. | 2601.14438 | translate | read | null |
| 2026-01-19 | CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting | Yu-Jen Tseng et al. | 2601.12814 | translate | read | null |
| 2026-01-19 | AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation | Xuecheng Chen et al. | 2601.12742 | translate | read | null |
| 2026-01-16 | SUG-Occ: An Explicit Semantics and Uncertainty Guided Sparse Learning Framework for Real-Time 3D Occupancy Prediction | Hanlin Wu et al. | 2601.11396 | translate | read | null |
| 2026-01-15 | CHORAL: Traversal-Aware Planning for Safe and Efficient Heterogeneous Multi-Robot Routing | David Morilla-Cabello et al. | 2601.10340 | translate | read | null |
| 2026-01-14 | OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding | Sheng-Yu Huang et al. | 2601.09575 | translate | read | null |
| 2026-01-13 | Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation | Xuetao Li et al. | 2601.09031 | translate | read | null |
| 2026-01-13 | Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation | Runfeng Qu et al. | 2601.08728 | translate | read | null |
| 2026-01-13 | CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval | Feiran Wang et al. | 2601.08175 | translate | read | null |
| 2026-01-12 | Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model | Siwen Jiao et al. | 2601.07695 | translate | read | null |
| 2026-01-12 | FlyCo: Foundation Model-Empowered Drones for Autonomous 3D Structure Scanning in Open-World Environments | Chen Feng et al. | 2601.07558 | translate | read | null |
| 2026-01-12 | OSCAR: Open-Set CAD Retrieval from a Language Prompt and a Single Image | Tessa Pulli et al. | 2601.07333 | translate | read | null |
| 2026-01-10 | 3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence | Hao Tang et al. | 2601.06496 | translate | read | link |
| 2026-01-10 | SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning | Chenxu Dang et al. | 2601.06474 | translate | read | null |
| 2026-01-10 | Semantic Enrichment of CAD-Based Industrial Environments via Scene Graphs for Simulation and Reasoning | Nathan Pascal Walus et al. | 2601.06415 | translate | read | null |
| 2026-01-09 | GeoSurDepth: Spatial Geometry-Consistent Self-Supervised Depth Estimation for Surround-View Cameras | Weimin Liu et al. | 2601.05839 | translate | read | null |
| 2026-01-08 | ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting | Yen-Jen Chiou et al. | 2601.04754 | translate | read | link |
| 2026-01-07 | UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving | Zhexiao Xiong et al. | 2601.04453 | translate | read | null |
| 2026-01-07 | Bayesian Monocular Depth Refinement via Neural Radiance Fields | Arun Muthukkumar et al. | 2601.03869 | translate | read | null |
| 2026-01-07 | G2P: Gaussian-to-Point Attribute Alignment for Boundary-Aware 3D Semantic Segmentation | Hojun Song et al. | 2601.03510 | translate | read | null |
| 2026-01-06 | EarthVL: A Progressive Earth Vision-Language Understanding and Generation Framework | Junjue Wang et al. | 2601.02783 | translate | read | null |
| 2026-01-05 | InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation | Junhao Cai et al. | 2601.02456 | translate | read | link |
| 2026-01-05 | Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding | Toshihiko Nishimura et al. | 2601.02029 | translate | read | null |
| 2026-01-04 | LabelAny3D: Label Any Object 3D in the Wild | Jin Yao et al. | 2601.01676 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)