Scene Understanding - 2026-01 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Translate	Read	Code
2026-01-31	VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning	Vivek Madhavaram et.al.	2602.00637	translate	read	null
2026-01-30	Segment Any Events with Language	Seungjun Lee et.al.	2601.23159	translate	read	link
2026-01-30	Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation	Di Zhang et.al.	2601.22988	translate	read	null
2026-01-29	FlexMap: Generalized HD Map Construction from Flexible Camera Configurations	Run Wang et.al.	2601.22376	translate	read	null
2026-01-29	Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving	Linhan Wang et.al.	2601.22032	translate	read	link
2026-01-29	LLM-Driven Scenario-Aware Planning for Autonomous Driving	He Li et.al.	2601.21876	translate	read	null
2026-01-29	From Implicit Ambiguity to Explicit Solidity: Diagnosing Interior Geometric Degradation in Neural Radiance Fields for Dense 3D Scene Understanding	Jiangsan Zhao et.al.	2601.21421	translate	read	null
2026-01-29	DSCD-Nav: Dual-Stance Cooperative Debate for Object Navigation	Weitao An et.al.	2601.21409	translate	read	null
2026-01-29	InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios	Zeyi Liu et.al.	2601.21173	translate	read	null
2026-01-28	CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization	Yue Liang et.al.	2601.20355	translate	read	null
2026-01-27	ScenePilot-Bench: A Large-Scale Dataset and Benchmark for Evaluation of Vision-Language Models in Autonomous Driving	Yujin Wang et.al.	2601.19582	translate	read	null
2026-01-26	On the Role of Depth in Surgical Vision Foundation Models: An Empirical Study of RGB-D Pre-training	John J. Han et.al.	2601.18929	translate	read	null
2026-01-26	Towards Safety-Compliant Transformer Architectures for Automotive Systems	Sven Kirchner et.al.	2601.18850	translate	read	null
2026-01-23	GPA-VGGT: Adapting VGGT to Large scale Localization by self-Supervised learning with Geometry and Physics Aware loss	Yangfan Xu et.al.	2601.16885	translate	read	null
2026-01-21	ExPrIS: Knowledge-Level Expectations as Priors for Object Interpretation from Sensor Data	Marian Renz et.al.	2601.15025	translate	read	null
2026-01-20	Vision-Based Natural Language Scene Understanding for Autonomous Driving: An Extended Dataset and a New Model for Traffic Scene Description Generation	Danial Sadrian Zadeh et.al.	2601.14438	translate	read	null
2026-01-19	CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting	Yu-Jen Tseng et.al.	2601.12814	translate	read	null
2026-01-19	AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation	Xuecheng Chen et.al.	2601.12742	translate	read	null
2026-01-16	SUG-Occ: An Explicit Semantics and Uncertainty Guided Sparse Learning Framework for Real-Time 3D Occupancy Prediction	Hanlin Wu et.al.	2601.11396	translate	read	null
2026-01-15	CHORAL: Traversal-Aware Planning for Safe and Efficient Heterogeneous Multi-Robot Routing	David Morilla-Cabello et.al.	2601.10340	translate	read	null
2026-01-14	OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding	Sheng-Yu Huang et.al.	2601.09575	translate	read	null
2026-01-13	Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation	Xuetao Li et.al.	2601.09031	translate	read	null
2026-01-13	Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation	Runfeng Qu et.al.	2601.08728	translate	read	null
2026-01-13	CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval	Feiran Wang et.al.	2601.08175	translate	read	null
2026-01-12	Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model	Siwen Jiao et.al.	2601.07695	translate	read	null
2026-01-12	FlyCo: Foundation Model-Empowered Drones for Autonomous 3D Structure Scanning in Open-World Environments	Chen Feng et.al.	2601.07558	translate	read	null
2026-01-12	OSCAR: Open-Set CAD Retrieval from a Language Prompt and a Single Image	Tessa Pulli et.al.	2601.07333	translate	read	null
2026-01-10	3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence	Hao Tang et.al.	2601.06496	translate	read	link
2026-01-10	SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning	Chenxu Dang et.al.	2601.06474	translate	read	null
2026-01-10	Semantic Enrichment of CAD-Based Industrial Environments via Scene Graphs for Simulation and Reasoning	Nathan Pascal Walus et.al.	2601.06415	translate	read	null
2026-01-09	GeoSurDepth: Spatial Geometry-Consistent Self-Supervised Depth Estimation for Surround-View Cameras	Weimin Liu et.al.	2601.05839	translate	read	null
2026-01-08	ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting	Yen-Jen Chiou et.al.	2601.04754	translate	read	link
2026-01-07	UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving	Zhexiao Xiong et.al.	2601.04453	translate	read	null
2026-01-07	Bayesian Monocular Depth Refinement via Neural Radiance Fields	Arun Muthukkumar et.al.	2601.03869	translate	read	null
2026-01-07	G2P: Gaussian-to-Point Attribute Alignment for Boundary-Aware 3D Semantic Segmentation	Hojun Song et.al.	2601.03510	translate	read	null
2026-01-06	EarthVL: A Progressive Earth Vision-Language Understanding and Generation Framework	Junjue Wang et.al.	2601.02783	translate	read	null
2026-01-05	InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation	Junhao Cai et.al.	2601.02456	translate	read	link
2026-01-05	Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding	Toshihiko Nishimura et.al.	2601.02029	translate	read	null
2026-01-04	LabelAny3D: Label Any Object 3D in the Wild	Jin Yao et.al.	2601.01676	translate	read	null