Scene Understanding - 2025-12

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-12-31 | Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark | Pan Wang et.al. | 2601.00092 | translate | read | null |
| 2025-12-31 | UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning | Ankit Dhiman et.al. | 2512.24763 | translate | read | null |
| 2025-12-31 | 3D Semantic Segmentation for Post-Disaster Assessment | Nhut Le et.al. | 2512.24593 | translate | read | null |
| 2025-12-30 | Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models | Kim Alexander Christensen et.al. | 2512.24470 | translate | read | null |
| 2025-12-30 | Spatial-aware Vision Language Model for Autonomous Driving | Weijie Wei et.al. | 2512.24331 | translate | read | null |
| 2025-12-25 | Break Out the Silverware – Semantic Understanding of Stored Household Items | Michaela Levi-Richter et.al. | 2512.23739 | translate | read | null |
| 2025-12-29 | Multi-label Classification with Panoptic Context Aggregation Networks | Mingyuan Jiu et.al. | 2512.23486 | translate | read | null |
| 2025-12-29 | SpatialMosaic: A Multiview VLM Dataset for Partial Visibility | Kanghee Lee et.al. | 2512.23365 | translate | read | null |
| 2025-12-29 | AVOID: The Adverse Visual Conditions Dataset with Obstacles for Driving Scene Understanding | Jongoh Jeong et.al. | 2512.23215 | translate | read | null |
| 2025-12-29 | GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation | Tianchen Deng et.al. | 2512.23180 | translate | read | null |
| 2025-12-28 | ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving | Qihang Peng et.al. | 2512.22939 | translate | read | null |
| 2025-12-28 | Next Best View Selections for Semantic and Dynamic 3D Gaussian Splatting | Yiqian Li et.al. | 2512.22771 | translate | read | null |
| 2025-12-27 | Instance Communication System for Intelligent Connected Vehicles: Bridging the Gap from Semantic to Instance-Level Transmission | Daiqi Zhang et.al. | 2512.22693 | translate | read | null |
| 2025-12-26 | VULCAN: Tool-Augmented Multi Agents for Iterative 3D Object Arrangement | Zhengfei Kuang et.al. | 2512.22351 | translate | read | null |
| 2025-12-24 | Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential | Shihao Zou et.al. | 2512.21284 | translate | read | null |
| 2025-12-23 | OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective | Markus Gross et.al. | 2512.20770 | translate | read | null |
| 2025-12-22 | CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models | Pengyu Chen et.al. | 2512.19083 | translate | read | null |
| 2025-12-22 | VOIC: Visible-Occluded Decoupling for Monocular 3D Semantic Scene Completion | Zaidao Han et.al. | 2512.18954 | translate | read | null |
| 2025-12-21 | Multimodal Classification Network Guided Trajectory Planning for Four-Wheel Independent Steering Autonomous Parking Considering Obstacle Attributes | Jingjia Teng et.al. | 2512.18836 | translate | read | null |
| 2025-12-20 | LLaViDA: A Large Language Vision Driving Assistant for Explicit Reasoning and Enhanced Trajectory Planning | Yudong Liu et.al. | 2512.18211 | translate | read | null |
| 2025-12-19 | InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion | Hoiyeong Jin et.al. | 2512.17504 | translate | read | null |
| 2025-12-18 | MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning | Yuanchen Ju et.al. | 2512.16909 | translate | read | null |
| 2025-12-18 | SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning | Tin Stribor Sohn et.al. | 2512.16461 | translate | read | null |
| 2025-12-18 | Privacy-Aware Sharing of Raw Spatial Sensor Data for Cooperative Perception | Bangya Liu et.al. | 2512.16265 | translate | read | null |
| 2025-12-16 | Unified Semantic Transformer for 3D Scene Understanding | Sebastian Koch et.al. | 2512.14364 | translate | read | null |
| 2025-12-16 | Consistent Instance Field for Dynamic Scene Understanding | Junyi Wu et.al. | 2512.14126 | translate | read | null |
| 2025-12-16 | Deep Learning Perspective of Scene Understanding in Autonomous Robots | Afia Maham et.al. | 2512.14020 | translate | read | null |
| 2025-12-15 | I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners | Lu Ling et.al. | 2512.13683 | translate | read | null |
| 2025-12-15 | MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion | Minghui Hou et.al. | 2512.13177 | translate | read | null |
| 2025-12-15 | DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass | Vivek Alumootil et.al. | 2512.13122 | translate | read | null |
| 2025-12-15 | SLIM-VDB: A Real-Time 3D Probabilistic Semantic Mapping Framework | Anja Sheppard et.al. | 2512.12945 | translate | read | null |
| 2025-12-13 | INDOOR-LiDAR: Bridging Simulation and Reality for Robot-Centric 360 degree Indoor LiDAR Perception – A Robot-Centric Hybrid Dataset | Haichuan Li et.al. | 2512.12377 | translate | read | null |
| 2025-12-13 | MRD: Using Physically Based Differentiable Rendering to Probe Vision Models for 3D Scene Understanding | Benjamin Beilharz et.al. | 2512.12307 | translate | read | null |
| 2025-12-13 | A Multi-Year Urban Streetlight Imagery Dataset for Visual Monitoring and Spatio-Temporal Drift Detection | Peizheng Li et.al. | 2512.12205 | translate | read | null |
| 2025-12-13 | Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video | Daniel Adebi et.al. | 2512.12165 | translate | read | null |
| 2025-12-12 | Evaluating Foundation Models’ 3D Understanding Through Multi-View Correspondence Analysis | Valentina Lilova et.al. | 2512.11574 | translate | read | null |
| 2025-12-12 | Reconstruction as a Bridge for Event-Based Visual Question Answering | Hanyue Lou et.al. | 2512.11510 | translate | read | null |
| 2025-12-12 | VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing | Emanuel Sánchez Aimar et.al. | 2512.11490 | translate | read | null |
| 2025-12-10 | LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating | Junting Chen et.al. | 2512.09920 | translate | read | null |
| 2025-12-09 | SIP: Site in Pieces- A Dataset of Disaggregated Construction-Phase 3D Scans for Semantic Segmentation and Scene Understanding | Seongyong Kim et.al. | 2512.09062 | translate | read | null |
| 2025-12-09 | LapFM: A Laparoscopic Segmentation Foundation Model via Hierarchical Concept Evolving Pre-training | Qing Xu et.al. | 2512.08439 | translate | read | null |
| 2025-12-09 | CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning | Zeyuan Chen et.al. | 2512.08135 | translate | read | null |
| 2025-12-08 | SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery | Meng Cao et.al. | 2512.07733 | translate | read | null |
| 2025-12-08 | STRinGS: Selective Text Refinement in Gaussian Splatting | Abhinav Raundhal et.al. | 2512.07230 | translate | read | null |
| 2025-12-08 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Siyang Jiang et.al. | 2512.07136 | translate | read | null |
| 2025-12-05 | Physics-Grounded Attached Shadow Detection Using Approximate 3D Geometry and Light Direction | Shilin Hu et.al. | 2512.06179 | translate | read | null |
| 2025-12-05 | BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous Driving | Karthik Mohan et.al. | 2512.06096 | translate | read | null |
| 2025-12-05 | Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision | Lennart Maack et.al. | 2512.05740 | translate | read | null |
| 2025-12-05 | Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction | Ruihong Yin et.al. | 2512.05597 | translate | read | null |
| 2025-12-05 | VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation | Chinthani Sugandhika et.al. | 2512.05524 | translate | read | null |
| 2025-12-04 | 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer | Xianfeng Wu et.al. | 2512.05060 | translate | read | null |
| 2025-12-03 | C3G: Learning Compact 3D Representations with 2K Gaussians | Honggyu An et.al. | 2512.04021 | translate | read | null |
| 2025-12-03 | Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding | Haoran Zhou et.al. | 2512.03601 | translate | read | null |
| 2025-12-03 | What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models | Tianchen Deng et.al. | 2512.03422 | translate | read | null |
| 2025-12-03 | ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding | Lingjun Zhao et.al. | 2512.03370 | translate | read | null |
| 2025-12-02 | SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding | Hongpei Zheng et.al. | 2512.03284 | translate | read | null |
| 2025-12-02 | Layout Anything: One Transformer for Universal Room Layout Estimation | Md Sohag Mia et.al. | 2512.02952 | translate | read | null |
| 2025-12-02 | Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding | Yerim Jeon et.al. | 2512.02487 | translate | read | null |
| 2025-12-02 | HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild | Valentin Bieri et.al. | 2512.02450 | translate | read | null |
| 2025-12-01 | ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation | Chenyang Gu et.al. | 2512.02013 | translate | read | null |
| 2025-12-01 | OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic | Songyan Zhang et.al. | 2512.01830 | translate | read | null |
| 2025-12-01 | IGen: Scalable Data Generation for Robot Learning from Open-World Images | Chenghao Gu et.al. | 2512.01773 | translate | read | null |
| 2025-12-01 | SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge | Yumeng He et.al. | 2512.01629 | translate | read | null |
| 2025-12-01 | MDiff4STR: Mask Diffusion Model for Scene Text Recognition | Yongkun Du et.al. | 2512.01422 | translate | read | null |
| 2025-12-01 | VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering | Zihua Liu et.al. | 2512.01178 | translate | read | null |
