Scene Understanding - 2025-12 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2025-12-31	Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark	Pan Wang et al.	2601.00092	null
2025-12-31	UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning	Ankit Dhiman et al.	2512.24763	null
2025-12-31	3D Semantic Segmentation for Post-Disaster Assessment	Nhut Le et al.	2512.24593	null
2025-12-30	Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models	Kim Alexander Christensen et al.	2512.24470	null
2025-12-30	Spatial-aware Vision Language Model for Autonomous Driving	Weijie Wei et al.	2512.24331	null
2025-12-25	Break Out the Silverware – Semantic Understanding of Stored Household Items	Michaela Levi-Richter et al.	2512.23739	null
2025-12-29	Multi-label Classification with Panoptic Context Aggregation Networks	Mingyuan Jiu et al.	2512.23486	null
2025-12-29	SpatialMosaic: A Multiview VLM Dataset for Partial Visibility	Kanghee Lee et al.	2512.23365	null
2025-12-29	AVOID: The Adverse Visual Conditions Dataset with Obstacles for Driving Scene Understanding	Jongoh Jeong et al.	2512.23215	null
2025-12-29	GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation	Tianchen Deng et al.	2512.23180	null
2025-12-28	ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving	Qihang Peng et al.	2512.22939	null
2025-12-28	Next Best View Selections for Semantic and Dynamic 3D Gaussian Splatting	Yiqian Li et al.	2512.22771	null
2025-12-27	Instance Communication System for Intelligent Connected Vehicles: Bridging the Gap from Semantic to Instance-Level Transmission	Daiqi Zhang et al.	2512.22693	null
2025-12-26	VULCAN: Tool-Augmented Multi Agents for Iterative 3D Object Arrangement	Zhengfei Kuang et al.	2512.22351	null
2025-12-24	Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential	Shihao Zou et al.	2512.21284	null
2025-12-23	OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective	Markus Gross et al.	2512.20770	null
2025-12-22	CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models	Pengyu Chen et al.	2512.19083	null
2025-12-22	VOIC: Visible-Occluded Decoupling for Monocular 3D Semantic Scene Completion	Zaidao Han et al.	2512.18954	null
2025-12-21	Multimodal Classification Network Guided Trajectory Planning for Four-Wheel Independent Steering Autonomous Parking Considering Obstacle Attributes	Jingjia Teng et al.	2512.18836	null
2025-12-20	LLaViDA: A Large Language Vision Driving Assistant for Explicit Reasoning and Enhanced Trajectory Planning	Yudong Liu et al.	2512.18211	null
2025-12-19	InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion	Hoiyeong Jin et al.	2512.17504	null
2025-12-18	MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning	Yuanchen Ju et al.	2512.16909	null
2025-12-18	SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning	Tin Stribor Sohn et al.	2512.16461	null
2025-12-18	Privacy-Aware Sharing of Raw Spatial Sensor Data for Cooperative Perception	Bangya Liu et al.	2512.16265	null
2025-12-16	Unified Semantic Transformer for 3D Scene Understanding	Sebastian Koch et al.	2512.14364	null
2025-12-16	Consistent Instance Field for Dynamic Scene Understanding	Junyi Wu et al.	2512.14126	null
2025-12-16	Deep Learning Perspective of Scene Understanding in Autonomous Robots	Afia Maham et al.	2512.14020	null
2025-12-15	I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners	Lu Ling et al.	2512.13683	null
2025-12-15	MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion	Minghui Hou et al.	2512.13177	null
2025-12-15	DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass	Vivek Alumootil et al.	2512.13122	null
2025-12-15	SLIM-VDB: A Real-Time 3D Probabilistic Semantic Mapping Framework	Anja Sheppard et al.	2512.12945	null
2025-12-13	INDOOR-LiDAR: Bridging Simulation and Reality for Robot-Centric 360 degree Indoor LiDAR Perception – A Robot-Centric Hybrid Dataset	Haichuan Li et al.	2512.12377	null
2025-12-13	MRD: Using Physically Based Differentiable Rendering to Probe Vision Models for 3D Scene Understanding	Benjamin Beilharz et al.	2512.12307	null
2025-12-13	A Multi-Year Urban Streetlight Imagery Dataset for Visual Monitoring and Spatio-Temporal Drift Detection	Peizheng Li et al.	2512.12205	null
2025-12-13	Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video	Daniel Adebi et al.	2512.12165	null
2025-12-12	Evaluating Foundation Models’ 3D Understanding Through Multi-View Correspondence Analysis	Valentina Lilova et al.	2512.11574	null
2025-12-12	Reconstruction as a Bridge for Event-Based Visual Question Answering	Hanyue Lou et al.	2512.11510	null
2025-12-12	VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing	Emanuel Sánchez Aimar et al.	2512.11490	null
2025-12-10	LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating	Junting Chen et al.	2512.09920	null
2025-12-09	SIP: Site in Pieces- A Dataset of Disaggregated Construction-Phase 3D Scans for Semantic Segmentation and Scene Understanding	Seongyong Kim et al.	2512.09062	null
2025-12-09	LapFM: A Laparoscopic Segmentation Foundation Model via Hierarchical Concept Evolving Pre-training	Qing Xu et al.	2512.08439	null
2025-12-09	CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning	Zeyuan Chen et al.	2512.08135	null
2025-12-08	SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery	Meng Cao et al.	2512.07733	null
2025-12-08	STRinGS: Selective Text Refinement in Gaussian Splatting	Abhinav Raundhal et al.	2512.07230	null
2025-12-08	A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning	Siyang Jiang et al.	2512.07136	null
2025-12-05	Physics-Grounded Attached Shadow Detection Using Approximate 3D Geometry and Light Direction	Shilin Hu et al.	2512.06179	null
2025-12-05	BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous Driving	Karthik Mohan et al.	2512.06096	null
2025-12-05	Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision	Lennart Maack et al.	2512.05740	null
2025-12-05	Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction	Ruihong Yin et al.	2512.05597	null
2025-12-05	VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation	Chinthani Sugandhika et al.	2512.05524	null
2025-12-04	4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer	Xianfeng Wu et al.	2512.05060	null
2025-12-03	C3G: Learning Compact 3D Representations with 2K Gaussians	Honggyu An et al.	2512.04021	null
2025-12-03	Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding	Haoran Zhou et al.	2512.03601	null
2025-12-03	What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models	Tianchen Deng et al.	2512.03422	null
2025-12-03	ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding	Lingjun Zhao et al.	2512.03370	null
2025-12-02	SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding	Hongpei Zheng et al.	2512.03284	null
2025-12-02	Layout Anything: One Transformer for Universal Room Layout Estimation	Md Sohag Mia et al.	2512.02952	null
2025-12-02	Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding	Yerim Jeon et al.	2512.02487	null
2025-12-02	HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild	Valentin Bieri et al.	2512.02450	null
2025-12-01	ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation	Chenyang Gu et al.	2512.02013	null
2025-12-01	OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic	Songyan Zhang et al.	2512.01830	null
2025-12-01	IGen: Scalable Data Generation for Robot Learning from Open-World Images	Chenghao Gu et al.	2512.01773	null
2025-12-01	SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge	Yumeng He et al.	2512.01629	null
2025-12-01	MDiff4STR: Mask Diffusion Model for Scene Text Recognition	Yongkun Du et al.	2512.01422	null
2025-12-01	VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering	Zihua Liu et al.	2512.01178	null
