Scene Understanding - 2025-09 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-09-30	Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification	Artur Barros et.al.	2509.26457	translate	read	null
2025-09-30	Neighbor-aware informal settlement mapping with graph convolutional networks	Thomas Hallopeau et.al.	2509.26171	translate	read	null
2025-09-30	Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models	Yuansen Liu et.al.	2509.26165	translate	read	null
2025-09-30	EasyOcc: 3D Pseudo-Label Supervision for Fully Self-Supervised Semantic Occupancy Prediction Models	Seamie Hayes et.al.	2509.26087	translate	read	null
2025-09-30	VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs	Peng Liu et.al.	2509.25916	translate	read	null
2025-09-29	PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos	Ting-Hsuan Liao et.al.	2509.25183	translate	read	null
2025-09-29	Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs	Yue Zhang et.al.	2509.25139	translate	read	null
2025-09-29	Social 3D Scene Graphs: Modeling Human Actions and Relations for Interactive Service Robots	Ermanno Bartoli et.al.	2509.24966	translate	read	null
2025-09-29	CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D	Mohamad Amin Mirzaei et.al.	2509.24528	translate	read	null
2025-09-29	PhysiAgent: An Embodied Agent Framework in Physical World	Zhihao Wang et.al.	2509.24524	translate	read	null
2025-09-29	Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy	Haijier Chen et.al.	2509.24385	translate	read	null
2025-09-29	Robust Partial 3D Point Cloud Registration via Confidence Estimation under Global Context	Yongqiang Wang et.al.	2509.24275	translate	read	null
2025-09-28	FUSAR-KLIP: Towards Multimodal Foundation Models for Remote Sensing	Yi Yang et.al.	2509.23927	translate	read	null
2025-09-28	Uni4D-LLM: A Unified SpatioTemporal-Aware VLM for 4D Understanding and Generation	Hanyu Zhou et.al.	2509.23828	translate	read	null
2025-09-28	From Static to Dynamic: a Survey of Topology-Aware Perception in Autonomous Driving	Yixiao Chen et.al.	2509.23641	translate	read	null
2025-09-28	From Fields to Splats: A Cross-Domain Survey of Real-Time Neural Scene Representations	Javed Ahmad et.al.	2509.23555	translate	read	null
2025-09-26	Good Weights: Proactive, Adaptive Dead Reckoning Fusion for Continuous and Robust Visual SLAM	Yanwei Du et.al.	2509.22910	translate	read	null
2025-09-20	Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment	Abhiroop Chatterjee et.al.	2509.22697	translate	read	null
2025-09-26	UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective	Jun He et.al.	2509.22228	translate	read	null
2025-09-26	Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics	Saurav Jha et.al.	2509.22014	translate	read	null
2025-09-26	Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding	Vahid Mirjalili et.al.	2509.21922	translate	read	null
2025-09-25	Real-Time Indoor Object SLAM with LLM-Enhanced Priors	Yang Jiao et.al.	2509.21602	translate	read	null
2025-09-25	Residual Vector Quantization For Communication-Efficient Multi-Agent Perception	Dereje Shenkut et.al.	2509.21464	translate	read	null
2025-09-23	TUN3D: Towards Real-World Scene Understanding from Unposed Images	Anton Konushin et.al.	2509.21388	translate	read	link
2025-09-25	DENet: Dual-Path Edge Network with Global-Local Attention for Infrared Small Target Detection	Jiayi Zuo et.al.	2509.20701	translate	read	null
2025-09-23	SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment	Binod Singh et.al.	2509.20401	translate	read	null
2025-09-24	Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning	Xun Li et.al.	2509.20077	translate	read	null
2025-09-24	OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving	Pei Liu et.al.	2509.19973	translate	read	null
2025-09-23	Category-Level Object Shape and Pose Estimation in Less Than a Millisecond	Lorenzo Shaikewitz et.al.	2509.18979	translate	read	null
2025-09-23	Eva-VLA: Evaluating Vision-Language-Action Models’ Robustness Under Real-World Physical Variations	Hanqing Liu et.al.	2509.18953	translate	read	null
2025-09-23	Surgical Video Understanding with Label Interpolation	Garam Kim et.al.	2509.18802	translate	read	null
2025-09-23	MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning	Omar Rayyan et.al.	2509.18757	translate	read	null
2025-09-23	PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving	Chengran Yuan et.al.	2509.18609	translate	read	null
2025-09-22	Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration	Zhitao Zeng et.al.	2509.17429	translate	read	null
2025-09-20	Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding	Haoyuan Li et.al.	2509.16721	translate	read	null
2025-09-20	ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting	Xiaoyang Yan et.al.	2509.16552	translate	read	null
2025-09-19	Towards Sharper Object Boundaries in Self-Supervised Depth Estimation	Aurélien Cecille et.al.	2509.15987	translate	read	null
2025-09-19	RangeSAM: Leveraging Visual Foundation Models for Range-View represented LiDAR segmentation	Paul Julius Kühn et.al.	2509.15886	translate	read	null
2025-09-19	SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models	Sen Wang et.al.	2509.15536	translate	read	null
2025-09-18	Evil Vizier: Vulnerabilities of LLM-Integrated XR Systems	Yicheng Zhang et.al.	2509.15213	translate	read	null
2025-09-18	SPATIALGEN: Layout-guided 3D Indoor Scene Generation	Chuan Fang et.al.	2509.14981	translate	read	link
2025-09-16	Semantic 3D Reconstructions with SLAM for Central Airway Obstruction	Ayberk Acar et.al.	2509.13541	translate	read	null
2025-09-16	ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors	Romain Hardy et.al.	2509.13525	translate	read	null
2025-09-16	3D Aware Region Prompted Vision Language Model	An-Chieh Cheng et.al.	2509.13317	translate	read	null
2025-09-16	Weakly and Self-Supervised Class-Agnostic Motion Prediction for Autonomous Driving	Ruibo Li et.al.	2509.13116	translate	read	null
2025-09-16	Beyond Averages: Open-Vocabulary 3D Scene Understanding with Gaussian Splatting and Bag of Embeddings	Abdalla Arafa et.al.	2509.12938	translate	read	null
2025-09-16	MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization	Yiyi Zhang et.al.	2509.12893	translate	read	null
2025-09-15	RailSafeNet: Visual Scene Understanding for Tram Safety	Ondřej Valach et.al.	2509.12125	translate	read	link
2025-09-15	Microsurgical Instrument Segmentation for Robot-Assisted Surgery	Tae Kyeong Jeong et.al.	2509.11727	translate	read	null
2025-09-15	See What I Mean? Mobile Eye-Perspective Rendering for Optical See-through Head-mounted Displays	Gerlinde Emsenhuber et.al.	2509.11653	translate	read	null
2025-09-14	Modality-Aware Infrared and Visible Image Fusion with Target-Aware Supervision	Tianyao Sun et.al.	2509.11476	translate	read	null
2025-09-14	DreamNav: A Trajectory-Based Imaginative Framework for Zero-Shot Vision-and-Language Navigation	Yunheng Wang et.al.	2509.11197	translate	read	null
2025-09-14	3DAeroRelief: The first 3D Benchmark UAV Dataset for Post-Disaster Assessment	Nhut Le et.al.	2509.11097	translate	read	null
2025-09-13	OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds	Chongyu Wang et.al.	2509.10842	translate	read	null
2025-09-12	Multimodal SAM-adapter for Semantic Segmentation	Iacopo Curti et.al.	2509.10408	translate	read	null
2025-09-10	SocialNav-SUB: Benchmarking VLMs for Scene Understanding in Social Robot Navigation	Michael J. Munje et.al.	2509.08757	translate	read	null
2025-09-09	OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics	Yinan Deng et.al.	2509.07500	translate	read	null
2025-09-09	DepthVision: Robust Vision-Language Understanding through GAN-Based LiDAR-to-RGB Synthesis	Sven Kirchner et.al.	2509.07463	translate	read	null
2025-09-08	Synesthesia of Machines (SoM)-Aided LiDAR Point Cloud Transmission for Collaborative Perception	Ensong Liu et.al.	2509.06506	translate	read	null
2025-09-07	UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning	Huy Le et.al.	2509.06165	translate	read	null
2025-09-06	Depth-Aware Super-Resolution via Distance-Adaptive Variational Formulation	Tianhao Guo et.al.	2509.05746	translate	read	null
2025-09-05	SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing	Chaolei Wang et.al.	2509.05144	translate	read	null
2025-09-03	Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding	Hongpei Zheng et.al.	2509.03635	translate	read	null
2025-09-03	Rashomon in the Streets: Explanation Ambiguity in Scene Understanding	Helge Spieker et.al.	2509.03169	translate	read	null
2025-09-02	Generalizable Skill Learning for Construction Robots with Crowdsourced Natural Language Instructions, Composable Skills Standardization, and Large Language Model	Hongrui Yu et.al.	2509.02876	translate	read	null
2025-09-02	SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images	Pushpendra Dhakara et.al.	2509.02287	translate	read	null
2025-09-02	Omnidirectional Spatial Modeling from Correlated Panoramas	Xinshen Zhang et.al.	2509.02164	translate	read	null
2025-09-02	AI-Driven Marine Robotics: Emerging Trends in Underwater Perception and Ecosystem Monitoring	Scarlett Raine et.al.	2509.01878	translate	read	null
2025-09-01	Articulated Object Estimation in the Wild	Abdelrhman Werby et.al.	2509.01708	translate	read	null
2025-09-01	Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation	Maëlic Neau et.al.	2509.01209	translate	read	null