Scene Understanding - 2025-09

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-09-30 | Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification | Artur Barros et.al. | 2509.26457 | translate | read | null |
| 2025-09-30 | Neighbor-aware informal settlement mapping with graph convolutional networks | Thomas Hallopeau et.al. | 2509.26171 | translate | read | null |
| 2025-09-30 | Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models | Yuansen Liu et.al. | 2509.26165 | translate | read | null |
| 2025-09-30 | EasyOcc: 3D Pseudo-Label Supervision for Fully Self-Supervised Semantic Occupancy Prediction Models | Seamie Hayes et.al. | 2509.26087 | translate | read | null |
| 2025-09-30 | VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs | Peng Liu et.al. | 2509.25916 | translate | read | null |
| 2025-09-29 | PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos | Ting-Hsuan Liao et.al. | 2509.25183 | translate | read | null |
| 2025-09-29 | Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs | Yue Zhang et.al. | 2509.25139 | translate | read | null |
| 2025-09-29 | Social 3D Scene Graphs: Modeling Human Actions and Relations for Interactive Service Robots | Ermanno Bartoli et.al. | 2509.24966 | translate | read | null |
| 2025-09-29 | CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D | Mohamad Amin Mirzaei et.al. | 2509.24528 | translate | read | null |
| 2025-09-29 | PhysiAgent: An Embodied Agent Framework in Physical World | Zhihao Wang et.al. | 2509.24524 | translate | read | null |
| 2025-09-29 | Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy | Haijier Chen et.al. | 2509.24385 | translate | read | null |
| 2025-09-29 | Robust Partial 3D Point Cloud Registration via Confidence Estimation under Global Context | Yongqiang Wang et.al. | 2509.24275 | translate | read | null |
| 2025-09-28 | FUSAR-KLIP: Towards Multimodal Foundation Models for Remote Sensing | Yi Yang et.al. | 2509.23927 | translate | read | null |
| 2025-09-28 | Uni4D-LLM: A Unified SpatioTemporal-Aware VLM for 4D Understanding and Generation | Hanyu Zhou et.al. | 2509.23828 | translate | read | null |
| 2025-09-28 | From Static to Dynamic: a Survey of Topology-Aware Perception in Autonomous Driving | Yixiao Chen et.al. | 2509.23641 | translate | read | null |
| 2025-09-28 | From Fields to Splats: A Cross-Domain Survey of Real-Time Neural Scene Representations | Javed Ahmad et.al. | 2509.23555 | translate | read | null |
| 2025-09-26 | Good Weights: Proactive, Adaptive Dead Reckoning Fusion for Continuous and Robust Visual SLAM | Yanwei Du et.al. | 2509.22910 | translate | read | null |
| 2025-09-20 | Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment | Abhiroop Chatterjee et.al. | 2509.22697 | translate | read | null |
| 2025-09-26 | UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective | Jun He et.al. | 2509.22228 | translate | read | null |
| 2025-09-26 | Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics | Saurav Jha et.al. | 2509.22014 | translate | read | null |
| 2025-09-26 | Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding | Vahid Mirjalili et.al. | 2509.21922 | translate | read | null |
| 2025-09-25 | Real-Time Indoor Object SLAM with LLM-Enhanced Priors | Yang Jiao et.al. | 2509.21602 | translate | read | null |
| 2025-09-25 | Residual Vector Quantization For Communication-Efficient Multi-Agent Perception | Dereje Shenkut et.al. | 2509.21464 | translate | read | null |
| 2025-09-23 | TUN3D: Towards Real-World Scene Understanding from Unposed Images | Anton Konushin et.al. | 2509.21388 | translate | read | link |
| 2025-09-25 | DENet: Dual-Path Edge Network with Global-Local Attention for Infrared Small Target Detection | Jiayi Zuo et.al. | 2509.20701 | translate | read | null |
| 2025-09-23 | SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment | Binod Singh et.al. | 2509.20401 | translate | read | null |
| 2025-09-24 | Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning | Xun Li et.al. | 2509.20077 | translate | read | null |
| 2025-09-24 | OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving | Pei Liu et.al. | 2509.19973 | translate | read | null |
| 2025-09-23 | Category-Level Object Shape and Pose Estimation in Less Than a Millisecond | Lorenzo Shaikewitz et.al. | 2509.18979 | translate | read | null |
| 2025-09-23 | Eva-VLA: Evaluating Vision-Language-Action Models’ Robustness Under Real-World Physical Variations | Hanqing Liu et.al. | 2509.18953 | translate | read | null |
| 2025-09-23 | Surgical Video Understanding with Label Interpolation | Garam Kim et.al. | 2509.18802 | translate | read | null |
| 2025-09-23 | MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning | Omar Rayyan et.al. | 2509.18757 | translate | read | null |
| 2025-09-23 | PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving | Chengran Yuan et.al. | 2509.18609 | translate | read | null |
| 2025-09-22 | Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration | Zhitao Zeng et.al. | 2509.17429 | translate | read | null |
| 2025-09-20 | Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding | Haoyuan Li et.al. | 2509.16721 | translate | read | null |
| 2025-09-20 | ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting | Xiaoyang Yan et.al. | 2509.16552 | translate | read | null |
| 2025-09-19 | Towards Sharper Object Boundaries in Self-Supervised Depth Estimation | Aurélien Cecille et.al. | 2509.15987 | translate | read | null |
| 2025-09-19 | RangeSAM: Leveraging Visual Foundation Models for Range-View repesented LiDAR segmentation | Paul Julius Kühn et.al. | 2509.15886 | translate | read | null |
| 2025-09-19 | SAMPO:Scale-wise Autoregression with Motion PrOmpt for generative world models | Sen Wang et.al. | 2509.15536 | translate | read | null |
| 2025-09-18 | Evil Vizier: Vulnerabilities of LLM-Integrated XR Systems | Yicheng Zhang et.al. | 2509.15213 | translate | read | null |
| 2025-09-18 | SPATIALGEN: Layout-guided 3D Indoor Scene Generation | Chuan Fang et.al. | 2509.14981 | translate | read | link |
| 2025-09-16 | Semantic 3D Reconstructions with SLAM for Central Airway Obstruction | Ayberk Acar et.al. | 2509.13541 | translate | read | null |
| 2025-09-16 | ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors | Romain Hardy et.al. | 2509.13525 | translate | read | null |
| 2025-09-16 | 3D Aware Region Prompted Vision Language Model | An-Chieh Cheng et.al. | 2509.13317 | translate | read | null |
| 2025-09-16 | Weakly and Self-Supervised Class-Agnostic Motion Prediction for Autonomous Driving | Ruibo Li et.al. | 2509.13116 | translate | read | null |
| 2025-09-16 | Beyond Averages: Open-Vocabulary 3D Scene Understanding with Gaussian Splatting and Bag of Embeddings | Abdalla Arafa et.al. | 2509.12938 | translate | read | null |
| 2025-09-16 | MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization | Yiyi Zhang et.al. | 2509.12893 | translate | read | null |
| 2025-09-15 | RailSafeNet: Visual Scene Understanding for Tram Safety | Ondřej Valach et.al. | 2509.12125 | translate | read | link |
| 2025-09-15 | Microsurgical Instrument Segmentation for Robot-Assisted Surgery | Tae Kyeong Jeong et.al. | 2509.11727 | translate | read | null |
| 2025-09-15 | See What I Mean? Mobile Eye-Perspective Rendering for Optical See-through Head-mounted Displays | Gerlinde Emsenhuber et.al. | 2509.11653 | translate | read | null |
| 2025-09-14 | Modality-Aware Infrared and Visible Image Fusion with Target-Aware Supervision | Tianyao Sun et.al. | 2509.11476 | translate | read | null |
| 2025-09-14 | DreamNav: A Trajectory-Based Imaginative Framework for Zero-Shot Vision-and-Language Navigation | Yunheng Wang et.al. | 2509.11197 | translate | read | null |
| 2025-09-14 | 3DAeroRelief: The first 3D Benchmark UAV Dataset for Post-Disaster Assessment | Nhut Le et.al. | 2509.11097 | translate | read | null |
| 2025-09-13 | OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds | Chongyu Wang et.al. | 2509.10842 | translate | read | null |
| 2025-09-12 | Multimodal SAM-adapter for Semantic Segmentation | Iacopo Curti et.al. | 2509.10408 | translate | read | null |
| 2025-09-10 | SocialNav-SUB: Benchmarking VLMs for Scene Understanding in Social Robot Navigation | Michael J. Munje et.al. | 2509.08757 | translate | read | null |
| 2025-09-09 | OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics | Yinan Deng et.al. | 2509.07500 | translate | read | null |
| 2025-09-09 | DepthVision: Robust Vision-Language Understanding through GAN-Based LiDAR-to-RGB Synthesis | Sven Kirchner et.al. | 2509.07463 | translate | read | null |
| 2025-09-08 | Synesthesia of Machines (SoM)-Aided LiDAR Point Cloud Transmission for Collaborative Perception | Ensong Liu et.al. | 2509.06506 | translate | read | null |
| 2025-09-07 | UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning | Huy Le et.al. | 2509.06165 | translate | read | null |
| 2025-09-06 | Depth-Aware Super-Resolution via Distance-Adaptive Variational Formulation | Tianhao Guo et.al. | 2509.05746 | translate | read | null |
| 2025-09-05 | SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing | Chaolei Wang et.al. | 2509.05144 | translate | read | null |
| 2025-09-03 | Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding | Hongpei Zheng et.al. | 2509.03635 | translate | read | null |
| 2025-09-03 | Rashomon in the Streets: Explanation Ambiguity in Scene Understanding | Helge Spieker et.al. | 2509.03169 | translate | read | null |
| 2025-09-02 | Generalizable Skill Learning for Construction Robots with Crowdsourced Natural Language Instructions, Composable Skills Standardization, and Large Language Model | Hongrui Yu et.al. | 2509.02876 | translate | read | null |
| 2025-09-02 | SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images | Pushpendra Dhakara et.al. | 2509.02287 | translate | read | null |
| 2025-09-02 | Omnidirectional Spatial Modeling from Correlated Panoramas | Xinshen Zhang et.al. | 2509.02164 | translate | read | null |
| 2025-09-02 | AI-Driven Marine Robotics: Emerging Trends in Underwater Perception and Ecosystem Monitoring | Scarlett Raine et.al. | 2509.01878 | translate | read | null |
| 2025-09-01 | Articulated Object Estimation in the Wild | Abdelrhman Werby et.al. | 2509.01708 | translate | read | null |
| 2025-09-01 | Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation | Maëlic Neau et.al. | 2509.01209 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)