Scene Understanding - 2025-08

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-08-31 | SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting | Zhuodong Jiang et.al. | 2509.00800 | translate | read | null |
| 2025-08-31 | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving | Pei Liu et.al. | 2509.00789 | translate | read | null |
| 2025-08-30 | ConceptBot: Enhancing Robot’s Autonomy through Task Decomposition with Large Language Models and Knowledge Graph | Alessandro Leanza et.al. | 2509.00570 | translate | read | null |
| 2025-08-29 | Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment | Jinzhou Tang et.al. | 2509.00210 | translate | read | null |
| 2025-08-18 | 2COOOL: 2nd Workshop on the Challenge Of Out-Of-Label Hazards in Autonomous Driving | Ali K. AlShami et.al. | 2508.21080 | translate | read | null |
| 2025-08-27 | Hyperspectral Sensors and Autonomous Driving: Technologies, Limitations, and Opportunities | Imad Ali Shah et.al. | 2508.19905 | translate | read | null |
| 2025-08-27 | Context-Aware Risk Estimation in Home Environments: A Probabilistic Framework for Service Robots | Sena Ishii et.al. | 2508.19788 | translate | read | null |
| 2025-08-27 | LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation | Yupeng Zhang et.al. | 2508.19699 | translate | read | link |
| 2025-08-27 | Scalable Object Detection in the Car Interior With Vision Foundation Models | Bálint Mészáros et.al. | 2508.19651 | translate | read | null |
| 2025-08-25 | ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation | Jianwen Tan et.al. | 2508.18050 | translate | read | null |
| 2025-08-25 | HLG: Comprehensive 3D Room Construction via Hierarchical Layout Generation | Xiping Wang et.al. | 2508.17832 | translate | read | null |
| 2025-08-24 | Investigating Domain Gaps for Indoor 3D Object Detection | Zijing Zhao et.al. | 2508.17439 | translate | read | null |
| 2025-08-24 | An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing | Zihan Liang et.al. | 2508.17435 | translate | read | null |
| 2025-08-24 | SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality | Yuzhi Lai et.al. | 2508.17255 | translate | read | null |
| 2025-08-24 | Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding | Yunxiang Yang et.al. | 2508.17205 | translate | read | null |
| 2025-08-23 | PVNet: Point-Voxel Interaction LiDAR Scene Upsampling Via Diffusion Models | Xianjing Cheng et.al. | 2508.17050 | translate | read | null |
| 2025-08-22 | HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction | Sara Rojas et.al. | 2508.16433 | translate | read | null |
| 2025-08-21 | ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification | Bochao Sun et.al. | 2508.15632 | translate | read | null |
| 2025-08-19 | Hybrelighter: Combining Deep Anisotropic Diffusion and Scene Reconstruction for On-device Real-time Relighting in Mixed Reality | Hanwen Zhao et.al. | 2508.14930 | translate | read | null |
| 2025-08-20 | MoVieDrive: Multi-Modal Multi-View Urban Scene Video Generation | Guile Wu et.al. | 2508.14327 | translate | read | null |
| 2025-08-19 | GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting | Elena Alegret et.al. | 2508.14278 | translate | read | null |
| 2025-08-19 | ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving | Xianda Guo et.al. | 2508.13977 | translate | read | null |
| 2025-08-19 | Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference | Yunxiang Yang et.al. | 2508.13439 | translate | read | null |
| 2025-08-17 | PreSem-Surf: RGB-D Surface Reconstruction with Progressive Semantic Modeling and SG-MLP Pre-Rendering Mechanism | Yuyan Ye et.al. | 2508.13228 | translate | read | null |
| 2025-08-17 | LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving | Nan Song et.al. | 2508.12404 | translate | read | null |
| 2025-08-17 | Splat Feature Solver | Butian Xiong et.al. | 2508.12216 | translate | read | null |
| 2025-08-16 | InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes | Hongyuan Liu et.al. | 2508.12015 | translate | read | null |
| 2025-08-14 | Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset | Wentao Mo et.al. | 2508.11058 | translate | read | null |
| 2025-08-13 | Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation | Xu Tang et.al. | 2508.09626 | translate | read | null |
| 2025-08-12 | Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment | Shi-Chen Zhang et.al. | 2508.08811 | translate | read | null |
| 2025-08-11 | SAGOnline: Segment Any Gaussians Online | Wentao Sun et.al. | 2508.08219 | translate | read | null |
| 2025-08-11 | TrackOR: Towards Personalized Intelligent Operating Rooms Through Robust Tracking | Tony Danjun Wang et.al. | 2508.07968 | translate | read | null |
| 2025-08-11 | DoorDet: Semi-Automated Multi-Class Door Detection Dataset via Object Detection and Large Language Models | Licheng Zhang et.al. | 2508.07714 | translate | read | null |
| 2025-08-10 | Understanding Dynamic Scenes in Ego Centric 4D Point Clouds | Junsheng Huang et.al. | 2508.07251 | translate | read | null |
| 2025-08-05 | Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images | Qi Xun Yeo et.al. | 2508.06546 | translate | read | null |
| 2025-08-07 | VISTA: Vision-Language Imitation of Situational Thinking and Attention for Human-Like Driver Focus in Dynamic Environments | Kaiser Hamid et.al. | 2508.05852 | translate | read | null |
| 2025-08-07 | Point cloud segmentation for 3D Clothed Human Layering | Davide Garavaso et.al. | 2508.05531 | translate | read | null |
| 2025-08-07 | EndoMatcher: Generalizable Endoscopic Image Matcher via Multi-Domain Pre-training for Robot-Assisted Surgery | Bingyu Yang et.al. | 2508.05205 | translate | read | null |
| 2025-08-07 | A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding | Mahmoud Chick Zaouali et.al. | 2508.05064 | translate | read | null |
| 2025-08-07 | TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring | Zhu Xu et.al. | 2508.04943 | translate | read | null |
| 2025-08-06 | PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment | Gustav Hanning et.al. | 2508.04659 | translate | read | null |
| 2025-08-05 | SAVER: Mitigating Hallucinations in Large Vision-Language Models via Style-Aware Visual Early Revision | Zhaoxu Li et.al. | 2508.03177 | translate | read | null |
| 2025-08-05 | CHARM: Collaborative Harmonization across Arbitrary Modalities for Modality-agnostic Semantic Segmentation | Lekang Wen et.al. | 2508.03060 | translate | read | null |
| 2025-08-04 | FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation | Cui Miao et.al. | 2508.02190 | translate | read | null |
| 2025-08-04 | GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting | Lei Yao et.al. | 2508.02172 | translate | read | null |
| 2025-08-03 | DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion | Zhigang Sun et.al. | 2508.01778 | translate | read | null |
| 2025-08-03 | AG$^2$aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing | Zhaonan Wang et.al. | 2508.01740 | translate | read | null |
| 2025-08-03 | Dynamic Robot-Assisted Surgery with Hierarchical Class-Incremental Semantic Segmentation | Julia Hindel et.al. | 2508.01713 | translate | read | null |
| 2025-08-02 | TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition | Xiahan Yang et.al. | 2508.01153 | translate | read | null |
| 2025-08-02 | OpenGS-Fusion: Open-Vocabulary Dense Mapping with Hybrid 3D Gaussian Splatting for Refined Object-Level Understanding | Dianyi Yang et.al. | 2508.01150 | translate | read | null |
| 2025-08-01 | 3D Reconstruction via Incremental Structure From Motion | Muhammad Zeeshan et.al. | 2508.01019 | translate | read | null |
| 2025-08-01 | Cooperative Perception: A Resource-Efficient Framework for Multi-Drone 3D Scene Reconstruction Using Federated Diffusion and NeRF | Massoud Pourmandi et.al. | 2508.00967 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)