Scene Understanding - 2025-08 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2025-08-31	SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting	Zhuodong Jiang et al.	2509.00800	null
2025-08-31	OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving	Pei Liu et al.	2509.00789	null
2025-08-30	ConceptBot: Enhancing Robot’s Autonomy through Task Decomposition with Large Language Models and Knowledge Graph	Alessandro Leanza et al.	2509.00570	null
2025-08-29	Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment	Jinzhou Tang et al.	2509.00210	null
2025-08-18	2COOOL: 2nd Workshop on the Challenge Of Out-Of-Label Hazards in Autonomous Driving	Ali K. AlShami et al.	2508.21080	null
2025-08-27	Hyperspectral Sensors and Autonomous Driving: Technologies, Limitations, and Opportunities	Imad Ali Shah et al.	2508.19905	null
2025-08-27	Context-Aware Risk Estimation in Home Environments: A Probabilistic Framework for Service Robots	Sena Ishii et al.	2508.19788	null
2025-08-27	LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation	Yupeng Zhang et al.	2508.19699	link
2025-08-27	Scalable Object Detection in the Car Interior With Vision Foundation Models	Bálint Mészáros et al.	2508.19651	null
2025-08-25	ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation	Jianwen Tan et al.	2508.18050	null
2025-08-25	HLG: Comprehensive 3D Room Construction via Hierarchical Layout Generation	Xiping Wang et al.	2508.17832	null
2025-08-24	Investigating Domain Gaps for Indoor 3D Object Detection	Zijing Zhao et al.	2508.17439	null
2025-08-24	An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing	Zihan Liang et al.	2508.17435	null
2025-08-24	SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality	Yuzhi Lai et al.	2508.17255	null
2025-08-24	Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding	Yunxiang Yang et al.	2508.17205	null
2025-08-23	PVNet: Point-Voxel Interaction LiDAR Scene Upsampling Via Diffusion Models	Xianjing Cheng et al.	2508.17050	null
2025-08-22	HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction	Sara Rojas et al.	2508.16433	null
2025-08-21	ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification	Bochao Sun et al.	2508.15632	null
2025-08-19	Hybrelighter: Combining Deep Anisotropic Diffusion and Scene Reconstruction for On-device Real-time Relighting in Mixed Reality	Hanwen Zhao et al.	2508.14930	null
2025-08-20	MoVieDrive: Multi-Modal Multi-View Urban Scene Video Generation	Guile Wu et al.	2508.14327	null
2025-08-19	GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting	Elena Alegret et al.	2508.14278	null
2025-08-19	ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving	Xianda Guo et al.	2508.13977	null
2025-08-19	Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference	Yunxiang Yang et al.	2508.13439	null
2025-08-17	PreSem-Surf: RGB-D Surface Reconstruction with Progressive Semantic Modeling and SG-MLP Pre-Rendering Mechanism	Yuyan Ye et al.	2508.13228	null
2025-08-17	LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving	Nan Song et al.	2508.12404	null
2025-08-17	Splat Feature Solver	Butian Xiong et al.	2508.12216	null
2025-08-16	InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes	Hongyuan Liu et al.	2508.12015	null
2025-08-14	Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset	Wentao Mo et al.	2508.11058	null
2025-08-13	Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation	Xu Tang et al.	2508.09626	null
2025-08-12	Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment	Shi-Chen Zhang et al.	2508.08811	null
2025-08-11	SAGOnline: Segment Any Gaussians Online	Wentao Sun et al.	2508.08219	null
2025-08-11	TrackOR: Towards Personalized Intelligent Operating Rooms Through Robust Tracking	Tony Danjun Wang et al.	2508.07968	null
2025-08-11	DoorDet: Semi-Automated Multi-Class Door Detection Dataset via Object Detection and Large Language Models	Licheng Zhang et al.	2508.07714	null
2025-08-10	Understanding Dynamic Scenes in Ego Centric 4D Point Clouds	Junsheng Huang et al.	2508.07251	null
2025-08-05	Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images	Qi Xun Yeo et al.	2508.06546	null
2025-08-07	VISTA: Vision-Language Imitation of Situational Thinking and Attention for Human-Like Driver Focus in Dynamic Environments	Kaiser Hamid et al.	2508.05852	null
2025-08-07	Point cloud segmentation for 3D Clothed Human Layering	Davide Garavaso et al.	2508.05531	null
2025-08-07	EndoMatcher: Generalizable Endoscopic Image Matcher via Multi-Domain Pre-training for Robot-Assisted Surgery	Bingyu Yang et al.	2508.05205	null
2025-08-07	A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding	Mahmoud Chick Zaouali et al.	2508.05064	null
2025-08-07	TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring	Zhu Xu et al.	2508.04943	null
2025-08-06	PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment	Gustav Hanning et al.	2508.04659	null
2025-08-05	SAVER: Mitigating Hallucinations in Large Vision-Language Models via Style-Aware Visual Early Revision	Zhaoxu Li et al.	2508.03177	null
2025-08-05	CHARM: Collaborative Harmonization across Arbitrary Modalities for Modality-agnostic Semantic Segmentation	Lekang Wen et al.	2508.03060	null
2025-08-04	FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation	Cui Miao et al.	2508.02190	null
2025-08-04	GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting	Lei Yao et al.	2508.02172	null
2025-08-03	DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion	Zhigang Sun et al.	2508.01778	null
2025-08-03	AG$^2$aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing	Zhaonan Wang et al.	2508.01740	null
2025-08-03	Dynamic Robot-Assisted Surgery with Hierarchical Class-Incremental Semantic Segmentation	Julia Hindel et al.	2508.01713	null
2025-08-02	TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition	Xiahan Yang et al.	2508.01153	null
2025-08-02	OpenGS-Fusion: Open-Vocabulary Dense Mapping with Hybrid 3D Gaussian Splatting for Refined Object-Level Understanding	Dianyi Yang et al.	2508.01150	null
2025-08-01	3D Reconstruction via Incremental Structure From Motion	Muhammad Zeeshan et al.	2508.01019	null
2025-08-01	Cooperative Perception: A Resource-Efficient Framework for Multi-Drone 3D Scene Reconstruction Using Federated Diffusion and NeRF	Massoud Pourmandi et al.	2508.00967	null
