Scene Understanding - 2025-08
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-08-31 | SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting | Zhuodong Jiang et.al. | 2509.00800 | translate | read | null |
| 2025-08-31 | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving | Pei Liu et.al. | 2509.00789 | translate | read | null |
| 2025-08-30 | ConceptBot: Enhancing Robot’s Autonomy through Task Decomposition with Large Language Models and Knowledge Graph | Alessandro Leanza et.al. | 2509.00570 | translate | read | null |
| 2025-08-29 | Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment | Jinzhou Tang et.al. | 2509.00210 | translate | read | null |
| 2025-08-18 | 2COOOL: 2nd Workshop on the Challenge Of Out-Of-Label Hazards in Autonomous Driving | Ali K. AlShami et.al. | 2508.21080 | translate | read | null |
| 2025-08-27 | Hyperspectral Sensors and Autonomous Driving: Technologies, Limitations, and Opportunities | Imad Ali Shah et.al. | 2508.19905 | translate | read | null |
| 2025-08-27 | Context-Aware Risk Estimation in Home Environments: A Probabilistic Framework for Service Robots | Sena Ishii et.al. | 2508.19788 | translate | read | null |
| 2025-08-27 | LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation | Yupeng Zhang et.al. | 2508.19699 | translate | read | link |
| 2025-08-27 | Scalable Object Detection in the Car Interior With Vision Foundation Models | Bálint Mészáros et.al. | 2508.19651 | translate | read | null |
| 2025-08-25 | ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation | Jianwen Tan et.al. | 2508.18050 | translate | read | null |
| 2025-08-25 | HLG: Comprehensive 3D Room Construction via Hierarchical Layout Generation | Xiping Wang et.al. | 2508.17832 | translate | read | null |
| 2025-08-24 | Investigating Domain Gaps for Indoor 3D Object Detection | Zijing Zhao et.al. | 2508.17439 | translate | read | null |
| 2025-08-24 | An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing | Zihan Liang et.al. | 2508.17435 | translate | read | null |
| 2025-08-24 | SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality | Yuzhi Lai et.al. | 2508.17255 | translate | read | null |
| 2025-08-24 | Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding | Yunxiang Yang et.al. | 2508.17205 | translate | read | null |
| 2025-08-23 | PVNet: Point-Voxel Interaction LiDAR Scene Upsampling Via Diffusion Models | Xianjing Cheng et.al. | 2508.17050 | translate | read | null |
| 2025-08-22 | HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction | Sara Rojas et.al. | 2508.16433 | translate | read | null |
| 2025-08-21 | ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification | Bochao Sun et.al. | 2508.15632 | translate | read | null |
| 2025-08-19 | Hybrelighter: Combining Deep Anisotropic Diffusion and Scene Reconstruction for On-device Real-time Relighting in Mixed Reality | Hanwen Zhao et.al. | 2508.14930 | translate | read | null |
| 2025-08-20 | MoVieDrive: Multi-Modal Multi-View Urban Scene Video Generation | Guile Wu et.al. | 2508.14327 | translate | read | null |
| 2025-08-19 | GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting | Elena Alegret et.al. | 2508.14278 | translate | read | null |
| 2025-08-19 | ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving | Xianda Guo et.al. | 2508.13977 | translate | read | null |
| 2025-08-19 | Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference | Yunxiang Yang et.al. | 2508.13439 | translate | read | null |
| 2025-08-17 | PreSem-Surf: RGB-D Surface Reconstruction with Progressive Semantic Modeling and SG-MLP Pre-Rendering Mechanism | Yuyan Ye et.al. | 2508.13228 | translate | read | null |
| 2025-08-17 | LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving | Nan Song et.al. | 2508.12404 | translate | read | null |
| 2025-08-17 | Splat Feature Solver | Butian Xiong et.al. | 2508.12216 | translate | read | null |
| 2025-08-16 | InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes | Hongyuan Liu et.al. | 2508.12015 | translate | read | null |
| 2025-08-14 | Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset | Wentao Mo et.al. | 2508.11058 | translate | read | null |
| 2025-08-13 | Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation | Xu Tang et.al. | 2508.09626 | translate | read | null |
| 2025-08-12 | Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment | Shi-Chen Zhang et.al. | 2508.08811 | translate | read | null |
| 2025-08-11 | SAGOnline: Segment Any Gaussians Online | Wentao Sun et.al. | 2508.08219 | translate | read | null |
| 2025-08-11 | TrackOR: Towards Personalized Intelligent Operating Rooms Through Robust Tracking | Tony Danjun Wang et.al. | 2508.07968 | translate | read | null |
| 2025-08-11 | DoorDet: Semi-Automated Multi-Class Door Detection Dataset via Object Detection and Large Language Models | Licheng Zhang et.al. | 2508.07714 | translate | read | null |
| 2025-08-10 | Understanding Dynamic Scenes in Ego Centric 4D Point Clouds | Junsheng Huang et.al. | 2508.07251 | translate | read | null |
| 2025-08-05 | Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images | Qi Xun Yeo et.al. | 2508.06546 | translate | read | null |
| 2025-08-07 | VISTA: Vision-Language Imitation of Situational Thinking and Attention for Human-Like Driver Focus in Dynamic Environments | Kaiser Hamid et.al. | 2508.05852 | translate | read | null |
| 2025-08-07 | Point cloud segmentation for 3D Clothed Human Layering | Davide Garavaso et.al. | 2508.05531 | translate | read | null |
| 2025-08-07 | EndoMatcher: Generalizable Endoscopic Image Matcher via Multi-Domain Pre-training for Robot-Assisted Surgery | Bingyu Yang et.al. | 2508.05205 | translate | read | null |
| 2025-08-07 | A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding | Mahmoud Chick Zaouali et.al. | 2508.05064 | translate | read | null |
| 2025-08-07 | TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring | Zhu Xu et.al. | 2508.04943 | translate | read | null |
| 2025-08-06 | PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment | Gustav Hanning et.al. | 2508.04659 | translate | read | null |
| 2025-08-05 | SAVER: Mitigating Hallucinations in Large Vision-Language Models via Style-Aware Visual Early Revision | Zhaoxu Li et.al. | 2508.03177 | translate | read | null |
| 2025-08-05 | CHARM: Collaborative Harmonization across Arbitrary Modalities for Modality-agnostic Semantic Segmentation | Lekang Wen et.al. | 2508.03060 | translate | read | null |
| 2025-08-04 | FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation | Cui Miao et.al. | 2508.02190 | translate | read | null |
| 2025-08-04 | GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting | Lei Yao et.al. | 2508.02172 | translate | read | null |
| 2025-08-03 | DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion | Zhigang Sun et.al. | 2508.01778 | translate | read | null |
| 2025-08-03 | AG$^2$aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing | Zhaonan Wang et.al. | 2508.01740 | translate | read | null |
| 2025-08-03 | Dynamic Robot-Assisted Surgery with Hierarchical Class-Incremental Semantic Segmentation | Julia Hindel et.al. | 2508.01713 | translate | read | null |
| 2025-08-02 | TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition | Xiahan Yang et.al. | 2508.01153 | translate | read | null |
| 2025-08-02 | OpenGS-Fusion: Open-Vocabulary Dense Mapping with Hybrid 3D Gaussian Splatting for Refined Object-Level Understanding | Dianyi Yang et.al. | 2508.01150 | translate | read | null |
| 2025-08-01 | 3D Reconstruction via Incremental Structure From Motion | Muhammad Zeeshan et.al. | 2508.01019 | translate | read | null |
| 2025-08-01 | Cooperative Perception: A Resource-Efficient Framework for Multi-Drone 3D Scene Reconstruction Using Federated Diffusion and NeRF | Massoud Pourmandi et.al. | 2508.00967 | translate | read | null |
(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)