Scene Understanding - 2025-07

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-07-31 | Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs | Bhavya Goyal et al. | 2508.00169 | translate | read | null |
| 2025-07-31 | 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding | Ting Huang et al. | 2507.23478 | translate | read | null |
| 2025-07-31 | FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models | Yiming Yang et al. | 2507.23325 | translate | read | null |
| 2025-07-31 | FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning | Jiajun Cao et al. | 2507.23318 | translate | read | null |
| 2025-07-30 | DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion | Qingcheng Zhao et al. | 2507.22825 | translate | read | null |
| 2025-07-30 | UAVScenes: A Multi-Modal Dataset for UAVs | Sijie Wang et al. | 2507.22412 | translate | read | null |
| 2025-07-29 | EIFNet: Leveraging Event-Image Fusion for Robust Semantic Segmentation | Zhijiang Li et al. | 2507.21971 | translate | read | null |
| 2025-07-28 | GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction | Tianhao Li et al. | 2507.20963 | translate | read | null |
| 2025-07-28 | Compositional Video Synthesis by Temporal Object-Centric Learning | Adil Kaan Akan et al. | 2507.20855 | translate | read | null |
| 2025-07-27 | VESPA: Towards un(Human)supervised Open-World Pointcloud Labeling for Autonomous Driving | Levente Tempfli et al. | 2507.20397 | translate | read | null |
| 2025-07-27 | Solving Scene Understanding for Autonomous Navigation in Unstructured Environments | Naveen Mathews Renji et al. | 2507.20389 | translate | read | null |
| 2025-07-26 | FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images | Hao-Yu Hou et al. | 2507.19993 | translate | read | null |
| 2025-07-26 | UniCT Depth: Event-Image Fusion Based Monocular Depth Estimation with Convolution-Compensated ViT Dual SA Block | Luoxi Jing et al. | 2507.19948 | translate | read | null |
| 2025-07-26 | RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection | Xiaokai Bai et al. | 2507.19856 | translate | read | null |
| 2025-07-26 | Taking Language Embedded 3D Gaussian Splatting into the Wild | Yuze Wang et al. | 2507.19830 | translate | read | null |
| 2025-07-25 | Co-Win: Joint Object Detection and Instance Segmentation in LiDAR Point Clouds via Collaborative Window Processing | Haichuan Li et al. | 2507.19691 | translate | read | null |
| 2025-07-25 | VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions | Haoang Lu et al. | 2507.19188 | translate | read | null |
| 2025-07-24 | Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting | Xingyu Miao et al. | 2507.18678 | translate | read | null |
| 2025-07-23 | From Scan to Action: Leveraging Realistic Scans for Embodied Scene Understanding | Anna-Maria Halacheva et al. | 2507.17585 | translate | read | null |
| 2025-07-23 | IndoorBEV: Joint Detection and Footprint Completion of Objects via Mask-based Prediction in Indoor Scenarios for Bird’s-Eye View Perception | Haichuan Li et al. | 2507.17445 | translate | read | null |
| 2025-07-22 | ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension | Yizhi Hu et al. | 2507.16877 | translate | read | null |
| 2025-07-22 | Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge | Tobias Rueckert et al. | 2507.16559 | translate | read | null |
| 2025-07-22 | Optimization of DNN-based HSI Segmentation FPGA-based SoC for ADS: A Practical Approach | Jon Gutiérrez-Zaballa et al. | 2507.16556 | translate | read | null |
| 2025-07-22 | DenseSR: Image Shadow Removal as Dense Prediction | Yu-Fan Lin et al. | 2507.16472 | translate | read | link |
| 2025-07-21 | Label tree semantic losses for rich multi-class medical image segmentation | Junwen Wang et al. | 2507.15777 | translate | read | null |
| 2025-07-21 | Towards Holistic Surgical Scene Graph | Jongmin Shin et al. | 2507.15541 | translate | read | null |
| 2025-07-21 | ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting | Ruijie Zhu et al. | 2507.15454 | translate | read | link |
| 2025-07-21 | VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving | Haichao Liu et al. | 2507.15266 | translate | read | null |
| 2025-07-19 | DiSCO-3D : Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF | Doriand Petit et al. | 2507.14596 | translate | read | null |
| 2025-07-19 | Descrip3D: Enhancing Large Language Model-based 3D Scene Understanding with Object-Level Text Descriptions | Jintang Xue et al. | 2507.14555 | translate | read | null |
| 2025-07-19 | Multimodal AI for Gastrointestinal Diagnostics: Tackling VQA in MEDVQA-GI 2025 | Sujata Gaihre et al. | 2507.14544 | translate | read | null |
| 2025-07-19 | CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding | Zhou Chen et al. | 2507.14426 | translate | read | null |
| 2025-07-18 | Semantic Segmentation based Scene Understanding in Autonomous Vehicles | Ehsan Rassekh et al. | 2507.14303 | translate | read | null |
| 2025-07-18 | Moving Object Detection from Moving Camera Using Focus of Expansion Likelihood and Segmentation | Masahiro Ogawa et al. | 2507.13628 | translate | read | null |
| 2025-07-17 | Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection | Jingyao Wang et al. | 2507.13061 | translate | read | null |
| 2025-07-17 | Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models | Yifan Xu et al. | 2507.12916 | translate | read | null |
| 2025-07-17 | City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning | Penglei Sun et al. | 2507.12795 | translate | read | null |
| 2025-07-16 | Funnel-HOI: Top-Down Perception for Zero-Shot HOI Detection | Sandipan Sarma et al. | 2507.12628 | translate | read | null |
| 2025-07-15 | Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis | Maciej Szankin et al. | 2507.11730 | translate | read | null |
| 2025-07-15 | Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander | Li Wang et al. | 2507.11079 | translate | read | null |
| 2025-07-15 | Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Yanbo Wang et al. | 2507.11001 | translate | read | null |
| 2025-07-14 | Static or Temporal? Semantic Scene Simplification to Aid Wayfinding in Immersive Simulations of Bionic Vision | Justin M. Kasowski et al. | 2507.10813 | translate | read | null |
| 2025-07-14 | EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Mingxian Lin et al. | 2507.10548 | translate | read | link |
| 2025-07-13 | VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding | Younggun Kim et al. | 2507.09815 | translate | read | null |
| 2025-07-13 | Self-supervised Pretraining for Integrated Prediction and Planning of Automated Vehicles | Yangang Ren et al. | 2507.09537 | translate | read | null |
| 2025-07-12 | Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding | Wencan Huang et al. | 2507.09334 | translate | read | null |
| 2025-07-12 | THYME: Temporal Hierarchical-Cyclic Interactivity Modeling for Video Scene Graphs in Aerial Footage | Trong-Thuan Nguyen et al. | 2507.09200 | translate | read | null |
| 2025-07-12 | Towards Spatial Audio Understanding via Question Answering | Parthasaarathy Sudarsanam et al. | 2507.09195 | translate | read | null |
| 2025-07-12 | On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving | Md Hasan Shahriar et al. | 2507.09095 | translate | read | null |
| 2025-07-10 | OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | JingLi Lin et al. | 2507.07984 | translate | read | link |
| 2025-07-10 | MUVOD: A Novel Multi-view Video Object Segmentation Dataset and A Benchmark for 3D Segmentation | Bangning Wei et al. | 2507.07519 | translate | read | null |
| 2025-07-09 | SemRaFiner: Panoptic Segmentation in Sparse and Noisy Radar Point Clouds | Matthias Zeller et al. | 2507.06906 | translate | read | null |
| 2025-07-09 | Token Bottleneck: One Token to Remember Dynamics | Taekyung Kim et al. | 2507.06543 | translate | read | link |
| 2025-07-09 | What Demands Attention in Urban Street Scenes? From Scene Understanding towards Road Safety: A Survey of Vision-driven Datasets and Studies | Yaoqi Huang et al. | 2507.06513 | translate | read | null |
| 2025-07-08 | Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Aleksandar Jevtić et al. | 2507.06230 | translate | read | link |
| 2025-07-08 | SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning | Xin Hu et al. | 2507.05798 | translate | read | null |
| 2025-07-07 | All in One: Visual-Description-Guided Unified Point Cloud Segmentation | Zongyan Han et al. | 2507.05211 | translate | read | null |
| 2025-07-07 | MOSU: Autonomous Long-range Robot Navigation with Multi-modal Scene Understanding | Jing Liang et al. | 2507.04686 | translate | read | null |
| 2025-07-05 | Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation | Ziyu Zhu et al. | 2507.04047 | translate | read | null |
| 2025-07-05 | Habitat Classification from Ground-Level Imagery Using Deep Neural Networks | Hongrui Shi et al. | 2507.04017 | translate | read | null |
| 2025-07-04 | Radar Velocity Transformer: Single-scan Moving Object Segmentation in Noisy Radar Point Clouds | Matthias Zeller et al. | 2507.03463 | translate | read | null |
| 2025-07-03 | LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans | Zhening Huang et al. | 2507.02861 | translate | read | link |
| 2025-07-03 | LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion | Fangfu Liu et al. | 2507.02813 | translate | read | link |
| 2025-07-03 | SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment | Qi Xu et al. | 2507.02705 | translate | read | link |
| 2025-07-04 | Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach | Elena Ryumina et al. | 2507.02205 | translate | read | link |
| 2025-07-02 | ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning | Xiao Wang et al. | 2507.02200 | translate | read | null |
| 2025-07-02 | ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving | Kai Chen et al. | 2507.01735 | translate | read | null |
| 2025-07-01 | GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond | Anna-Maria Halacheva et al. | 2507.00886 | translate | read | null |
| 2025-07-01 | BEV-VAE: Multi-view Image Generation with Spatial Consistency for Autonomous Driving | Zeming Chen et al. | 2507.00707 | translate | read | null |
| 2025-07-01 | SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting | Yiming Huang et al. | 2506.23309 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)