Scene Understanding - 2025-10

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-10-28 | A Comprehensive Survey on Surgical Digital Twin | Afsah Sharaf Khan et.al. | 2512.00019 | translate | read | null |
| 2025-10-30 | Token Is All You Need: Cognitive Planning through Belief-Intent Co-Evolution | Shiyao Sang et.al. | 2511.05540 | translate | read | null |
| 2025-10-31 | The Eigenvalues Entropy as a Classifier Evaluation Measure | Doulaye Dembélé et.al. | 2511.01904 | translate | read | null |
| 2025-10-30 | AI Powered High Quality Text to Video Generation with Enhanced Temporal Consistency | Piyushkumar Patel et.al. | 2511.00107 | translate | read | null |
| 2025-10-31 | Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs | Sushil Samuel Dinesh et.al. | 2510.27558 | translate | read | null |
| 2025-10-31 | NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding | Wei Xu et.al. | 2510.27481 | translate | read | null |
| 2025-10-31 | Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing | Yijia Wang et.al. | 2510.27335 | translate | read | null |
| 2025-10-31 | Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis | Weiming Chen et.al. | 2510.27324 | translate | read | null |
| 2025-10-31 | HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition | Jiacheng Hong et.al. | 2510.27148 | translate | read | null |
| 2025-10-30 | A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics | Simindokht Jahangard et.al. | 2510.27033 | translate | read | null |
| 2025-10-30 | The ANUBIS detector and its sensitivity to neutral long-lived particles | ANUBIS Collaboration et.al. | 2510.26932 | translate | read | null |
| 2025-10-30 | HEIR: Learning Graph-Based Motion Hierarchies | Cheng Zheng et.al. | 2510.26786 | translate | read | null |
| 2025-10-30 | Dynamic Context-Aware Scene Reasoning Using Vision-Language Alignment in Zero-Shot Real-World Scenarios | Manjunath Prasad Holenarasipura Rajiv et.al. | 2510.26580 | translate | read | null |
| 2025-10-30 | AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM | Mirko Usuelli et.al. | 2510.26358 | translate | read | null |
| 2025-10-30 | GLYPH-SR: Can We Achieve Both High-Quality Image Super-Resolution and High-Fidelity Text Recovery via VLM-guided Latent Diffusion Model? | Mingyu Sung et.al. | 2510.26339 | translate | read | null |
| 2025-10-30 | Letter of Intent: The Forward Physics Facility | Luis A. Anchordoqui et.al. | 2510.26260 | translate | read | null |
| 2025-10-30 | Exploring Object-Aware Attention Guided Frame Association for RGB-D SLAM | Ali Caglayan et.al. | 2510.26131 | translate | read | null |
| 2025-10-29 | Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks | Xu Zheng et.al. | 2510.25760 | translate | read | link |
| 2025-10-29 | More than a Moment: Towards Coherent Sequences of Audio Descriptions | Eshika Khandelwal et.al. | 2510.25440 | translate | read | null |
| 2025-10-29 | U-CAN: Unsupervised Point Cloud Denoising with Consistency-Aware Noise2Noise Matching | Junsheng Zhou et.al. | 2510.25210 | translate | read | null |
| 2025-10-29 | EA3D: Online Open-World 3D Object Extraction from Streaming Videos | Xiaoyu Zhou et.al. | 2510.25146 | translate | read | null |
| 2025-10-29 | Learning Spatial-Aware Manipulation Ordering | Yuxiang Yan et.al. | 2510.25138 | translate | read | null |
| 2025-10-29 | Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments | Manjunath Prasad Holenarasipura Rajiv et.al. | 2510.25070 | translate | read | null |
| 2025-10-28 | VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos | Qiucheng Wu et.al. | 2510.24904 | translate | read | null |
| 2025-10-28 | Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation | Inclusion AI et.al. | 2510.24821 | translate | read | link |
| 2025-10-28 | Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes | Jonas Hein et.al. | 2510.24332 | translate | read | null |
| 2025-10-28 | Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning | Aodi Wu et.al. | 2510.24152 | translate | read | null |
| 2025-10-27 | Optimized Loudspeaker Panning for Adaptive Sound-Field Correction and Non-stationary Listening Areas | Yuancheng Luo et.al. | 2510.23937 | translate | read | null |
| 2025-10-27 | DynaStride: Dynamic Stride Windowing with MMCoT for Instructional Multi-Scene Captioning | Eddison Pham et.al. | 2510.23907 | translate | read | null |
| 2025-10-27 | Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations | Yujia Zhang et.al. | 2510.23607 | translate | read | null |
| 2025-10-27 | PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity | Yuqian Yuan et.al. | 2510.23603 | translate | read | link |
| 2025-10-27 | InFlux: A Benchmark for Self-Calibration of Dynamic Intrinsics of Video Cameras | Erich Liang et.al. | 2510.23589 | translate | read | null |
| 2025-10-27 | Localising under the drape: proprioception in the era of distributed surgical robotic system | Martin Huber et.al. | 2510.23512 | translate | read | null |
| 2025-10-27 | UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception | Karthikeyan Chandra Sekaran et.al. | 2510.23478 | translate | read | null |
| 2025-10-27 | Evaluation of Spherical Wavelet Framework in Comparsion with Ambisonics | Ş. Ekmen et.al. | 2510.23403 | translate | read | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et.al. | 2510.23190 | translate | read | null |
| 2025-10-27 | Adapting Interleaved Encoders with PPO for Language-Guided Reinforcement Learning in BabyAI | Aryan Mathur et.al. | 2510.23148 | translate | read | null |
| 2025-10-27 | SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency | Quanjian Song et.al. | 2510.22994 | translate | read | null |
| 2025-10-27 | Charting the Design Space of Neural Graph Representations for Subgraph Matching | Vaibhav Raj et.al. | 2510.22897 | translate | read | null |
| 2025-10-26 | IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction | Hao Li et.al. | 2510.22706 | translate | read | link |
| 2025-10-26 | Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views | Anna Deichler et.al. | 2510.22672 | translate | read | null |
| 2025-10-25 | BLIP-FusePPO: A Vision-Language Deep Reinforcement Learning Framework for Lane Keeping in Autonomous Vehicles | Seyed Ahmad Hosseini Miangoleh et.al. | 2510.22370 | translate | read | null |
| 2025-10-25 | Bridging Perception and Reasoning: Dual-Pipeline Neuro-Symbolic Landing for UAVs in Cluttered Environments | Weixian Qian et.al. | 2510.22204 | translate | read | null |
| 2025-10-25 | MOGRAS: Human Motion with Grasping in 3D Scenes | Kunal Bhosikar et.al. | 2510.22199 | translate | read | null |
| 2025-10-25 | LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction | Yuhang Gao et.al. | 2510.22141 | translate | read | null |
| 2025-10-25 | CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding | Lihuang Fang et.al. | 2510.22119 | translate | read | null |
| 2025-10-07 | Avi: Action from Volumetric Inference | Harris Song et.al. | 2510.21746 | translate | read | null |
| 2025-10-24 | OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields | Lisa Weijler et.al. | 2510.21441 | translate | read | null |
| 2025-10-24 | ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models | Pranav Saxena et.al. | 2510.21069 | translate | read | null |
| 2025-10-22 | Uncertainty evaluation of segmentation models for Earth observation | Melanie Rey et.al. | 2510.19586 | translate | read | null |
| 2025-10-22 | Exploring Scale Shift in Crowd Localization under the Context of Domain Generalization | Juncheng Wang et.al. | 2510.19330 | translate | read | null |
| 2025-10-21 | Event-Grounding Graph: Unified Spatio-Temporal Scene Graph from Robotic Observations | Phuoc Nguyen et.al. | 2510.18697 | translate | read | null |
| 2025-10-21 | MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning | Wenhui Huang et.al. | 2510.18337 | translate | read | null |
| 2025-10-21 | UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding | Da Zhang et.al. | 2510.18262 | translate | read | null |
| 2025-10-21 | OpenInsGaussian: Open-vocabulary Instance Gaussian Segmentation with Context-aware Cross-view Fusion | Tianyu Huang et.al. | 2510.18253 | translate | read | null |
| 2025-10-20 | Enhanced Motion Forecasting with Plug-and-Play Multimodal Large Language Models | Katie Luo et.al. | 2510.17274 | translate | read | null |
| 2025-10-19 | SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes | Xiongkun Linghu et.al. | 2510.16714 | translate | read | null |
| 2025-10-18 | Structured Interfaces for Automated Reasoning with 3D Scene Graphs | Aaron Ray et.al. | 2510.16643 | translate | read | null |
| 2025-10-11 | ESCA: Contextualizing Embodied Agents via Scene-Graph Generation | Jiani Huang et.al. | 2510.15963 | translate | read | null |
| 2025-10-07 | GAZE:Governance-Aware pre-annotation for Zero-shot World Model Environments | Leela Krishna et.al. | 2510.14992 | translate | read | null |
| 2025-10-16 | QuASH: Using Natural-Language Heuristics to Query Visual-Language Robotic Maps | Matti Pekkanen et.al. | 2510.14546 | translate | read | null |
| 2025-10-15 | Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models | Jia Yun Chua et.al. | 2510.13993 | translate | read | null |
| 2025-10-15 | SWIR-LightFusion: Multi-spectral Semantic Fusion of Synthetic SWIR with Thermal IR (LWIR/MWIR) and RGB | Muhammad Ishfaq Hussain et.al. | 2510.13404 | translate | read | null |
| 2025-10-15 | FlyAwareV2: A Multimodal Cross-Domain UAV Dataset for Urban Scene Understanding | Francesco Barbato et.al. | 2510.13243 | translate | read | null |
| 2025-10-14 | VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages | Jesse Atuhurra et.al. | 2510.12845 | translate | read | null |
| 2025-10-14 | SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding | Zhiliu Yang et.al. | 2510.12749 | translate | read | null |
| 2025-10-13 | PanoTPS-Net: Panoramic Room Layout Estimation via Thin Plate Spline Transformation | Hatem Ibrahem et.al. | 2510.11992 | translate | read | null |
| 2025-10-13 | PhySIC: Physically Plausible 3D Human-Scene Interaction and Contact from a Single Image | Pradyumna Yalandur Muralidhar et.al. | 2510.11649 | translate | read | null |
| 2025-10-13 | A Framework for Low-Effort Training Data Generation for Urban Semantic Segmentation | Denis Zavadski et.al. | 2510.11567 | translate | read | null |
| 2025-10-13 | mmWalk: Towards Multi-modal Multi-view Walking Assistance | Kedi Ying et.al. | 2510.11520 | translate | read | null |
| 2025-10-13 | REACT3D: Recovering Articulations for Interactive Physical 3D Scenes | Zhao Huang et.al. | 2510.11340 | translate | read | null |
| 2025-10-12 | Real2USD: Scene Representations in Universal Scene Description Language | Christopher D. Hsu et.al. | 2510.10778 | translate | read | null |
| 2025-10-11 | B2N3D: Progressive Learning from Binary to N-ary Relationships for 3D Object Grounding | Feng Xiao et.al. | 2510.10194 | translate | read | null |
| 2025-10-10 | CFVBench: A Comprehensive Video Benchmark for Fine-grained Multimodal Retrieval-Augmented Generation | Kaiwen Wei et.al. | 2510.09266 | translate | read | null |
| 2025-10-08 | Out-of-Distribution Detection in LiDAR Semantic Segmentation Using Epistemic Uncertainty from Hierarchical GMMs | Hanieh Shojaei Miandashti et.al. | 2510.08631 | translate | read | null |
| 2025-10-03 | Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes | Nirmal Elamon et.al. | 2510.08589 | translate | read | null |
| 2025-10-09 | The impact of abstract and object tags on image privacy classification | Darya Baranouskaya et.al. | 2510.07976 | translate | read | null |
| 2025-10-09 | CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving | Tianrui Zhang et.al. | 2510.07944 | translate | read | null |
| 2025-10-09 | An End-to-End Room Geometry Constrained Depth Estimation Framework for Indoor Panorama Images | Kanglin Ning et.al. | 2510.07817 | translate | read | null |
| 2025-10-07 | Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model | Danush Kumar Venkatesh et.al. | 2510.07345 | translate | read | null |
| 2025-10-08 | Semantic Segmentation Algorithm Based on Light Field and LiDAR Fusion | Jie Luo et.al. | 2510.06687 | translate | read | null |
| 2025-10-07 | When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach | Daniel Gonzálbez-Biosca et.al. | 2510.05661 | translate | read | null |
| 2025-10-07 | HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video | Hongchi Xia et.al. | 2510.05560 | translate | read | null |
| 2025-10-06 | Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction | Chi Yan et.al. | 2510.04759 | translate | read | null |
| 2025-10-02 | LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition | Rixin Zhou et.al. | 2510.01651 | translate | read | null |
| 2025-10-01 | VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs | Mohamad Al Mdfaa et.al. | 2510.01483 | translate | read | null |
