Scene Understanding - 2025-05

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-05-30 | Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | Jiazhong Cen et al. | 2505.24746 | translate | read | null |
| 2025-05-30 | Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | Duo Zheng et al. | 2505.24625 | translate | read | link |
| 2025-05-30 | EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding | Ege Özsoy et al. | 2505.24287 | translate | read | null |
| 2025-05-29 | ConversAR: Exploring Embodied LLM-Powered Group Conversations in Augmented Reality for Second Language Learners | Jad Bendarkawi et al. | 2505.24000 | translate | read | null |
| 2025-05-29 | A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation | Shuzhou Sun et al. | 2505.23451 | translate | read | null |
| 2025-05-29 | SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model | Bowen Chen et al. | 2505.23010 | translate | read | null |
| 2025-05-28 | On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation | Liyao Tang et al. | 2505.22444 | translate | read | null |
| 2025-05-28 | LiDAR Based Semantic Perception for Forklifts in Outdoor Environments | Benjamin Serfling et al. | 2505.22258 | translate | read | null |
| 2025-05-28 | 3D Question Answering via only 2D Vision-Language Models | Fengyun Wang et al. | 2505.22143 | translate | read | null |
| 2025-05-29 | DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation | Tianjun Gu et al. | 2505.21969 | translate | read | null |
| 2025-05-28 | Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs | Insu Lee et al. | 2505.21955 | translate | read | null |
| 2025-05-27 | A Graph Completion Method that Jointly Predicts Geometry and Topology Enables Effective Molecule Assembly | Rohan V. Koodli et al. | 2505.21833 | translate | read | null |
| 2025-05-29 | Compositional Scene Understanding through Inverse Generative Modeling | Yanbo Wang et al. | 2505.21780 | translate | read | null |
| 2025-05-30 | Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks | Keanu Nichols et al. | 2505.21649 | translate | read | null |
| 2025-05-27 | Assured Autonomy with Neuro-Symbolic Perception | R. Spencer Hallyburton et al. | 2505.21322 | translate | read | null |
| 2025-05-27 | Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning | Lintao Xu et al. | 2505.21231 | translate | read | null |
| 2025-05-27 | Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts | Yue Zhang et al. | 2505.21079 | translate | read | null |
| 2025-05-27 | OccLE: Label-Efficient 3D Semantic Occupancy Prediction | Naiyu Fang et al. | 2505.20617 | translate | read | null |
| 2025-05-27 | OmniIndoor3D: Comprehensive Indoor 3D Reconstruction | Xiaobao Wei et al. | 2505.20610 | translate | read | null |
| 2025-05-26 | From Data to Modeling: Fully Open-vocabulary Scene Graph Generation | Zuyao Chen et al. | 2505.20106 | translate | read | null |
| 2025-05-26 | DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization | Jianxin Huang et al. | 2505.20041 | translate | read | null |
| 2025-05-26 | Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement | Afrah Shaahid et al. | 2505.19895 | translate | read | null |
| 2025-05-26 | LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study | Dongil Yang et al. | 2505.19510 | translate | read | link |
| 2025-05-25 | FHGS: Feature-Homogenized Gaussian Splatting | Q. G. Duan et al. | 2505.19154 | translate | read | null |
| 2025-05-25 | Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | Md. Mithun Hossain et al. | 2505.19010 | translate | read | null |
| 2025-05-24 | Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding | Guofeng Mei et al. | 2505.18819 | translate | read | null |
| 2025-05-24 | Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | Sicheng Feng et al. | 2505.18675 | translate | read | link |
| 2025-05-23 | SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain | Jiawei Zhou et al. | 2505.17727 | translate | read | null |
| 2025-05-23 | From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation | Mahmoud Chick Zaouali et al. | 2505.17402 | translate | read | null |
| 2025-05-22 | Assessing the generalization performance of SAM for ureteroscopy scene understanding | Martin Villagrana et al. | 2505.17210 | translate | read | null |
| 2025-05-22 | CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Haihong Hao et al. | 2505.16663 | translate | read | link |
| 2025-05-21 | SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval | Nikolaos Chaidos et al. | 2505.15867 | translate | read | link |
| 2025-05-21 | HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning | Xiaodong Mei et al. | 2505.15703 | translate | read | null |
| 2025-05-21 | Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | Kaiyuan Chen et al. | 2505.15517 | translate | read | link |
| 2025-05-21 | RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation | Naman Patel et al. | 2505.15373 | translate | read | null |
| 2025-05-21 | DC-Scene: Data-Centric Learning for 3D Scene Understanding | Ting Huang et al. | 2505.15232 | translate | read | link |
| 2025-05-19 | ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | Ege Özsoy et al. | 2505.12890 | translate | read | null |
| 2025-05-19 | AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | Kai Zhang et al. | 2505.12782 | translate | read | null |
| 2025-05-19 | Predicting Reaction Time to Comprehend Scenes with Foveated Scene Understanding Maps | Ziqi Wen et al. | 2505.12660 | translate | read | null |
| 2025-05-18 | LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding | Hanyu Zhou et al. | 2505.12253 | translate | read | null |
| 2025-05-18 | SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving | Muleilan Pei et al. | 2505.12246 | translate | read | null |
| 2025-05-18 | Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | Qingmei Li et al. | 2505.12207 | translate | read | link |
| 2025-05-18 | Spatial-LLaVA: Enhancing Large Language Models with Spatial Referring Expressions for Visual Understanding | Xuefei Sun et al. | 2505.12194 | translate | read | null |
| 2025-05-17 | TinyRS-R1: Compact Multimodal Language Model for Remote Sensing | Aybora Koksal et al. | 2505.12099 | translate | read | null |
| 2025-05-15 | StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | Daniel A. P. Oliveira et al. | 2505.10292 | translate | read | link |
| 2025-05-15 | APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds | Yuan Gao et al. | 2505.09971 | translate | read | link |
| 2025-05-14 | DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection | Jianlin Sun et al. | 2505.09168 | translate | read | link |
| 2025-05-14 | Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning | Dayong Liang et al. | 2505.09118 | translate | read | null |
| 2025-05-13 | Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Zongchuang Zhao et al. | 2505.08725 | translate | read | link |
| 2025-05-12 | Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions | Yi Zhang et al. | 2505.07611 | translate | read | null |
| 2025-05-11 | Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding | Chih-Chung Hsu et al. | 2505.06991 | translate | read | null |
| 2025-05-11 | Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation | Seokjun Kwon et al. | 2505.06951 | translate | read | null |
| 2025-05-09 | Camera Control at the Edge with Language Models for Scene Understanding | Alexiy Buynitsky et al. | 2505.06402 | translate | read | null |
| 2025-05-09 | Camera-Only Bird’s Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles | Anupkumar Bochare et al. | 2505.06113 | translate | read | null |
| 2025-05-08 | Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Sooyoung Park et al. | 2505.05343 | translate | read | link |
| 2025-05-08 | PADriver: Towards Personalized Autonomous Driving | Genghua Kou et al. | 2505.05240 | translate | read | null |
| 2025-05-08 | Does CLIP perceive art the same way we do? | Andrea Asperti et al. | 2505.05229 | translate | read | null |
| 2025-05-07 | GSsplat: Generalizable Semantic Gaussian Splatting for Novel-view Synthesis in 3D Scenes | Feng Xiao et al. | 2505.04659 | translate | read | link |
| 2025-05-07 | RAFT: Robust Augmentation of FeaTures for Image Segmentation | Edward Humes et al. | 2505.04529 | translate | read | null |
| 2025-05-03 | Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | Gracjan Góral et al. | 2505.03821 | translate | read | null |
| 2025-05-06 | MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation | Mingcheng Li et al. | 2505.02648 | translate | read | null |
| 2025-05-04 | Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | Volodymyr Havrylov et al. | 2505.02075 | translate | read | link |
| 2025-05-04 | Segment Any RGB-Thermal Model with Language-aided Distillation | Dong Xing et al. | 2505.01950 | translate | read | null |
| 2025-05-02 | Embracing Diffraction: A Paradigm Shift in Wireless Sensing and Communication | Anurag Pallaprolu et al. | 2505.01625 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)