Scene Understanding - 2024-11

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-11-30 | Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | Duo Zheng et.al. | 2412.00493 | translate | read | null |
| 2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921 | translate | read | null |
| 2024-11-29 | Quantifying the synthetic and real domain gap in aerial scene understanding | Alina Marcu et.al. | 2411.19913 | translate | read | null |
| 2024-11-29 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Wenbo Zhang et.al. | 2411.19551 | translate | read | null |
| 2024-11-28 | GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Muhammad Sohail Danish et.al. | 2411.19325 | translate | read | link |
| 2024-11-28 | On-chip Hyperspectral Image Segmentation with Fully Convolutional Networks for Scene Understanding in Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2411.19274 | translate | read | null |
| 2024-11-28 | InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception | Haijie Li et.al. | 2411.19235 | translate | read | null |
| 2024-11-27 | Reconstructing Animals and the Wild | Peter Kulits et.al. | 2411.18807 | translate | read | null |
| 2024-11-27 | Grid-augumented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Joongwon Chae et.al. | 2411.18270 | translate | read | null |
| 2024-11-27 | HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | Trong-Thuan Nguyen et.al. | 2411.18042 | translate | read | null |
| 2024-11-26 | Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning | Hoàng-Ân Lê et.al. | 2411.17536 | translate | read | link |
| 2024-11-26 | HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2411.17530 | translate | read | null |
| 2024-11-25 | RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | Chan Hee Song et.al. | 2411.16537 | translate | read | null |
| 2024-11-27 | An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models | Wentao Qu et.al. | 2411.16308 | translate | read | link |
| 2024-11-25 | Open-Vocabulary Octree-Graph for 3D Scene Understanding | Zhigang Wang et.al. | 2411.16253 | translate | read | null |
| 2024-11-24 | SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition | Yongkun Du et.al. | 2411.15858 | translate | read | link |
| 2024-11-24 | ROOT: VLM based System for Indoor Scene Understanding and Beyond | Yonghui Wang et.al. | 2411.15714 | translate | read | link |
| 2024-11-23 | Comparative Analysis of Resource-Efficient CNN Architectures for Brain Tumor Classification | Md Ashik Khan et.al. | 2411.15596 | translate | read | null |
| 2024-11-23 | Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing | Yadong Qu et.al. | 2411.15585 | translate | read | null |
| 2024-11-22 | UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations | Yuan Ren et.al. | 2411.15355 | translate | read | null |
| 2024-11-21 | Multimodal 3D Reasoning Segmentation with Complex Scenes | Xueying Jiang et.al. | 2411.13927 | translate | read | null |
| 2024-11-20 | Unbiased Scene Graph Generation by Type-Aware Message Passing on Heterogeneous and Dual Graphs | Guanglu Sun et.al. | 2411.13287 | translate | read | null |
| 2024-11-20 | Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation | Rohith Peddi et.al. | 2411.13059 | translate | read | null |
| 2024-11-19 | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Shaoqing Xu et.al. | 2411.12452 | translate | read | link |
| 2024-11-19 | Classification of Geographical Land Structure Using Convolution Neural Network and Transfer Learning | Mustafa M. Abd Zaid et.al. | 2411.12415 | translate | read | null |
| 2024-11-18 | Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation | Hanieh Shojaei Miandashti et.al. | 2411.11935 | translate | read | null |
| 2024-11-18 | MGNiceNet: Unified Monocular Geometric Scene Understanding | Markus Schön et.al. | 2411.11466 | translate | read | null |
| 2024-11-18 | The ADUULM-360 Dataset – A Multi-Modal Dataset for Depth Estimation in Adverse Weather | Markus Schön et.al. | 2411.11455 | translate | read | null |
| 2024-11-18 | Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications | Scarlett Raine et.al. | 2411.11287 | translate | read | null |
| 2024-11-19 | Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition | Tiancheng Lin et.al. | 2411.11219 | translate | read | link |
| 2024-11-17 | Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | Wenjun Hou et.al. | 2411.10937 | translate | read | null |
| 2024-11-16 | MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation | Ansh Shah et.al. | 2411.10886 | translate | read | link |
| 2024-11-16 | Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm | Sari Masri et.al. | 2411.10869 | translate | read | null |
| 2024-11-15 | TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding | Quang P. M. Pham et.al. | 2411.10509 | translate | read | null |
| 2024-11-15 | Content-Aware Preserving Image Generation | Giang H. Le et.al. | 2411.09871 | translate | read | null |
| 2024-11-13 | Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification | Jose-Luis Matez-Bandera et.al. | 2411.08727 | translate | read | link |
| 2024-11-11 | $SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation | Yinshuang Xu et.al. | 2411.07326 | translate | read | null |
| 2024-11-06 | Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving | Depanshu Sani et.al. | 2411.03702 | translate | read | null |
| 2024-11-05 | VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation | Haochen Zhang et.al. | 2411.03540 | translate | read | link |
| 2024-11-05 | OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing | Pranav Gupta et.al. | 2411.02858 | translate | read | null |
| 2024-11-04 | Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting | Joey Wilson et.al. | 2411.02547 | translate | read | null |
| 2024-11-04 | Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images | Kun Huang et.al. | 2411.01749 | translate | read | link |
| 2024-11-03 | VQ-Map: Bird’s-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization | Yiwei Zhang et.al. | 2411.01618 | translate | read | link |
| 2024-11-01 | On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR | Li Li et.al. | 2411.00600 | translate | read | link |
| 2024-11-01 | Federated Voxel Scene Graph for Intracranial Hemorrhage | Antoine P. Sanner et.al. | 2411.00578 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)