Scene Understanding - 2024-11
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-11-30 | Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | Duo Zheng et.al. | 2412.00493 | translate | read | null |
| 2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921 | translate | read | null |
| 2024-11-29 | Quantifying the synthetic and real domain gap in aerial scene understanding | Alina Marcu et.al. | 2411.19913 | translate | read | null |
| 2024-11-29 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Wenbo Zhang et.al. | 2411.19551 | translate | read | null |
| 2024-11-28 | GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Muhammad Sohail Danish et.al. | 2411.19325 | translate | read | link |
| 2024-11-28 | On-chip Hyperspectral Image Segmentation with Fully Convolutional Networks for Scene Understanding in Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2411.19274 | translate | read | null |
| 2024-11-28 | InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception | Haijie Li et.al. | 2411.19235 | translate | read | null |
| 2024-11-27 | Reconstructing Animals and the Wild | Peter Kulits et.al. | 2411.18807 | translate | read | null |
| 2024-11-27 | Grid-augumented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Joongwon Chae et.al. | 2411.18270 | translate | read | null |
| 2024-11-27 | HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | Trong-Thuan Nguyen et.al. | 2411.18042 | translate | read | null |
| 2024-11-26 | Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning | Hoàng-Ân Lê et.al. | 2411.17536 | translate | read | link |
| 2024-11-26 | HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2411.17530 | translate | read | null |
| 2024-11-25 | RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | Chan Hee Song et.al. | 2411.16537 | translate | read | null |
| 2024-11-27 | An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models | Wentao Qu et.al. | 2411.16308 | translate | read | link |
| 2024-11-25 | Open-Vocabulary Octree-Graph for 3D Scene Understanding | Zhigang Wang et.al. | 2411.16253 | translate | read | null |
| 2024-11-24 | SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition | Yongkun Du et.al. | 2411.15858 | translate | read | link |
| 2024-11-24 | ROOT: VLM based System for Indoor Scene Understanding and Beyond | Yonghui Wang et.al. | 2411.15714 | translate | read | link |
| 2024-11-23 | Comparative Analysis of Resource-Efficient CNN Architectures for Brain Tumor Classification | Md Ashik Khan et.al. | 2411.15596 | translate | read | null |
| 2024-11-23 | Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing | Yadong Qu et.al. | 2411.15585 | translate | read | null |
| 2024-11-22 | UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations | Yuan Ren et.al. | 2411.15355 | translate | read | null |
| 2024-11-21 | Multimodal 3D Reasoning Segmentation with Complex Scenes | Xueying Jiang et.al. | 2411.13927 | translate | read | null |
| 2024-11-20 | Unbiased Scene Graph Generation by Type-Aware Message Passing on Heterogeneous and Dual Graphs | Guanglu Sun et.al. | 2411.13287 | translate | read | null |
| 2024-11-20 | Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation | Rohith Peddi et.al. | 2411.13059 | translate | read | null |
| 2024-11-19 | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Shaoqing Xu et.al. | 2411.12452 | translate | read | link |
| 2024-11-19 | Classification of Geographical Land Structure Using Convolution Neural Network and Transfer Learning | Mustafa M. Abd Zaid et.al. | 2411.12415 | translate | read | null |
| 2024-11-18 | Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation | Hanieh Shojaei Miandashti et.al. | 2411.11935 | translate | read | null |
| 2024-11-18 | MGNiceNet: Unified Monocular Geometric Scene Understanding | Markus Schön et.al. | 2411.11466 | translate | read | null |
| 2024-11-18 | The ADUULM-360 Dataset – A Multi-Modal Dataset for Depth Estimation in Adverse Weather | Markus Schön et.al. | 2411.11455 | translate | read | null |
| 2024-11-18 | Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications | Scarlett Raine et.al. | 2411.11287 | translate | read | null |
| 2024-11-19 | Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition | Tiancheng Lin et.al. | 2411.11219 | translate | read | link |
| 2024-11-17 | Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | Wenjun Hou et.al. | 2411.10937 | translate | read | null |
| 2024-11-16 | MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation | Ansh Shah et.al. | 2411.10886 | translate | read | link |
| 2024-11-16 | Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm | Sari Masri et.al. | 2411.10869 | translate | read | null |
| 2024-11-15 | TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding | Quang P. M. Pham et.al. | 2411.10509 | translate | read | null |
| 2024-11-15 | Content-Aware Preserving Image Generation | Giang H. Le et.al. | 2411.09871 | translate | read | null |
| 2024-11-13 | Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification | Jose-Luis Matez-Bandera et.al. | 2411.08727 | translate | read | link |
| 2024-11-11 | $SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation | Yinshuang Xu et.al. | 2411.07326 | translate | read | null |
| 2024-11-06 | Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving | Depanshu Sani et.al. | 2411.03702 | translate | read | null |
| 2024-11-05 | VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation | Haochen Zhang et.al. | 2411.03540 | translate | read | link |
| 2024-11-05 | OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing | Pranav Gupta et.al. | 2411.02858 | translate | read | null |
| 2024-11-04 | Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting | Joey Wilson et.al. | 2411.02547 | translate | read | null |
| 2024-11-04 | Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images | Kun Huang et.al. | 2411.01749 | translate | read | link |
| 2024-11-03 | VQ-Map: Bird’s-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization | Yiwei Zhang et.al. | 2411.01618 | translate | read | link |
| 2024-11-01 | On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR | Li Li et.al. | 2411.00600 | translate | read | link |
| 2024-11-01 | Federated Voxel Scene Graph for Intracranial Hemorrhage | Antoine P. Sanner et.al. | 2411.00578 | translate | read | null |