Scene Understanding - 2024-11 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code Available
2024-11-30	Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding	Duo Zheng et al.	2412.00493	no
2024-11-29	SIMS: Simulating Human-Scene Interactions with Real World Script Planning	Wenjia Wang et al.	2411.19921	no
2024-11-29	Quantifying the synthetic and real domain gap in aerial scene understanding	Alina Marcu et al.	2411.19913	no
2024-11-29	Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding	Wenbo Zhang et al.	2411.19551	no
2024-11-28	GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks	Muhammad Sohail Danish et al.	2411.19325	yes
2024-11-28	On-chip Hyperspectral Image Segmentation with Fully Convolutional Networks for Scene Understanding in Autonomous Driving	Jon Gutiérrez-Zaballa et al.	2411.19274	no
2024-11-28	InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception	Haijie Li et al.	2411.19235	no
2024-11-27	Reconstructing Animals and the Wild	Peter Kulits et al.	2411.18807	no
2024-11-27	Grid-augumented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Joongwon Chae et al.	2411.18270	no
2024-11-27	HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation	Trong-Thuan Nguyen et al.	2411.18042	no
2024-11-26	Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning	Hoàng-Ân Lê et al.	2411.17536	yes
2024-11-26	HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving	Jon Gutiérrez-Zaballa et al.	2411.17530	no
2024-11-25	RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics	Chan Hee Song et al.	2411.16537	no
2024-11-27	An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models	Wentao Qu et al.	2411.16308	yes
2024-11-25	Open-Vocabulary Octree-Graph for 3D Scene Understanding	Zhigang Wang et al.	2411.16253	no
2024-11-24	SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition	Yongkun Du et al.	2411.15858	yes
2024-11-24	ROOT: VLM based System for Indoor Scene Understanding and Beyond	Yonghui Wang et al.	2411.15714	yes
2024-11-23	Comparative Analysis of Resource-Efficient CNN Architectures for Brain Tumor Classification	Md Ashik Khan et al.	2411.15596	no
2024-11-23	Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing	Yadong Qu et al.	2411.15585	no
2024-11-22	UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations	Yuan Ren et al.	2411.15355	no
2024-11-21	Multimodal 3D Reasoning Segmentation with Complex Scenes	Xueying Jiang et al.	2411.13927	no
2024-11-20	Unbiased Scene Graph Generation by Type-Aware Message Passing on Heterogeneous and Dual Graphs	Guanglu Sun et al.	2411.13287	no
2024-11-20	Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation	Rohith Peddi et al.	2411.13059	no
2024-11-19	GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving	Shaoqing Xu et al.	2411.12452	yes
2024-11-19	Classification of Geographical Land Structure Using Convolution Neural Network and Transfer Learning	Mustafa M. Abd Zaid et al.	2411.12415	no
2024-11-18	Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation	Hanieh Shojaei Miandashti et al.	2411.11935	no
2024-11-18	MGNiceNet: Unified Monocular Geometric Scene Understanding	Markus Schön et al.	2411.11466	no
2024-11-18	The ADUULM-360 Dataset – A Multi-Modal Dataset for Depth Estimation in Adverse Weather	Markus Schön et al.	2411.11455	no
2024-11-18	Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications	Scarlett Raine et al.	2411.11287	no
2024-11-19	Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition	Tiancheng Lin et al.	2411.11219	yes
2024-11-17	Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry	Wenjun Hou et al.	2411.10937	no
2024-11-16	MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation	Ansh Shah et al.	2411.10886	yes
2024-11-16	Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm	Sari Masri et al.	2411.10869	no
2024-11-15	TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding	Quang P. M. Pham et al.	2411.10509	no
2024-11-15	Content-Aware Preserving Image Generation	Giang H. Le et al.	2411.09871	no
2024-11-13	Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification	Jose-Luis Matez-Bandera et al.	2411.08727	yes
2024-11-11	$SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation	Yinshuang Xu et al.	2411.07326	no
2024-11-06	Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving	Depanshu Sani et al.	2411.03702	no
2024-11-05	VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation	Haochen Zhang et al.	2411.03540	yes
2024-11-05	OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing	Pranav Gupta et al.	2411.02858	no
2024-11-04	Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting	Joey Wilson et al.	2411.02547	no
2024-11-04	Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images	Kun Huang et al.	2411.01749	yes
2024-11-03	VQ-Map: Bird’s-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization	Yiwei Zhang et al.	2411.01618	yes
2024-11-01	On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR	Li Li et al.	2411.00600	yes
2024-11-01	Federated Voxel Scene Graph for Intracranial Hemorrhage	Antoine P. Sanner et al.	2411.00578	no
