Scene Understanding - 2025-02 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2025-02-28	Vibrotactile information coding strategies for a body-worn vest to aid robot-human collaboration	Adrian Vecina Tercero et al.	2502.21056	null
2025-02-27	Towards Statistical Factuality Guarantee for Large Vision-Language Models	Zhuohang Li et al.	2502.20560	null
2025-02-26	Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator	Xiankang He et al.	2502.19204	link
2025-02-25	VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion	Pei Liu et al.	2502.18042	null
2025-02-24	AAD-LLM: Neural Attention-Driven Auditory Scene Understanding	Xilin Jiang et al.	2502.16794	link
2025-02-28	Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model	Yaxuan Huang et al.	2502.16779	link
2025-02-23	Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration	Kim Jun-Seong et al.	2502.16652	null
2025-02-21	Weakly Supervised Video Scene Graph Generation via Natural Language Supervision	Kibum Kim et al.	2502.15370	link
2025-02-21	DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation	Luzhou Ge et al.	2502.15309	link
2025-02-21	Hierarchical Context Transformer for Multi-level Semantic Scene Understanding	Luoying Hao et al.	2502.15184	link
2025-02-20	CrossOver: 3D Scene Cross-Modal Alignment	Sayan Deb Sarkar et al.	2502.15011	link
2025-02-20	Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting	Boying Li et al.	2502.14931	null
2025-02-19	Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning	Rui Zhao et al.	2502.14917	null
2025-02-16	Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review	Ufaq Khan et al.	2502.14886	null
2025-02-21	AVD2: Accident Video Diffusion for Accident Video Description	Cheng Li et al.	2502.14801	null
2025-02-18	Spiking Vision Transformer with Saccadic Attention	Shuai Wang et al.	2502.12677	null
2025-02-16	NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM	Zihan Wang et al.	2502.11142	link
2025-02-15	Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy	Mingyang Zhao et al.	2502.10704	link
2025-02-14	Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation	Gamal Elghazaly et al.	2502.10127	null
2025-02-13	FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation	Bin Yang et al.	2502.09274	null
2025-02-13	Billet Number Recognition Based on Test-Time Adaptation	Yuan Wei et al.	2502.09026	null
2025-02-13	EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition	Xiao Wang et al.	2502.09020	link
2025-02-13	3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning	Guoqin Tang et al.	2502.08903	null
2025-02-10	Fully Exploiting Vision Foundation Model’s Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing	Sicen Guo et al.	2502.06219	null
2025-02-08	Content-based Video Retrieval in Traffic Videos using Latent Dirichlet Allocation Topic Model	Mohammad Kianpisheh et al.	2502.05457	null
2025-02-06	sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views	Eyvaz Najafli et al.	2502.04318	null
2025-02-06	Taking A Closer Look at Interacting Objects: Interaction-Aware Open Vocabulary Scene Graph Generation	Lin Li et al.	2502.03856	null
2025-02-05	EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality	Junlong Chen et al.	2502.03564	null
2025-02-04	Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation	Junha Lee et al.	2502.02548	null
2025-02-04	Event-aided Semantic Scene Completion	Shangwei Guo et al.	2502.02334	link
2025-02-03	AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis	Basit Alawode et al.	2502.01785	null
2025-02-04	Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	Akash Kumar et al.	2501.17053	null
