Scene Understanding - 2025-04 | Paper Arxiv Daily


Publish Date	Title	Authors	arXiv ID	Code
2025-04-30	V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving	Jannik Lübberstedt et al.	2505.00156	null
2025-04-30	LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics	Marc Glocker et al.	2504.21716	link
2025-04-30	ImaginateAR: AI-Assisted In-Situ Authoring in Augmented Reality	Jaewook Lee et al.	2504.21360	null
2025-04-28	Category-Level and Open-Set Object Pose Estimation for Robotics	Peter Hönig et al.	2504.19572	null
2025-04-28	Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding	Yan Wang et al.	2504.19500	null
2025-04-27	Beyond Physical Reach: Comparing Head- and Cane-Mounted Cameras for Last-Mile Navigation by Blind Users	Apurv Varshney et al.	2504.19345	null
2025-04-27	OpenFusion++: An Open-vocabulary Real-time Scene Understanding System	Xiaofeng Jin et al.	2504.19266	null
2025-04-27	CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis	Alexander Baumann et al.	2504.19223	null
2025-04-27	Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving	Mi Zheng et al.	2504.19183	null
2025-04-23	TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance	Meng Chu et al.	2504.16505	null
2025-04-21	Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends	Mohammad Abu Tami et al.	2504.16134	null
2025-04-22	Vision language models are unreliable at trivial spatial cognition	Sangeet Khemlani et al.	2504.16061	null
2025-04-20	Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension	Lin Li et al.	2504.14642	null
2025-04-20	RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots	Zhang Zhang et al.	2504.14604	null
2025-04-20	Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Tong Zeng et al.	2504.14526	link
2025-04-20	Vision-Centric Representation-Efficient Fine-Tuning for Robust Universal Foreground Segmentation	Guoyi Zhang et al.	2504.14481	null
2025-04-18	HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Alexander Rusnak et al.	2504.13590	null
2025-04-18	Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding	Yuchen Rao et al.	2504.13580	link
2025-04-18	Temporal Propagation of Asymmetric Feature Pyramid for Surgical Scene Segmentation	Cheng Yuan et al.	2504.13440	null
2025-04-17	Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs	Shaohui Dai et al.	2504.13153	link
2025-04-17	Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks	Nassim Belmecheri et al.	2504.12817	null
2025-04-17	Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation	Changsheng Lv et al.	2504.12606	null
2025-04-16	Generalized Visual Relation Detection with Diffusion Models	Kaifeng Gao et al.	2504.12100	null
2025-04-17	DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency	Mengshi Qi et al.	2504.12080	link
2025-04-16	CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting	Wei Sun et al.	2504.11893	null
2025-04-15	Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning	Juan Garcia Giraldo et al.	2504.11268	null
2025-04-14	Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization	Darryl Hannan et al.	2504.10727	null
2025-04-14	SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding	Marc Gutiérrez-Pérez et al.	2504.10106	link
2025-04-12	Text To 3D Object Generation For Scalable Room Assembly	Sonia Laguna et al.	2504.09328	null
2025-04-11	FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment	Sebastián Barbas Laina et al.	2504.08603	null
2025-04-11	FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents	Xin Tan et al.	2504.08581	null
2025-04-11	DSM: Building A Diverse Semantic Map for 3D Visual Grounding	Qinghongbing Xie et al.	2504.08307	null
2025-04-10	SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos	Joshua Li et al.	2504.07867	null
2025-04-10	DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction	Xu Zhao et al.	2504.07524	null
2025-04-09	RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration	Omar Alama et al.	2504.06994	null
2025-04-09	Audio-visual Event Localization on Portrait Mode Short Videos	Wuyang Liu et al.	2504.06884	null
2025-04-09	MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Chang Nie et al.	2504.06863	null
2025-04-09	Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding	Pedro Hermosilla et al.	2504.06719	link
2025-04-09	Domain-Conditioned Scene Graphs for State-Grounded Task Planning	Jonas Herzog et al.	2504.06661	null
2025-04-09	Attributes-aware Visual Emotion Representation Learning	Rahul Singh Maharjan et al.	2504.06578	null
2025-04-08	CamContextI2V: Context-aware Controllable Video Generation	Luis Denninger et al.	2504.06022	link
2025-04-08	AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems	Zhuoli Zhuang et al.	2504.05950	null
2025-04-08	PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario	Sriram Mandalika et al.	2504.05908	null
2025-04-08	InvNeRF-Seg: Fine-Tuning a Pre-Trained NeRF for 3D Object Segmentation	Jiangsan Zhao et al.	2504.05751	null
2025-04-07	RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model	Congcong Wen et al.	2504.04988	null
2025-04-07	Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding	Zahir Alsulaimawi et al.	2504.04772	null
2025-04-07	DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation	Bo-Wen Yin et al.	2504.04701	link
2025-04-06	Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models	Rui Gan et al.	2504.04562	null
2025-04-04	3D Scene Understanding Through Local Random Access Sequence Modeling	Wanhee Lee et al.	2504.03875	link
2025-04-07	NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving	Kexin Tian et al.	2504.03164	null
2025-04-03	F-ViTA: Foundation Model Guided Visible to Thermal Translation	Jay N. Paranjape et al.	2504.02801	link
2025-04-03	Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision	Xiaofeng Han et al.	2504.02477	link
2025-04-02	Scene-Centric Unsupervised Panoptic Segmentation	Oliver Hahn et al.	2504.01955	link
2025-04-02	Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness	Haochen Wang et al.	2504.01901	null
2025-04-02	CoMatcher: Multi-View Collaborative Feature Matching	Jintao Zhang et al.	2504.01872	null
2025-04-02	TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication	Petr Vanc et al.	2504.01708	null
2025-04-02	Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation	Junjie Chen et al.	2504.01668	null
2025-04-01	WikiVideo: Article Generation from Multiple Videos	Alexander Martin et al.	2504.00939	link
2025-04-01	Zero-Shot 4D Lidar Panoptic Segmentation	Yushan Zhang et al.	2504.00848	null
2025-04-01	PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks	Abdelrahman Elskhawy et al.	2504.00844	null
2025-04-01	Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights	Yuchen Liu et al.	2504.00839	null
