Scene Understanding - 2025-01 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2025-01-30	Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation	Yuelei Li et al.	2501.18733	null
2025-01-30	Efficient Interactive 3D Multi-Object Removal	Jingcheng Ni et al.	2501.17636	null
2025-01-29	PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding	Wei Chow et al.	2501.16411	link
2025-01-26	Ocean-OCR: Towards General OCR Application via a Vision-Language Model	Song Chen et al.	2501.15558	link
2025-01-26	Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics	Ali Tourani et al.	2501.15505	link
2025-01-24	HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	Xin Zhou et al.	2501.14729	link
2025-01-24	Scene Understanding Enabled Semantic Communication with Open Channel Coding	Zhe Xiang et al.	2501.14520	null
2025-01-23	GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization	Jaewon Lee et al.	2501.13417	null
2025-01-22	Neural Radiance Fields for the Real World: A Survey	Wenhui Xiao et al.	2501.13104	null
2025-01-22	PSGSL: A Probabilistic Framework Integrating Semantic Scene Understanding and Gas Sensing for Gas Source Localization	Pepe Ojeda et al.	2501.12812	null
2025-01-20	Dynamic Scene Understanding from Vision-Language Representations	Shahaf Pruss et al.	2501.11653	null
2025-01-20	EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Guankun Wang et al.	2501.11347	link
2025-01-20	A Survey of World Models for Autonomous Driving	Tuo Feng et al.	2501.11260	null
2025-01-17	A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features	Enes Karanfil et al.	2501.10144	null
2025-01-16	CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation	Alex Berian et al.	2501.09838	link
2025-01-16	YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks	Saptarashmi Bandyopadhyay et al.	2501.09355	null
2025-01-15	Embodied Scene Understanding for Vision Language Models via MetaVQA	Weizhen Wang et al.	2501.09167	null
2025-01-15	GOTLoc: General Outdoor Text-based Localization Using Scene Graph Retrieval with OpenStreetMap	Donghwi Jung et al.	2501.08575	link
2025-01-14	3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Haomiao Xiong et al.	2501.07819	link
2025-01-13	Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models	Yasiru Ranasinghe et al.	2501.07396	null
2025-01-13	Hierarchical Superpixel Segmentation via Structural Information Theory	Minhui Xie et al.	2501.07069	link
2025-01-12	Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving	Haoxiang Gao et al.	2501.06680	null
2025-01-08	NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data	Nirit Alkalay et al.	2501.06235	null
2025-01-10	Self-Supervised Partial Cycle-Consistency for Multi-View Matching	Fedor Taggenbrock et al.	2501.06000	link
2025-01-10	UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation	Xinyao Liao et al.	2501.05687	null
2025-01-09	Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding	Mohammed Elhenawy et al.	2501.05566	null
2025-01-09	A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision	Ali Rohan et al.	2501.05147	null
2025-01-08	TADFormer: Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning	Seungmin Baek et al.	2501.04293	null
2025-01-07	A Bayesian Modeling Framework for Estimation and Ground Segmentation of Cluttered Staircases	Prasanna Sriganesh et al.	2501.04170	null
2025-01-07	LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving	Lingdong Kong et al.	2501.04005	null
2025-01-07	CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds	Keonwoo Kim et al.	2501.03879	null
2025-01-07	Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets	Jing Liu et al.	2501.03637	null
2025-01-03	VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment	Wenyan Cong et al.	2501.01949	null
2025-01-03	IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks	Aecheon Jung et al.	2501.01685	link
2025-01-09	GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Zhangyang Qi et al.	2501.01428	null
2025-01-02	3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer	Jiajun Deng et al.	2501.01163	null
2025-01-02	Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction	Xuan Yu et al.	2501.01119	null